<<

INTERPOLATION AND MODEL REDUCTION

RALF ZIMMERMANN∗

Abstract. One approach to parametric and adaptive model reduction is via the interpolation of orthogonal bases, subspaces or positive definite system matrices. In all these cases, the sampled inputs stem from sets that feature a geometric structure and thus form so-called matrix . This work will be featured as a chapter in the upcoming Handbook on Model Order Reduction, (P. Benner, S. Grivet-Talocia, A. Quarteroni, G. Rozza, W. H. A. Schilders, L. M. Silveira, eds, to appear on DE GRUYTER) and reviews the numerical treatment of the most important matrix manifolds that arise in the context of model reduction. Moreover, the principal approaches to data interpolation and Taylor-like extrapolation on matrix manifolds are outlined and complemented by algorithms in pseudo-code.

Key words. parametric model reduction, matrix manifold, Riemannian computing, interpolation, interpolation on manifolds, Grassmann manifold, , matrix Lie

AMS subject classifications. 15-01, 15A16, 15B10, 15B48, 53-04, 65F60, 41-01, 41A05, 65F99, 93A15, 93C30

1. Introduction & Motivation. This work addresses interpolation approaches for parametric model reduction. This includes techniques for • computing trajectories of parameterized subspaces, • computing trajectories of parameterized reduced orthogonal bases, • structure-preserving interpolation. Mathematically, this requires data processing on nonlinear matrix manifolds. The exposition at hand intends to be an introduction and a reference guide to numerical procedures with matrix manifold-valued data. As such it addresses practitioners and scientists new to the field. It covers the essentials of those matrix manifolds that arise most frequently in practical problems in model reduction. The main purpose is not to discuss concrete model reduction applications, but rather to provide the essential tools, building blocks and background to enable the reader to devise her/his own approaches for such applications. The text was designed such that it works as a commented formula collection, meanwhile giving sufficient context, explanations and, not least, precise references to enable the interested reader to immerse further in the topic. 1.1. Parametric model reduction via manifold interpolation: An intro- ductory example. The basic objective in model reduction is to emulate a large-scale with very few such that its input/output behav- ior is preserved as well as possible. While classical model reduction techniques aim at producing an accurate low-order approximation to the autonomous behavior of the arXiv:1902.06502v2 [math.NA] 11 Sep 2019 original system, parametric model reduction (pMOR) tries to account for additional system parameters. If we look for instance at aircraft aerodynamics, an important task is to solve the unsteady Navier-Stokes at various flight conditions, which are, amongst others, specified by the altitude, the viscosity of the fluid (i.e. the Reynolds number) and the relative velocity (i.e. the Mach number).We explain the objective of pMOR with the aid of a generic example in the context of proper orthogo- nal decomposition-based model reduction. Similar considerations apply to frequency domain approaches, Krylov subspace methods and balanced truncation, which are

∗Department of Mathematics and Computer Science, University of Southern Denmark (SDU) Odense, ([email protected]). 1 discussed in other chapters of the upcoming Handbook on Model Order Reduction. Consider a spatio-temporal dynamical system in semi-discrete form ∂ x(t, µ) = f(x(t, µ); µ), x(t , µ) = x , (1.1) ∂t 0 0,µ

where x(t, µ) ∈ Rn is the spatially discretized state vector of n, the vec- d tor µ = (µ1, . . . , µd) ∈ R accounts for additional system parameters and f( · ; µ): Rn → Rn is the (possibly nonlinear, parameter-dependent) right hand side . -based MOR starts with constructing a suitable low-dimensional subspace that acts as a of candidate solutions. Subspace construction. One way to construct the required projection subspace is the proper orthogonal decomposition (POD), [48].In its simplest form, the POD 1 can be summarized as follows. For a fixed system parameter µ = µ0, let x := m n x(t1, µ0), ..., x := x(tm, µ0) ∈ R be a set of state vectors satisfying (1.1) and let 1 m n×m i S := x , ..., x ∈ R . The state vectors x are called snapshots and the matrix S is called the associated snapshot matrix. POD is concerned with finding a subspace n×r V of dimension r ≤ m represented by a column-orthogonal matrix Vr ∈ R such that the error between the input snapshots and their orthogonal projection onto V = ran(Vr) is minimized:   X k T k 2 T 2 min kx − VV x k2 ⇔ min kS − VV SkF . V ∈ n×r ,V T V =I V ∈ n×r ,V T V =I R k R The main result of POD is that for any r ≤ m, the best r-dimensional approximation of ran(x1, ..., xm) in the above sense is V = ran(v1, ..., vr), where {v1, ..., vr} are the eigenvectors of the matrix SST corresponding to the r largest eigenvalues. The sub- 1 r space V is called the POD subspace and the matrix Vr = (v , ..., v ) is the POD matrix. The same subspace is obtained via a compact singular value decomposition (SVD) of the snapshot matrix S = VΣZT , truncated to the first r ≤ m columns of n×m V ∈ R by setting V := ran(Vr). For more details, see, e.g. [17, §3.3]. In the following, we drop the index r and assume that V is already the truncated matrix V = (v1, ..., vr) ∈ Rn×r. Since the input snapshots are supplied at a fixed system parameter vector µ0, the POD subspace is considered to be an appropriate space of solution candidates V(µ0) = ran(V(µ0)) at µ0. Projection. POD leads to a parameter decoupling

x˜(t, µ0) = V(µ0)xr(t). (1.2) In this way, the time trajectory of the reduced model is uniquely defined by the coef- r ficient vector xr(t) ∈ R that represents the reduced state vector with respect to the subspace ran(V(µ0)). Given a matrix W(µ0) such that the matrix pair V(µ0), W(µ0) T is bi-orthogonal, i.e. W(µ0) V(µ0) = I, the original system (1.1) can be reduced in T dimension as follows. Substituting (1.2) in (1.1) and multiplying with W(µ0) from the left leads to d x (t) = T (µ )f( (µ )x (t); µ ), x (t ) = T (µ )x . (1.3) dt r W 0 V 0 r 0 r 0 V 0 0,µ0

This approach goes by the name of Petrov-Galerkin projection, if W(µ0) 6= V(µ0) and Galerkin projection if W(µ0) = V(µ0). There are various ways to proceed from (1.3) 2 depending on the nature of the function f and many of them are discussed in other chapters of the upcoming Handbook on Model Order Reduction. 1 For illustration purposes, we proceed with W(µ0) = V(µ0) and assume that the right hand side function f splits into a linear and a nonlinear part: f(x; µ0) = n×n A(µ0)x + f(x; µ0), where A(µ0) ∈ R is, say, a symmetric and negative definite matrix to foster stability. Then, (1.3) becomes

d x (t) = T (µ )A(µ ) (µ )x (t) + T (µ )f (µ )x (t); µ . dt r V 0 0 V 0 r V 0 V 0 r 0 In the discrete empirical interpolation method (DEIM, [27]), the large-scale nonlinear n×s term f V(µ0)xr(t); µ0) is approximated via a mask matrix P = (ei1 , . . . , eis ) ∈ R , j T n where {i1, . . . , is} ⊂ {1, . . . , n} and ej = (..., 1,...) ∈ R is the jth canonical unit vector. The mask matrix P acts as an entry selector on a given n-vector via T T s n×s P v = (vi1 , . . . , vis ) ∈ R . In addition, another POD basis matrix U(µ0) ∈ R is used, which is obtained from snapshots of the nonlinear term. The matrices P and U(µ0) are combined to form an oblique projection of the non-linear term onto the subspace ran(U(µ0)). This leads to the reduced model d x (t) = T (µ )A(µ ) (µ )x (t) dt r V 0 0 V 0 r T T −1 T  +V (µ0)U(µ0)(P U(µ0)) P f V(µ0)xr(t); µ0 , (1.4) whose computational is formally independent of the full-order dimension T n, see [27] for details. Mind that by assumption, M(µ0) := −V (µ0)A(µ0)V(µ0) is symmetric positive definite and that both V(µ0) and U(µ0) are column-orthogonal. Moreover, for a fixed mask matrix P , coordinate changes of V(µ0) and U(µ0) do not affect the approximated statex ˜(t, µ0) = V(µ0)xr(t), so that essentially, the reduced system (1.4) depends only on the subspaces ran(V(µ0)) and ran(U(µ0)) rather than 2 the matrices V(µ0) and U(µ0). Solving (1.3), (1.4) constitutes the online stage of model reduction. The main focus of this exposition is not on the efficient solution of the reduced systems (1.3) or (1.4) at a fixed µ0, but on tackling parametric variations in µ. In view of the associated computational costs, it is important that this can be achieved without computing additional snapshots in the online stage. A straightforward way to achieve this is to extend the snapshot sampling to the µ- parameter range to produce POD basis matrices that are to cover all input parameters. This is usually referred to as the “global approach”. For nonlinear systems, the global approach may suffer from requiring a large number of snapshot samples. Moreover, the snapshot information is blurred in the global POD and features that occur only in a restricted regime affect the ROM predictions everywhere. Therefore, localized approaches are preferable, see e.g. [35, 75, 77, 91, 100].

1 T If f( · ; µ0) is linear, the reduced operator W (µ0) ◦ f( · ; µ0) ◦ V(µ0) can be computed a priori (‘offline’) and stays fixed throughout the time integration. If f( · ; µ0) is affine, the same approach can be carried over to the affine building blocks of f( · ; µ0), see e.g. [42]. For a nonlinear f( · ; µ0), an affine approximation can be constructed via the emperical interpolation method (EIM, [14]). Other approaches that address nonlinearities include the discrete empirical interpolation method (DEIM, [27]) and the missing estimation (MPE, [13, 105]). 2 s×s Replacing U with US, S ∈ R orthogonal, does not affect (1.4) at all. Replacing V with VR, r×r R ∈ R orthogonal, induces a coordinate change on the reduced state xr = Rxˆr but preserves the outputx ˜(t) = Vxr(t) = VRxˆr(t). 3 In this contribution, the focus is on constructing trajectories of functions in the system parameters µ on certain sets of structured matrix . In the above exam- ple, these are the symmetric positive definite matrices {M ∈ Rr×r|M T = M, vT Mv > 0 ∀v 6= 0}, the matrices {U ∈ Rn×s|U T U = I} or the associated s-dimensional subspaces U := ran(U) ⊂ Rn:

T r×r T T µ 7→ −V (µ)A(µ)V(µ) ∈ {M ∈ R |M = M, v Mv > 0 ∀v 6= 0}, n×s T µ 7→ U(µ) ∈ {U ∈ R |U U = I}, n µ 7→ U(µ) = ran(U(µ)) ∈ {U ⊂ R | U subspace, dim(U) = s}. We outline generic methods for constructing such trajectories via interpolation. All the special sets of matrices considered above feature a differentiable structure that allows to consider them as of some Euclidean matrix space, referred to as matrix manifolds. The above example is not exhaustive. Other matrix manifolds may arise in model reduction applications. To keep the exposition both general and modular, the interpolation techniques will be formulated for arbitrary submanifolds. Model reduction literature on manifold interpolation problems includes [8, 9, 17, 31, 71, 73, 94, 76, 100, 29, 65]. 1.2. Structure and organization. The text is constructed modular rather than consecutive, so that selected reading is enabled. Yet, this entails that the reader will encounter some repetition. Section 2 covers the essential background from differential . Section 3 con- tains generic methods for interpolation and extrapolation on matrix manifolds. In Section 4, the geometric and numerical aspects of the matrix manifolds that arise most frequently in the context of model reduction are discussed. A practitioner that faces a problem in matrix manifold interpolation may skim through the recap on elementary differential geometry in Section 2 and then move on to the appropriate subsection of Section 4 that corresponds to the matrix manifold in the application. This provides the specific ingredients and formulas for conducting the generic interpolation methods of Section 3. 1.3. Notation & Abbreviations. • w.r.t.: with respect to • EVD: eigenvalue decomposition • SVD: singular value decomposition • POD: proper orthogonal decomposition • LTI: linear time- (system) • ODE: ordinary differential • PDE: partial differential equation • ONB: orthonormal basis n×r • R : the set of real n-by-r matrices

• In: the n-by-n identity matrix; if are clear, written as I n×r • ran(A): the subspace spanned by the columns of A ∈ R • GL(n): the general of real, invertible n-by-n matrices n×n T • sym(n) = {A ∈ R |A = A}: the set of real, symmetric n-by-n matrices n×n T • skew(n) = {A ∈ R |A = −A}: the set of real, skew-symmetric n-by-n matrices 4 T n • SPD(n) = {A ∈ sym(n)|x Ax > 0∀x ∈ R \{0}}: the set of real, symmetric positive definite n-by-n matrices n×n T T • O(n) = {Q ∈ R |Q Q = In = QQ }: the • SO(n) = {Q ∈ O(n)| det(Q) = 1}: the special orthogonal group n×r T • St(n, r) = {U ∈ R |U U = Ir}: the (compact) Stiefel manifold, r ≤ n n • Gr(n, r): the Grassmann manifold of r-dimensional subspaces of R , r ≤ n •M : a differentiable manifold

•D p ⊂ M: an open domain around the point p on a manifold M n n • Dx ⊂ R : an open domain in the around a point x ∈ R

• TpM: the space of M at a location p ∈ M T n×r •h A, Bi0 = trace(A B): the standard (Frobenius) inner product on R M •h v, wip : the Riemannian on TpM (the superscript is often omitted)

• expm: standard matrix exponential

• logm: standard (principal) matrix M • Expp : the Riemmanian exponential of a manifold M at base point p ∈ M M • Logp : the Riemmanian logarithm of a manifold M at base point p ∈ M 2. Basic concepts of differential geometry. This section provides the essen- tials on elementary differential geometry. Established textbook references on differ- ential geometry include [32, 57, 58, 60, 62]; condensed introductions can be found in [46, Appendices C.3, C.4, C.5] and [36]. An account of differential geometry that is tailor-made to matrix manifold applications is given in [3]. The fundamental objects of study in differential geometry are differentiable mani- folds. Differentiable manifolds are generalizations of (one-dimensional) and sur- faces (two-dimensional) to arbitrary dimensions. Loosely speaking, an n-dimensional differentiable manifold M is a that ‘locally looks like Rn’ with cer- tain properties. This concept is rendered precisely by postulating that n for every point p ∈ M, there exists a so-called coordinate chart x : M ⊃ Dp → R that bijectively an open neighborhood Dp ⊂ M of a location p to an open n n neighborhood Dx(p) ⊂ R around x(p) ∈ R with the important additional property that the coordinate change

−1 x ◦ x˜ :x ˜(Dp ∩ D˜p) → x(Dp ∩ D˜p) of two such charts x, x˜ is a diffeomorphism, where their domains of definition overlap, see [36, Fig. 18.2, p. 496] or [46, Fig. 3.1, p. 342]. Note that the coordinate change x ◦ x˜−1 maps from an open domain of Rn to an open domain of Rn, so that the standard concepts of multivariate apply. For details, see [3, §3.1.1] or [36, §18.8]. Depending on the context, we will write x(p) for the value of a coordinate chart at p and also x ∈ Rn for a point in Rn. Of special importance to numerical applications are embedded submanifolds in the Euclidean space. Definition 2.1 (Submanifolds of Rn+d). A parameterization is an bijective differentiable function f : Rn ⊃ D → f(D) ⊂ Rn+d with continuous inverse such that (n+d)×n its Jacobi matrix Dfx ∈ R has full rank n at every point x ∈ D. 5 A subset M ⊂ Rn+d is called an n-dimensional embedded of Rn+d, if n+d for every p ∈ M, there exists an open neighborhood Ω ⊂ R such that Dp := M∩Ω is the of a parameterization

n n+d f : R ⊃ Dx → f(Dx) = Dp = M ∩ Ω ⊂ R .

One can show that if f : D → M ∩ Ω and f˜ : D˜ → M ∩ Ω˜ are two parameterizations, ˜ say with f(x0) = f(˜x0) = p ∈ M ∩ Ω ∩ Ω,˜ then   f −1 ◦ f˜ : f˜−1(Ω ∩ Ω)˜ → f −1(Ω ∩ Ω)˜ is a diffeomorphism (between open sets in Rn). In this sense, parameterizations f are the inverses of coordinate charts x. In addition to coordinate charts and param- eterizations, submanifolds can be characterized via equality constraints. This fact is due to the theorem of classical multivariate calculus [61, §I.5]. For details, see [36, Thm. 18.7, p. 497]. Theorem 2.2 ([36, Prop. 18.7, p. 500]). Let h : Rn+d ⊃ Ω → Rd be differentiable d d×(n+d) and c0 ∈ R be defined such that the differential Dhp ∈ R has maximum possible rank d at every point p ∈ Ω with h(p) = c0. Then, the preimage

−1 h (c0) = {p ∈ Ω| h(p) = c0} is an n-dimensional submanifold of Rn+d. An obvious application of Theorem 2.2 3 2 2 2 to the function h : R → R, (x1, x2, x3) 7→ x1 + x2 + x3 − 1 establishes the unit S2 = h−1(0) as a 2-dimensional submanifold of R2+1. As a more sophisticated example, we recognize the orthogonal group as a differentiable (sub)-manifold: 2 Example 1. Consider the orthogonal group O(n) ⊂ Rn×n ' Rn and the set of symmetric matrices sym(n) ' Rn(n+1)/2. Define h : Rn×n → sym(n),A 7→ AT A − I. T T Then DhA(B) = A B + B A. For Q ∈ O(n), the differential is indeed surjective: 1 1 T 1 T T For any M ∈ sym(n), it holds DhQ( 2 QM) = 2 Q QM + 2 M Q Q = M. As a 2 1 consequence, the orthogonal group O(n) is a submanifold of dimension n − 2 (n(n + 1 n×n 1)) = 2 (n(n − 1)) of the Euclidean matrix space R . 2.1. Intrinsic and extrinsic coordinates.. As a rule, numerical data pro- cessing on manifolds requires calculations in explicit coordinates. For differentiable submanifolds, we distinguish between two types: extrinsic and intrinsic coordinates. Extrinsic coordinates address points on a submanifold M ⊆ Rn with respect to their coordinates in the Rn, while intrinsic coordinates are with respect to the local parameterizations. Hence, extrinsic coordinates are what an outside observer would see, while intrinsic coordinates correspond to the perspective of an observer that resides on the manifold. Let’s exemplify these concepts on the two-dimensional S2, embedded in R3. As a point set, the sphere is defined by the equation

2 T 3 2 2 2 S = {(x1, x2, x3) ∈ R | x1 + x2 + x3 = 1}.

T 2 Any three-vector (x1, x2, x3) ∈ S specifies a point on the sphere in extrinsic co- ordinates. However, it is intuitively clear that S2 is intrinsically a two-dimensional object. Indeed, S2 can be parameterized via

sin(α) cos(β) 2 2 2 3 f : R ⊃ [0, 2π) → S ⊂ R , (α, β) 7→ sin(α) sin(β) . cos(α) 6 The parameter vector (α, β) ∈ R2 specifies a point on S2 in intrinsic coordinates. Even though intrinsic coordinates directly reflect the dimension of the manifold at hand, they often cannot be calculated explicitly and extrinsic coordinates are the preferred choice in numerical applications [33, §2, p. 305]. Turning back to Example 1 1, we recall that the intrinsic dimension of the orthogonal group is 2 n(n − 1). Yet, in practice, one uses the extrinsic representation with (n × n)-matrices Q, keeping the defining equation QT Q = I in mind. 2.2. Tangent spaces.. We need a few more fundamental concepts. Definition 2.3 ( of a differentiable submanifold). Let M ⊂ Rn+d be an n-dimensional submanifold of Rn+d. The tangent space of M at a point p ∈ M, in symbols TpM, is the space of velocity vectors of differentiable curves c : t 7→ c(t) passing through p, i.e.,

TpM = {c˙(t0)| c : J → M, c(t0) = p}.

Here, J ⊆ R is an arbitrarily small open with t0 ∈ J. It is straightforward to

Fig. 2.1. Visualization of a manifold (curved ) with the tangent space TpM attached. The v =c ˙(0) ∈ TpM is the velocity vector of a c : t 7→ c(t) ∈ M.

show that the tangent space is actually a . Moreover, the tangent space can be characterized both with respect to intrinsic and extrinsic coordinates. Theorem 2.4 (Tangent space, intrinsic characterization). Let M ⊂ Rn+d be an n-dimensional submanifold of Rn+d and let f : Rn ⊇ D → f(D) ⊆ M be a parameterization. Then, for x ∈ D with p = f(x) ∈ M, it holds

TpM = ran(Dfx).

Theorem 2.5 (Tangent space, extrinsic characterization). Let h : Rn+d ⊃ Ω → d d −1 n+d R and c0 ∈ R be as in Theorem 2.2 and let M := h (c0) ⊂ R . Then, for 7 p ∈ M, it holds

TpM = ker(Dhp). Note that both Theorem 2.4 and Theorem 2.5 immediately show that the tangent space TpM is a vector space of the same dimension n as the manifold M. Example 2. The tangent space of the orthogonal group O(n) at a point Q0 is

n×n T T TQ0 O(n) = {∆ ∈ R | ∆ Q0 = −Q0 ∆}. This fact can be established via considering a matrix curve Q : t 7→ Q(t) with Q(0) = ˙ Q0 and velocity vector ∆ = Q(0) ∈ TQ0 O(n). Then, d d 0 = | I = | QT (t)Q(t) = ∆T Q + QT ∆. dt t=0 dt t=0 0 0 T T (The claim follows by counting the dimension of the subspace {∆ Q0 = −Q0 ∆}.) As an alternative, we can consider h : Rn×n → sym,A 7→ AT A − I as in Example 1. T T Then DhQ0 (∆) = Q0 ∆ + ∆ Q0 and TQ0 O(n) = ker(DhQ0 ). 2.3. and the Riemannian function. One of the most important problems in both general differential geometry and data processing on manifolds is to determine the shortest between two points on a given manifold. This requires to measure the of curves. Recall that the of a n R b curve c :[a, b] → R in the Euclidean space is L(c) = a kc˙(t)kdt. In order to transfer this to the manifold setting, an inner product for tangent vectors is needed that is consistent with the manifold structure. Definition 2.6 (Riemannian metrics). Let M be a differentiable submanifold n+d of R .A Riemannian metric on M is a family (h·, ·ip)p∈M of inner products h·, ·ip : TpM × TpM → R that is smooth in variations of the base point p. p 3 The length of a tangent vector v ∈ TpM is kvkp := hv, vip. The length of a curve c :[a, b] → M is defined as

Z b Z b q L(c) = kc˙(t)kc(t)dt = hc˙(t), c˙(t)ic(t)dt. a a

A curve is said to be parameterized by the , if L(c|[a,t]) = t − a for all t ∈ [a, b]. Obviously, unit-speed curves with kc˙(t)kc(t) ≡ 1 are parameterized by the arc length. Constant-speed curves with kc˙(t)kc(t) ≡ ν0 are parameterized proportional to the arc length. The Riemannian distance between two points p, q ∈ M with respect to a given metric is

distM(p, q) = inf{L(c)|c :[a, b] → M piecewise smooth, c(a) = p, c(b) = q}, (2.1) where, by convention, inf{∅} = ∞. Hence, a shortest path between p, q ∈ M is a curve c that connects p and q such that L(c) = distM(p, q). In general, shortest paths on M do not exist.4 Yet, candidates for shortest curves between points that

3 pp P p This notation should not be confused with the classical p-norm i |vi| . 4 2,∗ 2 Consider R = R \{(0, 0)} with the Euclidean inner product. There is no shortest connection 2,∗ 2,∗ from (−1, 0) to (1, 0) on R . A sequence of curves that is in R and converges to the curve 2 c :[−1, 1] → R , t 7→ (t, 0) is readily constructed. Hence, the Riemannian distance between (−1, 0) and (1, 0) is 2. Yet, every curve connecting these points must go around the origin. The length- 2,∗ minimizing curve of length 2 crosses the origin and is thus not an admissible curve on R . 8 are sufficiently close to each other can be obtained via a variational principle: Given a parametric family of suitably regular curves cs : t 7→ cs(t) ∈ M, s ∈ (−ε, ε) that connect the same fixed endpoints cs(a) = p and cs(b) = q for all s, one can consider the length s 7→ L(cs). A curve c = c0 is a first-order candidate for a shortest path between p and q, if it is a critical point of the length functional, i.e., d if ds |s=0L(cs) = 0. Such curves are called geodesics. Differentiating the length func- tional leads to the so-called first variation formula [62, §6], which, in turn, leads to the characterizing equation for geodesics: Definition 2.7 (Geodesics). A differentiable curve c :[a, b] → M is called a geodesic (w.r.t. to a given Riemannian metric), if the covariant of its velocity vector field vanishes, i.e.,

Dc˙ (t) = 0 ∀t ∈ [a, b]. (2.2) dt

Remark 1. If a starting point c(0) = p ∈ M and a starting velocity c˙(0) = v ∈ TpM are specified, then the geodesic equation (2.2) translates to an initial value problem of second order with guaranteed existence and uniqueness of local solutions, [3, p. 102]. An immediate consequence of (2.2) is that geodesics are constant- D speed curves. A formal introduction of the dt along a curve is beyond the scope of this contribution, and the interested reader is referred to, e.g., [62, §4, §5]. To get some intuition, we introduce this concept for embedded Riemannian submanifolds M ⊂ Rn+d, where the metric is the Euclidean metric of Rn+d restricted to the , see also [36, §20.12]: A vector field along a curve c :[a, b] → M is a differentiable v :[a, b] → Rn+d 5 n+d such that v(t) ∈ Tc(t)M. For every p ∈ M, the ambient R decomposes into an orthogonal direct sum

n+d ⊥ R = TpM ⊕ TpM ,

⊥ where TpM is the orthogonal of TpM and orthogonality is w.r.t. the n+d n+d standard Euclidean inner product on R . Let Πp : R → TpM be the (base point-dependent) orthogonal projection onto the tangent space at p. In this setting (and only in this), the covariant derivative of a vector field v(t) along a curve c(t) is Dv the tangent component ofv ˙(t), i.e., dt (t) = Πc(t)(v ˙(t)). As a consequence,

Dc˙ (t) = Π (¨c(t)) (2.3) dt c(t)

and the geodesics on Riemannian submanifolds with the metric induced by the ambi- ent Euclidean inner product are precisely the constant-speed curves with acceleration ⊥ vectors orthogonal to the corresponding tangent spaces, i.e.,c ¨(t) ∈ Tc(t)M . Example: On the unit sphere S2 ⊂ R3, the geodesics are great . When con- sidered as curves in the ambient R3, their acceleration vector points directly to the origin and is thus orthogonal to the corresponding tangent space. When viewed as entities of S2, these curves do not experience any acceleration at all.

5The prime example for such a vector field is the curve’s own velocity field v(t) =c ˙(t). 9 c˙(t) c¨(t)

Mind that a constant-speed curve in Rn changes its direction only, when it experiences a non-zero acceleration. In this sense, geodesics on manifolds are the counterparts to straight lines in the Euclidean space. In general, a covariant derivative, also known as a linear connection, is a bilinear mapping (X,Y ) 7→ ∇X Y that maps two vector fields X,Y to a third vector field ∇X Y in such a way that it can be interpreted as the of Y in the direction of X. Of importance is the Riemannian connection or Levi-Civita connection that is compatible with a Riemannian metric [3, Thm 5.3.1], [62, Thm 5.4]. It is determined uniquely by the Koszul formula

2h∇X Y,Zi = X(hY,Zi) + Y (hZ,Xi) − Z(hX,Y i) −hX, [Y,Z]i − hY, [X,Z]i + hZ, [X,Y ]i and is used to define the Riemannian

6 (X,Y,Z) 7→ R(X,Y )Z = ∇X ∇Y Z − ∇Y ∇X Z − ∇[X,Y ]Z.

A is flat if and only if it is locally isometric to the Euclidean space, which holds if and only if the Riemannian curvature tensor vanishes identically [62, Thm. 7.3]. Hence, ‘flatness’ depends on the Riemannian metric. 2.4. coordinates.. The local uniqueness and existence of geodesics allows us to map a tangent vector v ∈ TpM to the endpoint of a geodesic that starts from p ∈ M with velocity v. Formalizing this principle gives rise to the Riemannian exponential

M M Expp : TpM ⊃ Bε(0) → M, v 7→ q := Expp (v) := cp,v(1). (2.4)

Here, t 7→ cp,v(t) is the geodesic that starts from p with velocity v and Bε(0) ⊂ TpM is the open with radius ε and center 0 in the tangent space7, see Fig. 2.2. Note that we can restrict the considerations to unit-speed geodesics via

 v  ExpM(v) := c (1) = c (t ) = ExpM t , p p,v p,v/kvk v p v kvk

where tv = kvk, see [62, §5., p. 72 ff.] for the details. For ε > 0 small enough, the Riemannian exponential is a smooth diffeomorphism between Bε(0) and an open domain on Dp ⊂ M around the point p. Hence, it is invertible. The smooth inverse map is called the Riemannian logarithm and is denoted by

M M −1 Logp : M ⊃ Dp → Bε(0) ⊂ TpM, q 7→ v := (Expp ) (q), (2.5)

6In these formulae, [X,Y ] = X(Y ) − Y (X) is the Lie bracket of two vector fields. 7 For technical reasons, ε > 0 must be chosen small enough such that cp,v(t) is defined on the [0, 1]. 10 Fig. 2.2. The Riemannian exponential sends tangent vectors to point of geodesic curves.

where v satisfies cp,v(1) = q. Thus, the Riemannian logarithm is associated with the geodesic endpoint problem: Given p, q ∈ M, find a geodesic that connects p and q. The Riemannian exponential map establishes a local parametrization of a small region around a location p ∈ M in terms of coordinates of the flat vector space TpM. This is referred to as representing the manifold in normal coordinates [57, §III.8], [62, Lem. 5.10]. Normal coordinates are radially isometric in the sense that the Riemannian distance between p and q = M Expp (v) is exactly the same as the length of the tangent vector kvkp as measured in the metric on TpM, provided that v is contained in a neighborhood of 0 ∈ TpM, where the exponential is invertible, [62, Lem. 5.10 & Cor. 6.11]. Mind that the definition of the Riemannian exponential depends on the geodesics, which, in turn, depend on the chosen Riemannian metric – via Definition 2.6. Different metrics lead to different geodesics and thus to different exponential and logarithm maps. 2.5. Matrix Lie groups and quotients by group actions. In general, a is a differentiable manifold G which also has a group structure, such that the group operations ‘multiplication’ and ‘inversion’,

G × G 3 (g, g˜) 7→ g · g˜ ∈ G and G 3 g 7→ g−1 ∈ G are both smooth [36, 43, 38]. A matrix Lie group G is a subgroup of GL(n, C) that is closed in GL(n, C).8 This definition already implies that G is an embedded sub- manifold of Cn×n [43, Corollary 3.45]. Not all matrix groups are Lie groups and not all Lie groups are matrix Lie groups, see [43, §1.1 and §4.8]. However, matrix Lie groups are arguably the most important class of Lie groups when it comes to practical applications and this exposition is restricted to this subclass. Let G be an arbitrary matrix Lie group. When endowed with the bracket operator or matrix commutator [V,W ] = VW − WV , the tangent space TI G at the identity

8 n×n but not necessarily in C . 11 is called the associated with the Lie group G, see [43, §3]. As such, it is denoted by g = TI G. For any A ∈ G, the function “left-multiplication with A” is a diffeomorphism LA : G → G,LA(B) = AB; its differential at a point M ∈ G is the isomporphism d(LA)M : TM G → TLA(M)G, d(LA)M (V ) = AV . Using this observation at M = I shows that the tangent space at an arbitrary location A ∈ G is given by the translates (by left-multiplication) of the tangent space at the identity:

 n×n TAG = TLA(I)G = Ag = ∆ = AV ∈ R | V ∈ g , (2.6)

[38, §5.6, p. 160]. The Lie algebra g = TI G of G can equivalently be characterized as the set of all matrices ∆ such that expm(t∆) ∈ G for all t ∈ R. The intuition behind this fact is that all tangent vectors are velocity vectors of smooth curves running on G (Definition 2.3) and that c(t) = expm(t∆) is a smooth curve starting from c(0) = I with velocityc ˙(0) = ∆, see [43, Def. 3.18 & Cor. 3.46] for the details. By definition, the exponential map9 for a matrix Lie group is the matrix exponential restricted to the corresponding Lie algebra, i.e. the tangent space at the identity g = TI G, [43, §3.7],

expm |g : g → G. In general, a Lie algebra is a vector space with a linear, skew-symmetric bracket operation, called Lie bracket [·, ·] that satisfies the Jacobi identity.

[X, [Y,Z]] + [Z, [X,Y ]] + [Y, [Z,X]] = 0.

Quotients of Lie groups by closed subgroups. In many settings, it is important or sometimes even necessary to consider certain points p, q on a given differentiable manifold M as equivalent. Consider the following example. n×r T Example 3. Let U ∈ R feature orthonormal columns so that U U = Ir. We may extend the columns of U = (u1, . . . , ur) to an orthogonal matrix Q = I 0   (u1, . . . , ur, ur+1, . . . , un) ∈ O(n). Let I × O(n − r) := r | R ∈ O(n − r) . r 0 R This is actually a closed subgroup of O(n), in symbols (Ir × O(n − r)) ≤ O(n). The action Q˜ = QΦ with any orthogonal matrix Φ ∈ Ir × O(n − r) preserves the first r columns of Q. Hence, we may identify U with the [Q] = {QΦ|Φ ∈ Ir × O(n − r)} ⊂ O(n). In Sections 4.4 and 4.5, we will see that this example estab- lishes the Stiefel manifold of ONBs and eventually also the Grassmann manifold of subspaces as quotients of the orthogonal group O(n). Note that in the example, the equivalence relation is induced by actions of the Lie group Ir × O(n − r). Quotients that arise from such group actions are important examples of quotient manifolds. The following Theorems 2.9 and 2.11 cover this example as well as all other cases of quotient manifolds that are featured in this work. First, group actions need to be formalized. Definition 2.8. (cf. [63, p. 162,163]) Let G be a Lie group, M be a smooth manifold, and let G × M → M, (g, p) 7→ g · p be a left action of G on M.10 The orbit relation on M induced by G is defined by

p ' q :⇔ ∃g ∈ G : g · p = q.

9The exponential map of a Lie group must not be confused with the Riemannian exponential. 10The theory for right actions is analogous. In all cases considered in this work, M is a matrix manifold so that “·” is the usual matrix product. 12 The equivalence classes are the G-orbits [p] := Gp := {g · p| g ∈ G}. The orbit space is denoted by M/G := {[p]| p ∈ M}. The quotient map sends a point to its G- orbit via Π: M → M/G, p 7→ [p]. The action is free, if every group Gp := {g ∈ G| g · p = p} is trivial, Gp = {e}. Theorem 2.9. (Quotient Manifold Theorem, cf. [63, Thm. 21.10]) Suppose G is a Lie group acting smoothly, freely, and properly on a smooth manifold M. Then the orbit space M/G is a manifold of dimension dim M − dim G, and has a unique such that the quotient map Π: M → M/G, p 7→ [p] is a smooth .11 In this context, M is called the total space and M/G is the quotient (space). A special case is Lie groups under actions of Lie subgroups. Definition 2.10. [63, §21, p. 551] Let G be a Lie group and H ≤ G be a Lie subgroup. For g ∈ G, a subset of G of the form [g] := gH = {g · h| h ∈ H} is called a left coset of H. The left cosets form a partition of G, and the quotient space determined by this partition is called the left coset space of G modulo H, and is denoted by G/H. Coset spaces of Lie groups are again smooth manifolds: Theorem 2.11. (cf. [63, Thm 21.17, p. 551]) Let G be a Lie group and let H be a closed subgroup of G. The left coset space G/H is a manifold of dimension dim G − dim H with a unique differentiable structure such that the quotient map Π: G → G/H, g 7→ [g] is a smooth submersion. In general, if π : M → N is a surjective submersion between two manifolds M and N , then for any q ∈ N , the −1 the preimage π (q) ⊂ M is called the fiber over q, and is denoted by Mq. Each fiber Mq is itself a closed, embedded submanifold by the theorem. M If M has a Riemannian metric h·, ·ip , then at each point p ∈ M, the tangent space ⊥ TpM decomposes into an orthogonal direct sum TpM = TpMπ(p) ⊕ (TpMπ(p)) . The tangent space of the fiber TpMπ(p) =: Vp is the called the vertical space, its ⊥ orthogonal complement Hp := Vp is the horizontal space. The vertical space is the kernel Vp = ker(dπp) of the differential dπp : TpM → Tπ(p)N ; the horizontal space is ∼ isomorphic to Tπ(p)N . This allows to identify Hp = Tπ(p)N , see [3, Fig. 3.8., p. 44] for an illustration. This construction helps to compute tangent spaces of quotients, if the tangent space of the total space is known. If G/H is a quotient as in Theorem 2.9 or 2.11 and if Π : G → G/H is the corresponding quotient map, then Π is a local diffeomorphism. A Riemannian metric on the quotient can be defined by

G/H −1 −1 G hv, wi[g] := h(dΠg) (v), (dΠg) (w)ig , v, w ∈ T[g](G/H). (2.7)

For this (and only this) metric, the quotient map is a local . In fact, Theorem 2.11 additionally establishes G/H as a , i.e. a smooth manifold M endowed with a transitive smooth action by a Lie group (cf. [63, §21, p. 550]). In the setting of the theorem, the is given by the left action of G on G/H given by g1 · [] := [g1 · g2]. A transitive action allows us to transport a location p ∈ M to any other location q ∈ M. 3. Interpolation on non-flat manifolds. When working with matrix mani- folds, the data is usually given in extrinsic coordinates, see Section 2. For example, n×r T data on the compact Stiefel manifold St(n, r) = {U ∈ R |U U = Ir}, r ≤ n, is given in form of n-by-r matrices. These matrices feature nr entries while the in- trinsic number of degrees of freedom, i.e., the intrinsic dimension is turns out to be

11i.e. a smooth surjective mapping such that the differential is surjective at every point. 13 1 nr − 2 r(r + 1), see Section 4.4. Essentially, the practical obstacle associated with data interpolation on matrix manifolds arises from this fact. Given, say, k matrices on St(n, r) in extrinsic coordinates, interpolating entry-by-entry will most certainly lead to interpolants that do not feature orthogonal columns and thus are not points on the Stiefel manifold. Likewise, entry-by-entry interpolation of positive definite matrices is not guaranteed to produce another positive definite matrix. There are essentially two different approaches to address this issue: Performing the interpolation on the tangent space of the manifold and using the Riemannian barycenter or Riemannian center of mass as an interpolant. Both will be explained in more detail in the next two subsections.12 3.1. Interpolation in normal coordinates. As outlined in Section 2, every location p ∈ M on an n-dimensional differentiable manifold features a small neigh- n borhood Dp that is the domain of a coordinate chart x : M ⊃ Dp → Dx(p) ⊂ R n that maps bijectively onto an Dx(p) ⊂ R . Therefore, for a sample data set {p1, . . . , pk} ⊂ Dp that is completely contained in the domain of a single coordinate chart x, interpolation can be performed as follows: 1. Map the data set to Dx(p): Calculate v1 = x(p1), . . . , vk = x(pk) ∈ Dx(p). ∗ 2. Interpolate in Dx(p) to produce the interpolant v ∈ Dx(p). ∗ −1 ∗ 3. Map back to manifold: compute p = x (v ) ∈ Dp. In principle, any coordinate chart may be applied. In practice, the challenge is to find a suitable coordinate chart that can be evaluated efficiently. Moreover, it is desirable that the chosen chart preserves the geometry of the original data set as well as possible.13 The standard choice is to use normal coordinates as introduced in Section 2.4. This means that the Riemannian logarithm is used as the coordinate chart

M Logp : M ⊃ Dp → Bε(0) ⊂ TpM with the Riemannian exponential

M Expp : TpM ⊃ Bε(0) → Dp ⊂ M as the corresponding parameterization. The general procedure of data interpolation via the tangent space is formulated as Algorithm 1.

Algorithm 1 Interpolation in normal coordinates.

Input: Data set {p1, . . . , pk} ⊂ M. 1: Choose pi ∈ {p1, . . . , pk} as a base point. M 2: Check that Log (p ) is well-defined for all j = 1, . . . , k. pi j 3: for j = 1, . . . , k do M 4: Compute v := Log (p ) ∈ T M. j pi j p 5: end for ∗ 6: Compute v via Euclidean interpolation of {v1, . . . , vk}. M 7: Compute p∗ := Exp (v∗) pi Output: p∗ ∈ M.

Remark 2. There are a few facts that the practitioner needs to be aware of:

12German speaking readers may find an introduction that addresses a general scientific audience in [89]. 13There are no isometric coordinate charts on a non-flat manifold, see [62, Thm 7.3]. 14 1. The interpolation procedure of Algorithm 1 depends on which sample point is selected to act as the base point. Different choices may lead to different interpolants.14 2. For matrix manifolds, the tangent space is often also given in extrinsic coor- dinates. This means that an entry-by-entry interpolation of the matrices that represent the tangent vectors may lead to an interpolant that is not in the tan- gent space. As an illustrative example, consider the Gr(n, r). T Matrices ∆1,..., ∆k ∈ T[U]Gr(n, r) are characterized by U ∆j = 0. Entry- by-entry interpolation in the tangent space may potentially result in a matrix ∆∗ that is not orthogonal to the base point U, i.e. U T ∆∗ 6= 0, see [100, §2.4]. In general, because of the vector space structure of the tangent space of any manifold M, it is sufficient to use an interpolation method that expresses the interpolant in TpM as a weighted linear combination of the sampled tangent vectors v1, . . . , vk ∈ TpM

k ∗ X v = ωjvj. j=1

Amongst others, linear interpolation, Lagrange and Hermite interpolation, spline interpolation and interpolation via radial basis functions fulfill this requirement. As an aside, the interpolation procedure is computationally less expensive, since it works on the weight coefficients ωj rather than on every single entry. Quasi-linear interpolation of trajectories via geodesics. In this paragraph, we ad- dress applications, where the sampled manifold data features a univariate parametric dependency. The setting is as follows. Let M be a Riemannian manifold and suppose that there is a trajectory

c :[a, b] → M, µ 7→ c(µ)

on M that is sampled at k instants µ1, . . . , µk ∈ [a, b]. Then, an interpolantc ˆ for c can be computed via Algorithm 2. The interpolants at µ ∈ [µj, µj+1] that are output

Algorithm 2 Geodesic interpolation

Input: Data set {c(µ1), . . . , c(µk)} ⊂ M sampled from a curve c : µ → c(µ), unsam- ∗ pled instant µ ∈ [µj, µj+1]. M 1: Compute v := Log (c(µ )) ∈ T M. j+1 c(µj ) j+1 c(µj ) ∗ ∗ M  µ −µj  2: Computec ˆ(µ ) := Exp vj+1 c(µj ) µj+1−µj Output: cˆ(µ∗) ∈ M interpolant of c(µ∗).

by Algorithm 2 lie on the unique geodesic connection between the points c(µj) and c(µj+1). Hence, it is the straightforward manifold analogue of linear interpolation and is base-point independent. The generic formulation of Algorithm 1 allows to employ higher-order interpola- tion methods. However, this does not necessarily lead to more accurate results: the overall error depends not only on the interpolation error within the tangent space but also on the distortion caused by mapping the data to a selected (fixed) tangent space, 15 Fig. 3.1. Illustration of the course of action of Algorithms 1 and 2. Algorithm 1 (right) first maps all data points to a selected fixed tangent space. In Algorithm 2 (left), two points pj = c(µj ) and pj+1 = c(µj+1) are connected by a geodesic , then the base is shifted to point pj+1 and the procedure is repeated. see Fig. 3.1. Algorithms 1 and 2 can be applied in practical applications, where the Riemannian exponential and logarithm mappings are known in explicit form. Applications in parametric model reduction that consider matrix manifolds include [31] (GL(n)-data), [8, 73, 100] (Grassmann-data), [104] (Stiefel data) and [9, 81] (SPD(n)-data). 3.2. Interpolation via the Riemannian center of mass. As pointed out in Remark 2, interpolation of manifold data via the back and forth mapping of a complete data set of sample points between the manifold and its tangent space depends on the chosen base point. As a consequence, sample points may experience an uneven distortion under the projection onto the tangent space, see Fig. 3.1 (right). An approach that avoids this issue is to interpret interpolation as the task of finding suitably weighted Riemannian centers of mass. This concept was introduced in the context of geodesic finite elements in [90, 41]. The idea is as follows: The Riemannian center of mass15 or Fr´echet mean of a sample data set {p1, . . . , pk} ∈ M on a manifold with respect to the weights Pk wi ≤ 0, i=0 wi = 1 is defined as the minimizer(s) of the Riemannian objective function k 1 X M 3 q 7→ f(q) = w dist(q, p )2, 2 i i i=1 where dist(q, pi) is the Riemannian distance of (2.1). This definition generalizes the notion of the barycentric mean in Euclidean spaces. However, on curved manifolds, the global center might not be unique. Moreover, local minimizers may appear. For more details, see [55] and [4], which also give uniqueness criteria. Interpolation is now performed by computing weighted Riemannian centers. More d precisely, let µ1, . . . , µk ⊂ R be sampled parameter locations and let pi = p(µi) ∈ M, i = 1, . . . , k be the corresponding sample locations on M. Interpolation is within the d convex hull conv{µ1, . . . , µk} ⊂ R of the samples. Let {ϕi : µ 7→ ϕi(µ)|i = 1, . . . , k} be a suitable set of interpolation functions with ϕi(µj) = δij, say Lagrangians [90], splines [41] or radial basis functions [23].

14In the practical applications considered in [8], it was observed that the base point selection has only a minor impact on the final result. 15Here, we introduce this for discrete data sets; for centers w.r.t. a general mass distribution, see the original paper [55], Section 1. 16 Then, the interpolant p∗ ≈ p(µ∗) ∈ M at an unsampled parameter location µ∗ ∈ conv{µ1, . . . , µk} is defined as the minimizer of

k 1 X p∗ = arg min f(q) = ϕ (µ∗) dist(q, p )2. (3.1) 2 i i q∈M i=1

At a sample location µj, one has indeed that

k k X 2 X 2 2 ϕi(µj) dist(q, pi) = δij dist(q, pi) = dist(q, pj) , i=1 i=1

which has the unique global minimum at q = pj. Computing p∗ requires to solve a Riemannian optimization problem. The simplest approach is a descent method [4, 3]. The gradient of the objective function f in (3.1) is

k X ∗ M ∇fq = − ϕi(µ ) Logq (pi) ∈ TqM. (3.2) i=1 see [55, Thm 1.2], [4, §2.1.5], [90, eq. (2.4)]. Hence, just like interpolation in the tangent space, the interpolation via the Riemannian center can be pursued only in applications, where the Riemannian logarithm can be computed. A generic gradient descent algorithm to compute the barycentric interpolant for a function p : Rd 3 µ 7→ p(µ) ∈ M reads as follows. An implementation of this (type of) method for finding

Algorithm 3 Interpolation via the weighted Riemannian center [83, 4].

Input: Sample data set {p1 = p(µ1), . . . , pk = p(µk)} ⊂ M, unsampled parameter ∗ d location µ ∈ conv(µ1, . . . , µk) ⊂ R , initial guess q0, convergence threshold τ. 1: k := 0

2: Compute ∇fqk according to (3.2)

3: while k∇fqk kq > τ do 4: select a step size αk M 5: q := Exp (−α ∇f ) k+1 qk k qk 6: k := k + 1 7: end while ∗ ∗ Output: p := qk ∈ M interpolant of p(µ ).

the Karcher mean in SO(3) is discussed in [83]. Of course, Riemannian analogues to more sophisticated nonlinear optimization methods may also be employed, see [3]. In the context of model reduction, the benefits of interpolation via weighted Rie- mannian centers and the computational costs of solving the associated Riemannian optimization problem must be juxtaposed. 3.3. Additional approaches. A large variety of sophistications and further manifold interpolation techniques exists in the literature: The acceleration-minimi- zing property of cubic splines in the Euclidean space can be generalized to Riemannian manifolds in form of a variational problem [74, 30, 24, 93, 21, 87, 54], see also [80] and references therein. Moreover, the construction concepts of B´eziercurves and the De Casteljau-algorithm [15] can be transferred to Riemannian manifolds [80, 59, 72, 1, 88]. 17 B´eziercurves in Euclidean spaces are polynomial splines that rely on a number of so- called control points. To obtain the value of a B´eziercurve at time t, a recursive sequence of straight-line convex combinations between pairs of control points must be computed. The transition of this technique to Riemannian manifolds is via replacing the inherent straight lines with geodesics [80]. Another option is to conduct the B´ezier/DeCasteljau-algorithm in the tangent space and to transfer the results to the manifold via a geodesic averaging of the spline arcs that were constructed in the tangent spaces at the first and the last control point, respectively, see [40]. Derivative information may also be incorporated in interpolation schemes on Rie- mannian manifolds. A Hermite-type method that is specifically tailored for interpola- tion problems on the Grassmann manifold is sketched in [7, §3.7.4]. General Hermitian manifold interpolation in compact, connected Lie groups with a bi-invariant metric has been considered in [52]. A practical approach to conduct first-order Hermite interpolation of data on arbitrary Riemannian manifolds is discussed in [103]. 3.4. Quasi-linear extrapolation on matrix manifolds. In application sce- narios, where both snapshot data of the full-order model and derivative information are at hand, various approaches have been suggested to exploit the latter. On the one hand, can be used for improving the ROMs accuracy and approximation quality by constructing POD bases that incorporate snapshots and snapshot deriva- tives [25, 48, 51, 99]. On the other hand, snapshot derivatives enable to parameterize the ROM bases and subspaces or to perform sensitivity analyses [97, 45, 44, 101]. In this section, we outline an approach to transfer the idea of extrapolation and param- eterization via local linearizations to manifold-valued functions. The underlying idea is comparable to the trajectory piece-wise linear (TPWL) method [84]. Yet, TPWL linearizes the full-order model prior to the ROM projection, whereas here, we consider linearizing ROM building blocks like the reduced orthogonal bases, reduced subspaces or reduced system matrices. A geometric first-order Taylor approximation. Any differentiable function f : Rn → Rn can be linearized via a first-order Taylor expansion. A step ahead of size t n 2 in direction d ∈ R gives f(x0 + td) = f(x0) + tDfx0 (d) + O(t ). When considering t 7→ c(t) := f(x0 + td) as a curve, then the first-order Taylor approximant is the straight line g : t 7→ c(0) +c ˙(0)t. 
Such first order linearization often serves for extrapolating a given nonlinear function in a neighborhood of a selected expansion point. For doing so, the starting point c(0) and the starting velocityc ˙(0) must be available. This procedure translates to the manifold setting, when straight lines are replaced with geodesics. Let µ ∈ R be a scalar parameter and let c : µ 7→ c(µ) ∈ M be a curve on a

submanifold M. For given initial values c(µ0) = p0 ∈ M andc ˙(µ0) = v0 ∈ Tp0 M,

the corresponding unique geodesic cp0,v0 is expressed via the Riemannian exponential as

c : µ → M, µ 7→ ExpM(µv ). p0,v0 p0 0

Example: Extrapolating POD basis matrices. As outlined in Section 1.1, snap- 1 m shot POD works by collecting state vector snapshots, x := x(t1, µ0), ..., x := n 1 m x(tm, µ0)} ∈ R followed by an SVD of the snapshot matrix x , ..., x (µ0) =: T n×m S(µ0) = U(µ0)Σ(µ0)Z (µ0). Here, the matrix dimensions are U(µ0) ∈ R , Σ(µ0) ∈ m×m m×m R , Z(µ0) ∈ R . The objective is to approximate U(µ0 + µ) for a small µ > 0 18 Algorithm 4 Geodesic extrapolation.

Input: Scalar parameter µ0 ∈ R, initial values c(µ0) ∈ M, c˙(µ0) ∈ Tc(µ0)M sampled from a curve c : µ → c(µ) ∈ M, parameter value µ∗ > 0. M 1: Computec ˆ(µ + µ∗) := Exp (µ∗c˙(µ )) 0 c(µ0) 0 ∗ ∗ Output: cˆ(µ0 + µ ) ∈ M extrapolant of c(µ0 + µ ).

˙ based on the data U(µ0), U(µ0), where U(µ0) is a point on the Stiefel manifold St(n, m) ˙ and U(µ0) is a tangent vector, see Section 4.4.1. Differentiating the SVD. If the snapshot matrix function µ 7→ S(µ) ∈ Rn×m is smooth in the neighborhood of µ0 ∈ R and if the singular values of S(µ0) are mutually distinct16, then the singular values and both the left and the right singular vectors are differentiable in µ ∈ [µ0 − δµ, µ0 + δµ] for δµ small enough. For brevity, let ˙ dS S = dµ (µ0) denote the derivative with respect to µ evaluated in µ0 and so forth. Let µ 7→ S(µ) = U(µ)Σ(µ)Z(µ)T ∈ Rn×m and let C(µ) = (ST S)(µ). Let uj and vj, j = 1, . . . , m denote the columns of U(µ0) and Z(µ0), respectively. It holds j T ˙ j σ˙ j = (u ) Sv , (j = 1, . . . , m), (3.3) ( j T i i T j σj (u ) ˙v +σi(u ) ˙v ˙ S S , i 6= j Z = ZA, where Aij = (σj +σi)(σj −σi) (i, j = 1, . . . , m), (3.4) 0, i = j

−1 −1 −1   −1 U˙ = SZ˙ Σ + SZ˙ Σ + SZΣ˙ = SZ˙ + U(ΣA − Σ)˙ Σ . (3.5)

T ˙ A proof can be found in [45]. Note that U (µ0)U(µ0) is skew-symmetric so that indeed ˙ (µ ) =: ∆(µ ) ∈ T St(n, m). The above equations hold in approximative U 0 0 U(µ0) form for the truncated SVD. For convenience, assume that U(µ0) ∈ St(n, r) is now the truncated to r ≤ m columns. ˙ Performing the Taylor extrapolation on St(n, r). With U(µ0), U(µ0) at hand, ˆ U(µ0 +µ) can be approximated using the Stiefel exponential: U(µ0 +µ) ≈ U(µ0 +µ) := St ˙ Exp (µ (µ0)), see Algorithm 7.The process is illustrated in Fig. 3.2. U0 U Note that when the µ-dependency is real-analytic, then the Euclidean Taylor expansion

µ2 (µ + µ) = (µ ) + µ ˙ (µ ) + ¨(µ ) + O(µ3) ∈ St(n, r) (3.6) U 0 U 0 U 0 2 U 0 converges to an orthogonal matrix U(µ0 + µ) ∈ St(n, r). Yet, when truncating the Taylor series, we leave the Stiefel manifold. In particular, the columns of the first ˙ order approximation are not orthonormal, i.e. U(µ0) + µU(µ0) ∈/ St(n, r) for µ 6= 0. ˙ By construction, the Stiefel geodesic features the same starting velocity U(µ0) and thus matches the Taylor series up to terms of second order. In addition, it respects the geometric structure of the Stiefel manifold and thus preserves column-orthonormality for every µ. 4. Matrix manifolds of practical importance. In this section, we discuss the matrix manifolds that feature most often in practical applications in the context of model reduction. For each manifold under consideration, we recap, if applicable • the representation of points/locations in numerical schemes.

16This condition can be relaxed, see the results of [5, §7]. 19 Fig. 3.2. Extrapolation of matrix manifold data. Sketched on the right is the sample matrix n×r data in R . The curved line on the left represents the nonlinear matrix manifold; the straight lines represent the tangent vectors in the tangent space. The matrix curve is linearized at U(q0), U(q1), etc.

• the representation of tangent vectors in numerical schemes. • the most common Riemannian metrics. • how to compute , geodesics and the Riemannian exponential and logarithm mappings. 4.1. The . This section is devoted to the general linear group GL(n) of invertible matrices. In model reduction, regular matrices ap- pear for example as (reduced) system matrices in LTI and discretized PDE systems [9, 31, 76] and parameterizations have to be such that matrix regularity is preserved. In addition, the discussion of the seemingly simple matrix manifold GL(n) is impor- tant, because it is the fundamental matrix Lie Group from which all other matrix Lie groups are derived. Moreover, it provides the background for understanding quo- tient spaces of GL(n), see Subsection 2.5 and also [20, 96]. A short summary on the of GL(n) is given in [82, §6]. 4.1.1. Introduction and data representation in numerical schemes. Be- −1 cause GL(n) = det (R \{0}) = {A ∈ Rn×n| det(A) 6= 0}, GL(n) is an open subset 2 of the n2-dimensional vector space Rn×n ' Rn and is thus an n2-dimensional dif- ferentiable manifold, see [63, Examples 1.22–1.27]. The matrix manifold GL(n) is disconnected as it decomposes into two connected components, namely the regular matrices of positive determinant and the regular matrices of negative determinant. Because GL(n) is an open subset of the vector space Rn×n, the tangent space n×n at a location A ∈ GL(n) is simply TAGL(n) = R . For GL(n), the Lie algebra is gl(n) = Rn×n, so that the Lie group exponential is the standard matrix exponential n×n expm : R = gl(n) → GL(n). From the Lie group perspective (2.6), the tangent space at an arbitrary point A ∈ GL(n) is to be considered as the set TAGL(n) = Agl(n) = A(Rn×n), even though this set coincides with Rn×n. 4.1.2. Distances and geodesics. The obvious choice for a Riemannian metric on GL(n) is to use the inner product from the ambient Euclidean matrix space, i.e.,

T h∆, ∆˜ iA = h∆, ∆˜ i0 = trace(∆ ∆)˜ ,

˜ n×n for A ∈ GL(n) and ∆, ∆ ∈ TAGL(n) = R . 20 In many applications, it is more appropriate to consider metrics with certain invariance properties.17 A left-invariant metric can be obtained from the standard metric via

−1 −1 h∆, ∆˜ iA = hA ∆,A ∆˜ i0,A ∈ GL(n), ∆, ∆˜ ∈ TAGL(n). (4.1)

When formally considering ∆ = AV, ∆˜ = AV˜ ∈ TAGL(n) = Agl(n) as left-translates of tangent vectors V, V˜ ∈ TI GL(n) = gl(n), then this metric satisfies h∆, ∆˜ iA = hV, V˜ i0. Alternatively, hV, V˜ i0 = hAV, AV˜ iA, which explains the name ‘left-invariant’. The Riemannian exponential and logarithm for the flat metric. When equipped with the Euclidean metric, GL(n) is flat: since the tangent space is the full matrix space Rn×n, the geodesic equation (2.3) requires the acceleration of a geodesic curve to vanish completely. Hence, the geodesic that starts from A ∈ GL(n) with velocity ∆ ∈ Rn×n is the straight line C(t) = A + t∆. Note that the curve t 7→ C(t) may leave the manifold GL(n) for some t ∈ R as it may hit a matrix with zero determinant. The formulae for the Riemannian exponential and logarithm mapping at a base point A ∈ GL(n) are

GL ˜ ExpA :TAGL(n) ⊃ Bε(0) → GL(n), ∆ 7→ A := A + ∆, (4.2) GL ˜ ˜ LogA : GL(n) → TAGL(n), A 7→ ∆ := (A − A). (4.3)

In (4.2), B_ε(0) denotes a suitably small open neighborhood around 0 ∈ T_A GL(n) ≅ R^{n×n} such that A + ∆ ∈ GL(n) for all ∆ ∈ B_ε(0).

The Riemannian exponential for the left-invariant metric on GL(n). The left-invariant metric induces a non-flat geometry on GL(n). Formulae for the covariant derivatives and the corresponding geodesics are derived in [10, Thm. 2.14]. The counterparts w.r.t. the right-invariant metrics can be found in [96]. Given a base point A ∈ GL(n) and a starting velocity ∆ = AV ∈ T_A GL(n) = A gl(n), the associated geodesic is

Γ_{A,∆} : t ↦ A expm(t V^T) expm(t(V − V^T)).  (4.4)

The Riemannian exponential is

Exp^GL_A(∆) = Γ_{A,∆}(1) = A expm(V^T) expm(V − V^T)
            = A expm((A^{-1}∆)^T) expm((A^{-1}∆) − (A^{-1}∆)^T).  (4.5)

The author is not aware of a closed formula for the inverse map, i.e., the Riemannian logarithm for the left-invariant metric; see also the discussion in [96, §4.5]. The thesis [82, §6.2] introduces a Riemannian shooting method for computing the Riemannian logarithm w.r.t. the left-invariant metric.

An important special case. For tangent vectors ∆ = AV ∈ T_A GL(n) with normal V ∈ R^{n×n}, i.e., V V^T = V^T V, the matrices V^T and (V − V^T) commute. Therefore, according to (A.2), A expm(V^T) expm(V − V^T) = A expm(V^T + V − V^T) = A expm(V), and the Riemannian exponential reduces to

Exp^GL_A : T_A GL(n) ∩ {∆ | A^{-1}∆ normal} → GL(n), ∆ ↦ Ã = A expm(A^{-1}∆).
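For concreteness, the exponential (4.5) can be evaluated with a few lines of Python. The following sketch is not part of the referenced works; the function name is made up for illustration and SciPy's expm is assumed to be available.

import numpy as np
from scipy.linalg import expm

def exp_gl_left_invariant(A, Delta):
    # Riemannian exponential (4.5) w.r.t. the left-invariant metric:
    # Exp_A(Delta) = A expm(V^T) expm(V - V^T) with V = A^{-1} Delta.
    V = np.linalg.solve(A, Delta)
    return A @ expm(V.T) @ expm(V - V.T)

# The result stays in GL(n): det(expm(X)) = e^{trace(X)} > 0, so the
# determinant of the output is a nonzero multiple of det(A).
rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
Delta = rng.standard_normal((3, 3))
print(np.linalg.det(exp_gl_left_invariant(A, Delta)))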

¹⁷ “Eulerian motion of a rigid body can be described as motion along geodesics in the group of rotations of three-dimensional euclidean space provided with a left-invariant Riemannian metric. A significant part of Euler’s theory depends only upon this invariance, and therefore can be extended to other groups.” [11, Appendix 2, p. 318]

The Riemannian logarithm is

Log^GL_A : D_A ∩ {Ã | A^{-1}Ã normal} → T_A GL(n), Ã ↦ ∆ = A logm(A^{-1}Ã),

where D_A ⊂ GL(n) is a domain such that a suitable branch of the matrix logarithm is well-defined. These expressions are sometimes encountered in the literature as the Riemannian exponential and logarithm mappings. Yet, one should be aware that they hold only under these special circumstances.

4.2. The orthogonal group. This section is devoted to the orthogonal group O(n) ⊂ R^{n×n} of orthogonal n-by-n matrices. In parametric model reduction, such matrices may appear as eigenvector matrices in symmetric EVD problems.

4.2.1. Introduction and data representation in numerical schemes. The orthogonal group is O(n) = {Q ∈ R^{n×n} | QQ^T = I = Q^T Q}. The manifold structure of O(n) can be established via Theorem 2.2, see also Example 1. The orthogonal group decomposes into two connected components, namely the orthogonal matrices with determinant 1 and the orthogonal matrices with determinant −1. The former constitute the special orthogonal group SO(n) = {Q ∈ O(n) | det(Q) = 1}. The orthogonal group is a closed subgroup of the Lie group GL(n) and thus itself a Lie group (Section 2.5). The tangent space T_I O(n) at the identity forms the Lie algebra associated with the Lie group O(n). It coincides with the Lie algebra of SO(n) and as such is denoted by so(n) = T_I SO(n) = T_I O(n), [43, §3.3, 3.4]. The Lie algebra of SO(n) is precisely the vector space of skew-symmetric matrices, so(n) = skew(n). According to (2.6), the tangent space at an arbitrary location Q is given by the translates (by left-multiplication) of the Lie algebra

T_Q O(n) = Q so(n) = {∆ = QV ∈ R^{n×n} | V ∈ skew(n)},

which is the same as {∆ ∈ R^{n×n} | Q^T ∆ = −∆^T Q}. The Lie exponential is

expm |so(n) : so(n) → SO(n). (4.6)

This restriction is a surjective map, see Appendix A. The dimensions of both T_Q O(n) and O(n) are (1/2) n(n − 1).

4.2.2. Distances and geodesics. We follow up on the discussion in Section 4.1.1. For the orthogonal group, the Euclidean metric and the left-invariant metric coincide: Let ∆ = QV, ∆̃ = QṼ ∈ T_Q O(n) = Q so(n). Then,

⟨∆, ∆̃⟩_Q = ⟨Q^{-1}∆, Q^{-1}∆̃⟩_0 = ⟨V, Ṽ⟩_0 = trace(V^T Ṽ) = trace(V^T Q^T Q Ṽ) = ⟨∆, ∆̃⟩_0.

In fact, the metric is also right-invariant, which makes it a bi-invariant metric, see [6, §2]. Bi-invariant metrics are important, because for Lie groups endowed with bi-invariant metrics, the Lie exponential map and the Riemannian exponential map at the identity coincide [6, Thm. 2.27, p. 40].

The Riemannian exponential and logarithm maps on O(n). The Riemannian O(n)-exponential at a base point Q ∈ O(n) sends a tangent vector ∆ ∈ T_Q O(n) to the endpoint Q̃ ∈ O(n) of a geodesic that starts from Q with velocity vector ∆. Therefore, it provides at the same time an expression for the geodesic curves on O(n). A formula for computing the Riemannian O(n)-exponential was derived in [33, §2.2.2]. Given Q ∈ O(n), it holds

Exp^{O(n)}_Q : T_Q O(n) → O(n), ∆ ↦ Q̃ := Q expm(Q^T ∆).  (4.7)

This result is also immediate from abstract Lie theory, see [6, Eq. (2.2) & Thm. 2.27].¹⁸
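As a quick numerical illustration of (4.7), the following sketch (the function name is hypothetical; NumPy/SciPy are assumed) maps a tangent vector ∆ = QV with skew-symmetric V back to the orthogonal group:

import numpy as np
from scipy.linalg import expm

def exp_On(Q, Delta):
    # Riemannian O(n)-exponential (4.7): Q_tilde = Q expm(Q^T Delta),
    # where Q^T Delta is skew-symmetric for tangent vectors Delta = Q V.
    return Q @ expm(Q.T @ Delta)

rng = np.random.default_rng(1)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal base point
V = rng.standard_normal((n, n)); V = V - V.T       # skew-symmetric direction
Q_new = exp_On(Q, Q @ V)
print(np.allclose(Q_new.T @ Q_new, np.eye(n)))     # True: Q_new is orthogonal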

The corresponding Riemannian logarithm on O(n) is

Log^{O(n)}_Q : O(n) ⊃ D_Q → T_Q O(n), Q̃ ↦ ∆ := Q logm(Q^T Q̃)  (4.8)

and is well defined on a neighborhood D_Q ⊂ O(n) around Q such that for all Q̃ ∈ D_Q, the orthogonal matrix Q^T Q̃ does not feature λ = −1 as an eigenvalue.

The Riemannian distance between orthogonal matrices. For given Q, Q̃ ∈ O(n) from the same connected component of O(n), consider the EVD Q^T Q̃ = ΨΛΨ^H. Because Q^T Q̃ is orthogonal, it holds that Λ = diag(e^{iθ_1}, ..., e^{iθ_n}), and we assume that θ_1, ..., θ_n ∈ (−π, π). The Riemannian distance is

dist_{O(n)}(Q, Q̃) = ‖Log^{O(n)}_Q(Q̃)‖_Q = ‖logm(Λ)‖_F = (Σ_{k=1}^n θ_k²)^{1/2}.

The compact Lie group SO(n) is a geodesically complete Riemannian manifold [6, Hopf–Rinow Theorem, p. 31], and each two points of SO(n) can be joined by a minimal geodesic.

4.3. The matrix manifold of symmetric positive definite matrices. This section is devoted to the matrix manifold SPD(n) of real, symmetric positive-definite n-by-n matrices. In model reduction, such matrices appear for example as (reduced) system matrices in second-order parametric ODEs. For example, in linear structural or electrical dynamical systems, mass, stiffness and damping matrices are usually in SPD(n), [9, §4.2]. Moreover, positive definite matrices arise as Gramians of reachable and observable LTI systems in the context of balanced truncation [17]. Related is the manifold of positive semi-definite matrices of fixed rank. It is investigated in [20, 96, 64]. An application in the context of model reduction is featured in [65].

4.3.1. Introduction and data representation in numerical schemes. The set

SPD(n) = {A ∈ sym(n) | x^T A x > 0 ∀ x ∈ R^n \ {0}}

is an open subset of the metric space (sym(n), ⟨·, ·⟩_0) of symmetric matrices. As such, it is a differentiable manifold [19, §6]. Moreover, it forms a convex cone [34, Example 2, p. 8], [68, §2.3], and can be realized as a quotient SPD(n) ≅ GL(n)/O(n). The latter is based on the fact that for A ∈ SPD(n), matrix factorizations A = ZZ^T

¹⁸ The Lie exponential is expm|_{so(n)} : so(n) → SO(n), which is in the case at hand the Riemannian exponential at the identity, Exp^SO_I = expm|_{so(n)}. This translates to any other location via [6, Eq. (2.2)] as follows: Pick any Q ∈ SO(n) and consider the mapping “left-multiplication by Q”, i.e., L_Q : SO(n) → SO(n), P ↦ QP. Then, the differential is d(L_Q)_I : T_I SO(n) → T_{L_Q(I)} SO(n), V ↦ ∆ := QV. Because L_Q is an isometry,

Exp^SO_Q(QV) = L_Q(Exp^SO_I(V)) = Exp^SO_{L_Q(I)}(d(L_Q)_I(V)),

which gives Exp^SO_Q(QV) = Q Exp^SO_I(V) = Q expm(Q^{-1}∆) and thus (4.7).

with Z ∈ GL(n) are invariant under orthogonal transformations Z ↦ ZQ, Q ∈ O(n), [20, §2, p. 3]. Since SPD(n) is an open subset of the vector space sym(n), the tangent space is simply

T_A SPD(n) = sym(n).  (4.9)

The dimensions of both T_A SPD(n) and SPD(n) are (1/2) n(n + 1). There is a smooth one-to-one correspondence between sym(n) and SPD(n). That is, every positive definite matrix can be written as the matrix exponential of a unique symmetric matrix, [36, Lem. 18.7, p. 472]. Put in different words, when restricted to sym(n), the standard matrix exponential

expm : sym(n) → SPD(n)

is a diffeomorphism; its inverse is the standard principal matrix logarithm

logm : SPD(n) → sym(n),

see also [12, Thm. 2.8]. The group GL(n) acts on SPD(n) via congruence transformations

g_X(A) = X^T A X,  X ∈ GL(n), A ∈ SPD(n).  (4.10)

For additional background on SPD(n), see [69, 70, 78]. Applications in computer vision are presented in [28, 56].

4.3.2. Distances and geodesics. The literature knows a large variety of distance measures on SPD(n), see [53, Table 3.1, p. 56]. Yet, there are essentially two choices that are associated with inner products on the tangent space of SPD(n) and thus induce Riemannian metrics on the manifold SPD(n): the so-called natural metric and the log-Euclidean metric. Let A ∈ SPD(n) and let ∆, ∆̃ ∈ sym(n) be two tangent vectors.
• The natural metric is

⟨∆, ∆̃⟩_A = ⟨A^{-1/2} ∆ A^{-1/2}, A^{-1/2} ∆̃ A^{-1/2}⟩_0 = trace(A^{-1} ∆ A^{-1} ∆̃),

see [19, §6, p. 201], [20]. It also goes by the name trace metric, [61, §XII.1, p. 322]. In statistical applications, it is usually called the affine-invariant metric [67, 79].¹⁹
• The log-Euclidean metric is

⟨∆, ∆̃⟩_A = ⟨D(logm)_A(∆), D(logm)_A(∆̃)⟩_0,

see [12, eq. (3.5)].

For the natural metric, it is more appropriate to consider sym(n) = T_I SPD(n) as the tangent space at the identity and the tangent space at an arbitrary location A ∈ SPD(n) as T_A SPD(n) = A^{1/2} (T_I SPD(n)) A^{1/2}, which, of course, is nothing

¹⁹ The motivation is as follows: if y = Ax + v_0, A ∈ GL(n), is an affine transformation of a random vector x, then the mean is transformed to ȳ := Ax̄ + v_0 and the covariance matrix undergoes a congruence transformation C_yy = E[(y − ȳ)(y − ȳ)^T] = A C_xx A^T.

but a reparameterization of sym(n). From this perspective, we have for tangent vectors ∆ = A^{1/2} V A^{1/2}, ∆̃ = A^{1/2} Ṽ A^{1/2} that

⟨∆, ∆̃⟩_A = ⟨V, Ṽ⟩_0.

The congruence transformations (4.10) are isometries of SPD(n) with respect to the natural metric, [61, Thm. XII.1.1, p. 324], [19, Lem. 6.1.1, p. 201]. See also the discussion in [79, §3]. By a standard pullback construction from differential geometry [32, Def. 2.2, Example 2.5], the log-Euclidean metric transfers the inner product ⟨·, ·⟩_0 on sym(n) to SPD(n) via the matrix logarithm logm : SPD(n) → sym(n). In [12, eq. (3.5)], the authors take this construction one step further and use the expm-logm-correspondence to define a multiplication that turns SPD(n) into a Lie group and, eventually, into a vector space. As such, it is a flat manifold, i.e. a Riemannian manifold with zero curvature. In this way, the computational challenges that come with dealing with data on nonlinear manifolds are circumvented.

Which metric is to be preferred is problem-dependent, see the various contributions in [92] and [66]. Since the natural metric arises canonically both from the geometric approach, [61, §XII.1], and the matrix-algebraic approach [19, §6], and since staying with the standard matrix multiplication is consistent with the setting of solving dynamical systems in model reduction applications, we restrict the discussion of the Riemannian exponential and logarithm to the geometry that is based on the natural metric.

The SPD(n) exponential. The Riemannian SPD(n)-exponential at a base point A ∈ SPD(n) sends a tangent vector ∆ to the endpoint Ã ∈ SPD(n) of a geodesic that starts from A with velocity vector ∆. Therefore, it provides at the same time an expression for the geodesic curves on SPD(n) with respect to the natural metric. Formulae for computing the SPD(n)-exponential can be found in [20], [79]. Readers preferring a matrix-analytic approach are referred to [19, §6].

Algorithm 5 Riemannian SPD(n)-exponential

Input: base point A ∈ SPD(n), tangent vector ∆ ∈ T_A SPD(n) = sym(n)
Output: Ã := Exp^SPD_A(∆) = A^{1/2} expm(A^{-1/2} ∆ A^{-1/2}) A^{1/2}.

Here, A^{1/2} denotes the matrix square root of A, see Appendix A.

The SPD(n) logarithm. The Riemannian SPD(n)-logarithm at a base point A ∈ SPD(n) finds for another point Ã ∈ SPD(n) an SPD(n)-tangent vector ∆ such that the geodesic that starts from A with velocity ∆ reaches Ã after an arc length of ‖∆‖_A = √⟨∆, ∆⟩_A. Therefore, it provides for two given data points A, Ã ∈ SPD(n)
• a solution to the geodesic endpoint problem: a geodesic that starts from A and ends at Ã,
• the Riemannian distance between the given points A, Ã.
Formulae for computing the SPD(n)-logarithm can be found in [20], [79].

Algorithm 6 Riemannian SPD(n)-logarithm
Input: base point A ∈ SPD(n), location Ã ∈ SPD(n)
Output: ∆ := Log^SPD_A(Ã) = A^{1/2} logm(A^{-1/2} Ã A^{-1/2}) A^{1/2}.
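Both algorithms translate directly into code. The following Python sketch (hypothetical function names; NumPy/SciPy assumed) realizes Algorithms 5 and 6 via the EVD-based square root from Appendix A; the round trip Exp_A(Log_A(B)) = B serves as a sanity check.

import numpy as np
from scipy.linalg import expm, logm

def sqrtm_spd(A):
    # Matrix square root of an SPD matrix via its EVD (see Appendix A).
    lam, Q = np.linalg.eigh(A)
    return (Q * np.sqrt(lam)) @ Q.T

def exp_spd(A, Delta):
    # Algorithm 5: Riemannian SPD(n)-exponential w.r.t. the natural metric.
    A_h = sqrtm_spd(A)
    A_ih = np.linalg.inv(A_h)
    return A_h @ expm(A_ih @ Delta @ A_ih) @ A_h

def log_spd(A, B):
    # Algorithm 6: Riemannian SPD(n)-logarithm w.r.t. the natural metric.
    A_h = sqrtm_spd(A)
    A_ih = np.linalg.inv(A_h)
    return A_h @ logm(A_ih @ B @ A_ih) @ A_h

# sanity check: the round trip Exp_A(Log_A(B)) recovers B
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4))
A = X @ X.T + 4 * np.eye(4)                 # SPD test matrices
B = np.eye(4) + 0.1 * np.ones((4, 4))
print(np.allclose(exp_spd(A, log_spd(A, B)), B))   # True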

Both Algorithms 5 and 6 require computing the spectral decomposition of n-by-n matrices. The computational effort is O(n³). In the context of parametric model reduction, the Riemannian exponential and logarithm maps are usually required for reduced matrix operators [9]. If n denotes the dimension of the full state vectors and r ≪ n denotes the dimension of the reduced state vectors, then matrix exponentials for r-by-r matrices are required, so that the computational effort reduces to O(r³).

4.4. The Stiefel manifold. This section is devoted to the Stiefel manifold St(n, r) ⊂ R^{n×r} of rectangular column-orthogonal n-by-r matrices, r ≤ n. Points U ∈ St(n, r) may be considered as orthonormal bases of cardinality r, or r-frames in R^n. In model reduction, such matrices appear as orthogonal coordinate systems for low-order ansatz spaces that usually stem from a proper orthogonal decomposition or a singular value decomposition of given input solution data. Modeling data on the Stiefel manifold corresponds to data processing for orthonormal bases and thus allows, for example, for the interpolation/parameterization of POD subspace bases. The most important use case in model reduction is where the Stiefel matrices are tall and skinny, i.e., r ≪ n.

Interpolation problems on the Stiefel manifold have not yet been considered in the model reduction context. The reference [59] discusses interpolation of Stiefel data, however using quasi-geodesics rather than geodesics. The work [103] includes numerical experiments for interpolating orthogonal frames on the Stiefel manifold that rely on the canonical Riemannian Stiefel logarithm [82, 102].

4.4.1. Introduction and data representation in numerical schemes. The Stiefel manifold is the compact, homogeneous matrix manifold of column-orthogonal matrices

St(n, r) := {U ∈ R^{n×r} | U^T U = I_r}.

The manifold structure can be directly established via Theorem 2.2 in a similar way as in Example 1. An alternative approach is via Example 3, where St(n, r) is identified with the quotient space St(n, r) ≅ O(n)/(I_r × O(n − r)) under actions of the closed subgroup

I_r × O(n − r) := { [ I_r  0 ; 0  R ] | R ∈ O(n − r) } ≤ O(n).

Two square orthogonal matrices in O(n) are identified as the same point on St(n, r) if their first r columns coincide, see [33, §2.4]. For any matrix representative U ∈ St(n, r), the tangent space of St(n, r) at U is represented by

T_U St(n, r) = {∆ ∈ R^{n×r} | U^T ∆ = −∆^T U} ⊂ R^{n×r}.

Every tangent vector ∆ ∈ T_U St(n, r) may be written as

∆ = UA + (I − UU^T)T,  A ∈ R^{r×r} skew, T ∈ R^{n×r} arbitrary,  (4.11)
∆ = UA + U^⊥ B,  A ∈ R^{r×r} skew, B ∈ R^{(n−r)×r} arbitrary,  (4.12)

where in the latter case, U^⊥ ∈ St(n, n − r) is such that (U, U^⊥) ∈ O(n) is a square orthogonal matrix. The dimension of both T_U St(n, r) and St(n, r) is nr − (1/2) r(r + 1). For additional background and applications, see [3, 18, 26, 33, 49, 95].

4.4.2. Distances and geodesics. Let U ∈ St(n, r) be a point and let ∆ = UA + U^⊥ B, ∆̃ = UÃ + U^⊥ B̃ ∈ T_U St(n, r) be tangent vectors. There are two standard metrics on the Stiefel manifold.
• The Euclidean metric on T_U St(n, r) is the one inherited from the ambient R^{n×r}:

⟨∆, ∆̃⟩_0 = trace(∆^T ∆̃) = trace(A^T Ã) + trace(B^T B̃).

• The canonical metric on T_U St(n, r)

⟨∆, ∆̃⟩_U = trace(∆^T (I − (1/2) UU^T) ∆̃) = (1/2) trace(A^T Ã) + trace(B^T B̃)

is derived from the quotient representation St(n, r) = O(n)/(I_r × O(n − r)) of the Stiefel manifold.

The canonical metric counts the independent coordinates²⁰ of a tangent vector equally when measuring the length √⟨∆, ∆⟩_U of a tangent vector ∆ = UA + U^⊥B, while the Euclidean metric disregards the skew-symmetry of A [33, §2.4]. Recall that different metrics entail different measures for the lengths of curves and thus different formulae for geodesics.

The Stiefel exponential. The Riemannian Stiefel exponential at a base point U ∈ St(n, r) sends a Stiefel tangent vector ∆ to the endpoint Ũ ∈ St(n, r) of a geodesic that starts from U with velocity vector ∆. Therefore, it provides at the same time an expression for geodesic curves on St(n, r). A closed-form expression for the Stiefel exponential w.r.t. the Euclidean metric is included in [33, §2.2.2],

Ũ = Exp^St_U(∆) = (U, ∆) expm( [ U^T∆  −∆^T∆ ; I_r  U^T∆ ] ) [ I_r ; 0 ] expm(−U^T∆).
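A direct transcription of this closed-form expression into Python might read as follows. This is a sketch under the assumption that SciPy is available; the function name is made up, and the tangent-space projection used to build a test vector is the standard one from (4.11), not part of the formula itself.

import numpy as np
from scipy.linalg import expm

def exp_st_euclidean(U, Delta):
    # Closed-form Stiefel exponential w.r.t. the Euclidean metric [33, §2.2.2].
    r = U.shape[1]
    A = U.T @ Delta                             # skew-symmetric for tangent vectors
    M = np.block([[A, -Delta.T @ Delta],
                  [np.eye(r), A]])
    return (np.hstack([U, Delta]) @ expm(M)[:, :r]) @ expm(-A)

rng = np.random.default_rng(3)
n, r = 8, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
T = rng.standard_normal((n, r))
Delta = T - U @ ((U.T @ T + T.T @ U) / 2)       # projection onto T_U St(n, r)
U_new = exp_st_euclidean(U, Delta)
print(np.allclose(U_new.T @ U_new, np.eye(r)))  # True: result is on St(n, r)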

In [50], an alternative formula is derived that features only matrix exponentials of skew-symmetric matrices. An efficient algorithm for computing the Stiefel exponential w.r.t. the canonical metric was derived in [33, §2.4.2]; it is stated as Algorithm 7.

Algorithm 7 Stiefel exponential [33].

Input: base point U ∈ St(n, r), tangent vector ∆ ∈ T_U St(n, r)
1: A := U^T ∆  # horizontal component, skew
2: QR := ∆ − UA  # (thin) qr-decomp. of normal component of ∆
3: [ A  −R^T ; R  0 ] = T Λ T^H ∈ R^{2r×2r}  # EVD of skew-symmetric matrix
4: [ M ; N ] := T expm(Λ) T^H [ I_r ; 0 ] ∈ R^{2r×r}
Output: Ũ := Exp^St_U(∆) = UM + QN ∈ St(n, r)

In applications where Exp^St_U(µ∆) needs to be evaluated for various parameters µ, as in the example of Section 3.4, steps 1–3 should be computed a priori (offline). Apart from elementary matrix multiplications, the algorithm requires computing the standard matrix exponential of a skew-symmetric matrix. This, however, is for a 2r-by-2r matrix and does not scale in the dimension n. With the usual assumption of model reduction that r ≪ n, the computational effort is O(nr²).
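In Python, Algorithm 7 can be sketched as follows (hypothetical naming; NumPy/SciPy assumed). For brevity, steps 3–4 are realized with a single call to SciPy's expm rather than via the EVD of the skew-symmetric block matrix; both routes compute the same quantity.

import numpy as np
from scipy.linalg import expm

def exp_st_canonical(U, Delta):
    # Algorithm 7: Stiefel exponential w.r.t. the canonical metric [33].
    r = U.shape[1]
    A = U.T @ Delta                          # step 1: horizontal component, skew
    Q, R = np.linalg.qr(Delta - U @ A)       # step 2: thin QR of normal component
    X = np.block([[A, -R.T],
                  [R, np.zeros((r, r))]])    # 2r x 2r skew-symmetric matrix
    MN = expm(X)[:, :r]                      # steps 3-4: [M; N] = expm(X) [I_r; 0]
    return U @ MN[:r, :] + Q @ MN[r:, :]

Realizing steps 3–4 through the EVD as in Algorithm 7 instead of a direct expm call is what enables the offline/online splitting mentioned above.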

²⁰ i.e., the upper triangular entries of the skew-symmetric A and the entries of B of ∆ = UA + U^⊥B

The Stiefel logarithm. The Riemannian Stiefel logarithm at a base point U ∈ St(n, r) finds for another point Ũ ∈ St(n, r) a Stiefel tangent vector ∆ such that the geodesic that starts from U with velocity ∆ reaches Ũ after an arc length of ‖∆‖_U = √⟨∆, ∆⟩_U. Therefore, it provides for two given data points U, Ũ ∈ St(n, r)
• a solution to the geodesic endpoint problem: a geodesic that starts from U and ends at Ũ,
• the Riemannian distance between the given points U, Ũ.
An efficient algorithm for computing the Stiefel logarithm w.r.t. the canonical metric was derived in [102]; it is stated as Algorithm 8.

Algorithm 8 Stiefel logarithm [102].
Input: base point U ∈ St(n, r), Ũ ∈ St(n, r) ‘close’ to base point, τ > 0 convergence threshold
1: M := U^T Ũ ∈ R^{r×r}
2: QN := Ũ − UM ∈ R^{n×r}  # (thin) qr-decomp. of normal component of Ũ
3: V_0 := [ M  X_0 ; N  Y_0 ] ∈ O(2r)  # compute orth. completion of the block [ M ; N ]
4: for k = 0, 1, 2, ... do
5:   [ A_k  −B_k^T ; B_k  C_k ] := logm(V_k)  # matrix log of orth. matrix
6:   if ‖C_k‖_2 ≤ τ then
7:     break
8:   end if
9:   Φ_k := expm(−C_k)  # matrix exp of skew matrix
10:  V_{k+1} := V_k W_k, where W_k := [ I_r  0 ; 0  Φ_k ]
11: end for
Output: ∆ := Log^St_U(Ũ) = U A_k + Q B_k ∈ T_U St(n, r)
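A compact Python sketch of Algorithm 8 follows (hypothetical function name; NumPy/SciPy assumed). Overwriting the first r columns of a complete QR factor with the block [M; N] is one simple way to realize the orthogonal completion in step 3, since both column blocks agree up to column signs.

import numpy as np
from scipy.linalg import expm, logm

def log_st_canonical(U, U_tilde, tau=1e-11, maxiter=100):
    # Algorithm 8: Stiefel logarithm w.r.t. the canonical metric [102].
    n, r = U.shape
    M = U.T @ U_tilde                                   # step 1
    Q, N = np.linalg.qr(U_tilde - U @ M)                # step 2: thin QR
    V, _ = np.linalg.qr(np.vstack([M, N]), mode='complete')
    V[:, :r] = np.vstack([M, N])                        # step 3: completion of [M; N]
    for _ in range(maxiter):
        L = logm(V)                                     # step 5: log of orth. matrix
        A_k, B_k, C_k = L[:r, :r], L[r:, :r], L[r:, r:]
        if np.linalg.norm(C_k, 2) <= tau:               # step 6: convergence test
            break
        W = np.block([[np.eye(r), np.zeros((r, r))],
                      [np.zeros((r, r)), expm(-C_k)]])  # steps 9-10
        V = V @ W
    return U @ A_k + Q @ B_k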

The analysis in [102] shows that the algorithm is guaranteed to converge if the input data points U, Ũ are at most a Euclidean distance of d = ‖U − Ũ‖_2 ≤ 0.09 apart. In this case, the algorithm exhibits a linear rate of convergence that depends on d but is smaller than 1/2. In practice, the algorithm seems to converge whenever the initial V_0 is such that its standard matrix logarithm logm(V_0) is well-defined. Note that two points on St(n, r) can at most be a Euclidean distance of 2 away from each other. Apart from elementary matrix multiplications, the algorithm requires computing the standard matrix logarithm of an orthogonal 2r-by-2r matrix and the standard matrix exponential of a skew-symmetric r-by-r matrix at every iteration k. Yet, these operations are independent of the dimension n. With the usual assumption of model reduction that r ≪ n, the computational effort is O(nr²).

For the Stiefel manifold equipped with the Euclidean metric, methods for calculating the Stiefel logarithm are introduced in [22].

4.5. The Grassmann manifold. This section is devoted to the Grassmann manifold Gr(n, r) of r-dimensional subspaces of R^n for r ≤ n. Every point 𝒰 ∈ Gr(n, r), i.e., every subspace, may be represented by selecting a basis {u_1, ..., u_r} with ran(u_1, ..., u_r) = 𝒰. In numerical schemes, we work exclusively with orthonormal bases. In this way, points 𝒰 on the Grassmann manifold are represented by points U ∈ St(n, r) on the Stiefel manifold via 𝒰 = ran(U). For details and theoretical background, see the references [2, 3, 33]. Subspaces and Grassmann manifolds play an important role in projection-based parametric model reduction [8, 73, 100, 86] and in Krylov subspace approaches [17]. Modeling data on the Grassmann manifold corresponds to data processing for subspaces and thus allows, for example, for the interpolation/parameterization of POD subspaces. The most important use case in model reduction is where the subspaces are of low dimension compared to the surrounding state space, i.e., r ≪ n.

4.5.1. Introduction and data representation in numerical schemes. The set of all r-dimensional subspaces 𝒰 ⊂ R^n forms the Grassmann manifold

Gr(n, r) := {𝒰 ⊂ R^n | 𝒰 subspace, dim(𝒰) = r}.

The Grassmann manifold is a quotient of O(n) under the action of the Lie subgroup

O(r) × O(n − r) = { [ S  0 ; 0  R ] | S ∈ O(r), R ∈ O(n − r) } ≤ O(n).

Two matrices Q, Q̃ ∈ O(n) are in the same (O(r) × O(n − r))-orbit if and only if the first r columns of Q and Q̃ span the same subspace and the trailing n − r columns span the corresponding orthogonal complement subspace. Theorem 2.11 applies and shows that Gr(n, r) = O(n)/(O(r) × O(n − r)) is a homogeneous manifold. Alternatively, the Grassmann manifold can be realized as a quotient manifold of the Stiefel manifold with the help of Theorem 2.9,

Gr(n, r) = St(n, r)/O(r) = {[U] | U ∈ St(n, r)},  (4.13)

where the O(r)-orbits are [U] = {UR | R ∈ O(r)}. A matrix U ∈ St(n, r) is called a matrix representative of a subspace 𝒰 ∈ Gr(n, r), if 𝒰 = ran(U). The orbit [U] and the subspace 𝒰 = ran(U) are to be considered as the same object. For any matrix representative U ∈ St(n, r) of 𝒰 ∈ Gr(n, r), the tangent space of Gr(n, r) at 𝒰 is represented by

T_𝒰 Gr(n, r) = {∆ ∈ R^{n×r} | U^T ∆ = 0} ⊂ R^{n×r}.

Every tangent vector ∆ ∈ T_𝒰 Gr(n, r) may be written as

∆ = (I − UU^T)T,  T ∈ R^{n×r} arbitrary, or  (4.14)
∆ = U^⊥ B,  B ∈ R^{(n−r)×r} arbitrary,  (4.15)

where in the latter case, U^⊥ ∈ St(n, n − r) is such that (U, U^⊥) ∈ O(n) is a square orthogonal matrix. The dimension of both T_𝒰 Gr(n, r) and Gr(n, r) is nr − r².

4.5.2. Distances and geodesics. A metric on T_𝒰 Gr(n, r) can be obtained by making use of the fact that the Grassmannian is a quotient of the Stiefel manifold. Alternatively, one can restrict the standard inner matrix product ⟨A, B⟩_0 = trace(A^T B) to the Grassmann tangent space. In the case of the Grassmannian, both approaches lead to the same metric

⟨∆, ∆̃⟩_𝒰 = trace(∆^T ∆̃) = ⟨∆, ∆̃⟩_0,

see [33, §2.5].

Algorithm 9 Grassmann exponential [33].
Input: base point 𝒰 = [U] ∈ Gr(n, r), where U ∈ St(n, r), tangent vector ∆ ∈ T_𝒰 Gr(n, r)
1: QΣV^T := ∆, with Q ∈ St(n, r)  # (thin) SVD of tangent vector
2: Ũ := UV cos(Σ)V^T + Q sin(Σ)V^T  # cos and sin act only on diag. entries
Output: 𝒰̃ := Exp^Gr_𝒰(∆) = [Ũ] ∈ Gr(n, r).
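In Python, Algorithm 9 amounts to a few lines (a sketch; the function name is hypothetical, NumPy assumed):

import numpy as np

def exp_gr(U, Delta):
    # Algorithm 9: Grassmann exponential [33]; returns a Stiefel representative
    # of the subspace Exp_[U](Delta).
    Q, s, Vt = np.linalg.svd(Delta, full_matrices=False)         # thin SVD
    return (U @ Vt.T) @ (np.cos(s)[:, None] * Vt) + Q @ (np.sin(s)[:, None] * Vt)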

The Grassmann exponential. The Riemannian Grassmann exponential at a base point 𝒰 ∈ Gr(n, r) sends a Grassmann tangent vector ∆ to the endpoint 𝒰̃ ∈ Gr(n, r) of a geodesic that starts from 𝒰 with velocity vector ∆. Therefore, it provides at the same time an expression for the geodesic curves on Gr(n, r). An efficient algorithm for computing the Grassmann exponential was derived in [33, §2.5.1]; it is stated as Algorithm 9. Apart from elementary matrix multiplications, the algorithm requires computing the singular value decomposition of an n-by-r matrix. The computational effort is O(nr²).

The Grassmann logarithm. The Riemannian Grassmann logarithm at a base point 𝒰 ∈ Gr(n, r) finds for another point 𝒰̃ ∈ Gr(n, r) a Grassmann tangent vector ∆ such that the geodesic that starts from 𝒰 with velocity ∆ reaches 𝒰̃ after an arc length of ‖∆‖_𝒰 = √⟨∆, ∆⟩_𝒰. Therefore, it provides for two given data points 𝒰, 𝒰̃ ∈ Gr(n, r)
• a solution to the geodesic endpoint problem: a geodesic that starts from 𝒰 and ends at 𝒰̃,
• the Riemannian distance between the given points 𝒰, 𝒰̃.
An algorithm for computing the Grassmann logarithm is stated implicitly in [2, §3.8, p. 210]. The reference [37] features expressions for the Grassmann exponential and the corresponding logarithm that formally work with Grassmann representatives in SO(n)/(SO(r) × SO(n − r)) but also keep the computational effort at O(nr²). The reference [81, §4.3] gives the corresponding mappings after identifying subspaces with orthoprojectors, see also [16].

Algorithm 10 Grassmann logarithm.
Input: base point 𝒰 = [U] ∈ Gr(n, r) with U ∈ St(n, r), 𝒰̃ = [Ũ] ∈ Gr(n, r) with Ũ ∈ St(n, r)
1: M := U^T Ũ
2: L := (I − UU^T)Ũ M^{-1} = Ũ M^{-1} − U
3: QΣV^T := L  # (thin) SVD
4: ∆ := Q arctan(Σ)V^T  # arctan acts only on diag. entries
Output: ∆ = Log^Gr_𝒰(𝒰̃) ∈ T_𝒰 Gr(n, r)

The composition Exp^Gr_[U] ∘ Log^Gr_[U] is the identity on Gr(n, r), wherever it is defined. Yet, on the level of the actual matrix representatives, the operation

(Exp^Gr_[U] ∘ Log^Gr_[U])([Ũ_in]) = [Ũ_out]

produces a matrix Ũ_out ≠ Ũ_in. Directly recovering the input matrix can be achieved via a Procrustes-type preprocessing step, where Ũ is replaced with Ũ_* := ŨΦ, Φ = arg min_{Φ∈O(r)} ‖U − ŨΦ‖. This leads to Algorithm 11.

Algorithm 11 Grassmann logarithm: modified version.
Input: base point 𝒰 = [U] ∈ Gr(n, r) with U ∈ St(n, r), 𝒰̃ = [Ũ] ∈ Gr(n, r) with Ũ ∈ St(n, r)
1: ΨSR^T := Ũ^T U  # SVD
2: Ũ_* := Ũ(ΨR^T)  # ‘transition to Procrustes representative’
3: L := (I − UU^T)Ũ_*
4: QΣV^T := L  # (thin) SVD
5: ∆ := Q arcsin(Σ)V^T  # arcsin acts only on diag. entries
Output: ∆ = Log^Gr_𝒰(𝒰̃) ∈ T_𝒰 Gr(n, r)
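A Python sketch of Algorithm 11 (hypothetical naming; NumPy assumed; the clip merely guards arcsin against singular values pushed slightly above 1 by round-off):

import numpy as np

def log_gr(U, U_tilde):
    # Algorithm 11: modified Grassmann logarithm.
    Psi, _, Rt = np.linalg.svd(U_tilde.T @ U)        # step 1: SVD of Utilde^T U
    U_star = U_tilde @ (Psi @ Rt)                    # step 2: Procrustes representative
    L = U_star - U @ (U.T @ U_star)                  # step 3: normal component
    Q, sigma, Vt = np.linalg.svd(L, full_matrices=False)
    theta = np.arcsin(np.clip(sigma, 0.0, 1.0))      # step 5, diagonal entries only
    return Q @ (theta[:, None] * Vt)

The Riemannian subspace distance (4.16) below then comes for free as the Frobenius norm of the returned tangent vector, ‖∆‖ = (Σ_k arcsin(σ_k)²)^{1/2}.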

An additional advantage of the modified Grassmann logarithm is that the matrix inversion M^{-1} = (U^T Ũ)^{-1} is avoided. In fact, it is replaced by the SVD ΨSR^T = Ũ^T U that is used to solve the Procrustes problem min_{Φ∈O(r)} ‖U − ŨΦ‖. The SVD exists also if Ũ^T U does not have full rank.

Distances between subspaces. The Riemannian logarithm provides the distance between two subspaces 𝒰 = [U], 𝒰̃ = [Ũ] ∈ Gr(n, r) as follows: first, compute ∆ = Log^Gr_𝒰(𝒰̃), then compute ‖∆‖_𝒰 = dist_Gr(𝒰, 𝒰̃). In practice, however, this boils down to computing the singular values of the matrix M = U^T Ũ, which can be seen as follows. By Algorithm 11, ‖∆‖²_𝒰 = trace(∆^T ∆) = Σ_{k=1}^r arcsin(σ_k)², where the σ_k's are the singular values of L = (I − UU^T)Ũ_*. These match precisely the square roots of the eigenvalues of L^T L. Using the SVD of the square matrix Ũ^T U = ΨSR^T as in steps 1 & 2 of Algorithm 11, the eigenvalues of L^T L can be read off from

L^T L = Ũ_*^T (I − UU^T) Ũ_* = I − R S² R^T = R(I − S²) R^T,

so that σ_k² = 1 − s_k² when consistently ordered. As a consequence, s_k = √(1 − σ_k²) = cos(arcsin(σ_k)), which implies

dist_Gr(𝒰, 𝒰̃) = (Σ_{k=1}^r arcsin(σ_k)²)^{1/2} = (Σ_{k=1}^r arccos(s_k)²)^{1/2},  (4.16)

where σ_1, ..., σ_r and s_1, ..., s_r are the singular values of L and Ũ^T U, respectively.

The numerical literature knows a variety of distance measures for subspaces. Essentially, all of them are based on the principal angles [33, §2.5.1, §4.3]. The principal angles (or canonical angles) θ_1, ..., θ_r ∈ [0, π/2] between two subspaces [U], [Ũ] ∈ Gr(n, r) are defined recursively by

cos(θ_k) := u_k^T v_k := max { u^T v | u ∈ [U], ‖u‖ = 1, u ⊥ u_1, ..., u_{k−1};  v ∈ [Ũ], ‖v‖ = 1, v ⊥ v_1, ..., v_{k−1} }.

The principal angles can be computed via θ_k := arccos(s_k) ∈ [0, π/2], where s_k is the kth singular value of U^T Ũ ∈ R^{r×r} [39, §6.4.3]. Hence, the Riemannian subspace distance (4.16) expressed in terms of the principal angles is precisely

dist([U], [Ũ]) := ‖Θ‖_2,  Θ = (θ_1, ..., θ_r) ∈ R^r.  (4.17)

In particular, (4.17) shows that any two points on Gr(n, r) can be connected by a geodesic of length at most (√r/2)π, see also [98, Thm 8(b)].

Appendix A. Appendix.

The matrix exponential and logarithm. The standard matrix exponential and matrix logarithm are defined via the series

expm(X) := Σ_{j=0}^∞ X^j/j!,  logm(X) := Σ_{j=1}^∞ (−1)^{j+1} (X − I)^j/j.  (A.1)

For X ∈ R^{n×n}, expm(X) is invertible with inverse expm(−X). The following restrictions of the exponential map are important:

expm|_{sym(n)} : sym(n) → SPD(n),  expm|_{skew(n)} : skew(n) → SO(n).

The former is a diffeomorphism [78, Thm. 2.8]; the latter is a differentiable, surjective map [38, §3.11, Thm. 9]. For additional properties and efficient methods for numerical computation, see [47, §10, 11]. A few properties of the exponential function for real or complex numbers carry over to the matrix exponential. However, since matrices do not commute, the standard exponential law is replaced by

expm(Z(X, Y)) = expm(X) expm(Y),  (A.2)
Z(X, Y) = X + Y + (1/2)[X, Y] + (1/12)([X, [X, Y]] + [Y, [Y, X]]) − (1/24)[Y, [X, [X, Y]]] + ...,

where [X, Y] = XY − YX is the commutator bracket, or Lie bracket. This is Dynkin's formula for the Baker–Campbell–Hausdorff series, see [85, §1.3, p. 22]. From a theoretical point of view, it is important that all terms in this series can be expressed in terms of the Lie bracket. A special case is

expm(X + Y) = expm(X) expm(Y), if [X, Y] = 0.

Matrix square roots and the polar decomposition. Every S ∈ SPD(n) has a unique matrix square root in SPD(n), i.e., a matrix denoted by S^{1/2} with the property S^{1/2} S^{1/2} = S. This square root can be obtained via an EVD S = QΛQ^T by setting

S^{1/2} := Q Λ^{1/2} Q^T,

where Q ∈ O(n), Λ = diag(λ_1, ..., λ_n), and λ_i > 0 are the eigenvalues of S. Every A ∈ GL(n) can be uniquely decomposed into an orthogonal matrix times a symmetric positive definite matrix,

A = QP = Q expm(X),  Q ∈ O(n), P ∈ SPD(n), X ∈ sym(n).

The polar factors can be constructed by taking the square root of the assuredly positive definite matrix A^T A and subsequently setting P := (A^T A)^{1/2} and Q := AP^{-1}. Because the restriction of expm to the symmetric matrices is a diffeomorphism onto SPD(n), there is a unique X ∈ sym(n) with P = expm(X). For details, see [43, Thm. 2.18].
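Both constructions translate directly into code; the following is a sketch under the assumption of NumPy, with hypothetical function names:

import numpy as np

def sqrtm_spd(S):
    # Unique SPD square root via the EVD S = Q diag(lam) Q^T.
    lam, Q = np.linalg.eigh(S)
    return (Q * np.sqrt(lam)) @ Q.T

def polar(A):
    # Polar decomposition A = Q P with Q orthogonal and P in SPD(n),
    # via P = (A^T A)^{1/2} and Q = A P^{-1}.
    P = sqrtm_spd(A.T @ A)
    return A @ np.linalg.inv(P), P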

The Procrustes problem. Let A, B ∈ R^{n×r}. The Procrustes problem aims at finding an orthogonal transformation R_* ∈ O(r) such that R_* is the minimizer of

min_{R∈O(r)} ‖A − BR‖_F.

The optimal R_* is R_* = UV^T, where B^T A = UΣV^T ∈ R^{r×r} is an SVD, see [39].
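In code, the Procrustes solution is a single SVD (a sketch, assuming NumPy; the function name is hypothetical):

import numpy as np

def procrustes(A, B):
    # Optimal rotation R* = U V^T from the SVD B^T A = U Sigma V^T,
    # minimizing ||A - B R||_F over R in O(r).
    U, _, Vt = np.linalg.svd(B.T @ A)
    return U @ Vt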

REFERENCES

[1] P.-A. Absil, P.-Y. Gousenbourger, P. Striewski, and B. Wirth, Differentiable piecewise-Bézier surfaces on Riemannian manifolds, SIAM Journal on Imaging Sciences, 9 (2016), pp. 1788–1828.
[2] P.-A. Absil, R. Mahony, and R. Sepulchre, Riemannian geometry of Grassmann manifolds with a view on algorithmic computation, Acta Applicandae Mathematica, 80 (2004), pp. 199–220.
[3] ——, Optimization Algorithms on Matrix Manifolds, Princeton University Press, Princeton, New Jersey, 2008.
[4] B. Afsari, R. Tron, and R. Vidal, On the convergence of gradient descent for finding the Riemannian center of mass, SIAM Journal on Control and Optimization, 51 (2013), pp. 2230–2260.
[5] D. Alekseevsky, A. Kriegl, P. W. Michor, and M. Losik, Choosing roots of polynomials smoothly, Israel Journal of Mathematics, 105 (1998), pp. 203–233.
[6] M. M. Alexandrino and R. G. Bettiol, Lie Groups and Geometric Aspects of Isometric Actions, Springer International Publishing, Cham, 2015.
[7] D. Amsallem, Interpolation on Manifolds of CFD-based Fluid and Finite Element-based Structural Reduced-order Models for On-line Aeroelastic Prediction, PhD thesis, Stanford University, 2010.
[8] D. Amsallem and C. Farhat, Interpolation method for adapting reduced-order models and application to aeroelasticity, AIAA Journal, 46 (2008), pp. 1803–1813.
[9] ——, An online method for interpolating linear parametric reduced-order models, SIAM Journal on Scientific Computing, 33 (2011), pp. 2169–2198.
[10] E. Andruchow, G. Larotonda, L. Recht, and A. Varela, The left invariant metric in the general linear group, Journal of Geometry and Physics, 86 (2014), pp. 241–257.
[11] V. Arnol'd, Mathematical Methods of Classical Mechanics, Graduate Texts in Mathematics, Springer, New York, 1997.
[12] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM Journal on Matrix Analysis and Applications, 29 (2006), pp. 328–347.
[13] P. Astrid, S. Weiland, K. Willcox, and T. Backx, Missing points estimation in models described by proper orthogonal decomposition, IEEE Transactions on Automatic Control, 53 (2008), pp. 2237–2251.
[14] M. Barrault, Y. Maday, N. Nguyen, and A. Patera, An “empirical interpolation” method: Application to efficient reduced-basis discretization of partial differential equations, Comptes Rendus Mathématique. Académie des Sciences. Paris, I 339 (2004), pp. 667–672.
[15] R. H. Bartels, J. C. Beatty, and B. A. Barsky, An Introduction to Splines for Use in Computer Graphics and Geometric Modeling, Morgan Kaufmann Series in Computer Graphics, Elsevier Science, 1995.
[16] E. Batzies, K. Hüper, L. Machado, and F. Silva Leite, Geometric mean and geodesic regression on Grassmannians, Linear Algebra and its Applications, 466 (2015), pp. 83–101.
[17] P. Benner, S. Gugercin, and K. Willcox, A survey of projection-based model reduction methods for parametric dynamical systems, SIAM Review, 57 (2015), pp. 483–531.
[18] A. V. Bernstein and A. P. Kuleshov, Tangent bundle manifold learning via Grassmann & Stiefel eigenmaps, arXiv preprint arXiv:1212.6031, (2012).
[19] R. Bhatia, Positive Definite Matrices, Princeton Series in Applied Mathematics, Princeton University Press, Princeton, New Jersey, 2007.
[20] S. Bonnabel and R. Sepulchre, Riemannian metric and geometric mean for positive semidefinite matrices of fixed rank, SIAM Journal on Matrix Analysis and Applications, 31 (2009), pp. 1055–1070.
[21] N. Boumal and P.-A. Absil, A discrete regression method on manifolds and its application to data on SO(n), IFAC Proceedings Volumes, 44 (2011), pp. 2284–2289. 18th IFAC World Congress.
[22] D. Bryner, Endpoint geodesics on the Stiefel manifold embedded in Euclidean space, SIAM Journal on Matrix Analysis and Applications, 38 (2017), pp. 1139–1159.
[23] M. D. Buhmann, Radial Basis Functions, vol. 12 of Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, UK, 2003.
[24] M. Camarinha, F. S. Leite, and P. Crouch, On the geometry of Riemannian cubic polynomials, Differential Geometry and its Applications, 15 (2001), pp. 107–135.
[25] K. Carlberg and C. Farhat, A low-cost, goal-oriented ‘compact proper orthogonal decomposition’ basis for model reduction of static systems, International Journal for Numerical Methods in Engineering, 86 (2011), pp. 381–402.
[26] R. Chakraborty and B. C. Vemuri, Statistics on the (compact) Stiefel manifold: Theory and applications. arXiv:1708.00045v1, 2017.
[27] S. Chaturantabut and D. Sorensen, Nonlinear model reduction via discrete empirical interpolation, SIAM Journal on Scientific Computing, 32 (2010), pp. 2737–2764.
[28] A. Cherian and S. Sra, Positive definite matrices: Data representation and applications in computer vision, in Algorithmic Advances in Riemannian Geometry and Applications: For Machine Learning, Computer Vision, Statistics, and Optimization, H. Q. Minh and V. Murino, eds., Springer International Publishing, Cham, 2016, pp. 93–114.
[29] Y. Choi, D. Amsallem, and C. Farhat, Gradient-based constrained optimization using a database of linear reduced-order models, arXiv, arXiv:1506.07849v1 (2015), pp. 1–28.
[30] P. Crouch and F. S. Leite, The dynamic interpolation problem: On Riemannian manifolds, Lie groups, and symmetric spaces, Journal of Dynamical and Control Systems, 1 (1995), pp. 177–202.
[31] J. Degroote, J. Vierendeels, and K. Willcox, Interpolation among reduced-order matrices to obtain parameterized models for design, optimization and probabilistic analysis, International Journal for Numerical Methods in Fluids, 63 (2010), pp. 207–230.
[32] M. P. do Carmo, Riemannian Geometry, Mathematics: Theory & Applications, Birkhäuser Boston, 1992.
[33] A. Edelman, T. A. Arias, and S. T. Smith, The geometry of algorithms with orthogonality constraints, SIAM Journal on Matrix Analysis and Applications, 20 (1998), pp. 303–353.
[34] J. Faraut and A. Koranyi, Analysis on Symmetric Cones, Oxford Mathematical Monographs, Oxford University Press, New York, 1994.
[35] T. Franz, R. Zimmermann, S. Görtz, and N. Karcher, Interpolation-based reduced-order modeling for steady transonic flows via manifold learning, International Journal of Computational Fluid Dynamics, Special Issue on Reduced Order Modeling, 28 (2014), pp. 106–121.
[36] J. H. Gallier, Geometric methods and applications: for computer science and engineering, Texts in Applied Mathematics, Springer, New York, 2011.
[37] K. A. Gallivan, A. Srivastava, X. Liu, and P. Van Dooren, Efficient algorithms for inferences on Grassmann manifolds, in IEEE Workshop on Statistical Signal Processing, 2003, pp. 315–318.
[38] R. Godement and U. Ray, Introduction to the Theory of Lie Groups, Universitext, Springer International Publishing, 2017.
[39] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 4th ed., 2013.
[40] P.-Y. Gousenbourger, E. Massart, and P.-A. Absil, Data fitting on manifolds with composite Bézier-like curves and blended cubic splines, Journal of Mathematical Imaging and Vision, online (2018), pp. 1–27.
[41] P. Grohs, Quasi-interpolation in Riemannian manifolds, IMA Journal of Numerical Analysis, 33 (2013), pp. 849–874.
[42] B. Haasdonk and M. Ohlberger, Efficient reduced models and a-posteriori error estimation for parametrized dynamical systems by offline/online decomposition, Mathematical and Computer Modelling of Dynamical Systems, 17 (2011), pp. 145–161.
[43] B. C. Hall, Lie Groups, Lie Algebras, and Representations: An Elementary Introduction, Springer Graduate Texts in Mathematics, Springer-Verlag, New York – Berlin – Heidelberg, 2nd ed., 2015.
[44] A. Hay, J. Borggaard, I. Akhtar, and D. Pelletier, Reduced-order models for parameter dependent geometries based on shape sensitivity analysis, Journal of Computational Physics, 229 (2010), pp. 1327–1352.
[45] A. Hay, J. T. Borggaard, and D. Pelletier, Local improvements to reduced-order models using sensitivity analysis of the proper orthogonal decomposition, Journal of Fluid Mechanics, 629 (2009), pp. 41–72.
[46] U. Helmke and J. B. Moore, Optimization and Dynamical Systems, Communications & Control Engineering, Springer-Verlag, London, 1994.
[47] N. J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.
[48] M. Hinze and S. Volkwein, Proper orthogonal decomposition surrogate models for nonlinear dynamical systems: Error estimates and suboptimal control, in Dimension Reduction of Large-Scale Systems, vol. 45 of Lecture Notes in Computational Science and Engineering, Springer, Berlin–Heidelberg, 2005, pp. 261–306.

[49] K. Hüper, M. Kleinsteuber, and F. Silva Leite, Rolling Stiefel manifolds, International Journal of Systems Science, 39 (2008), pp. 881–887.
[50] K. Hüper and F. Ullrich, Real Stiefel manifolds: An extrinsic point of view, in 2018 13th APCA International Conference on Automatic Control and Soft Computing (CONTROLO), June 2018, pp. 13–18.
[51] K. Ito and S. S. Ravindran, A reduced-order method for simulation and control of fluid flows, Journal of Computational Physics, 143 (1998), pp. 403–425.
[52] J. Jakubiak, F. S. Leite, and R. Rodrigues, A two-step algorithm of smooth spline generation on Riemannian manifolds, Journal of Computational and Applied Mathematics, 194 (2006), pp. 177–191.
[53] S. Jayasumana, R. Hartley, and M. Salzmann, Kernels on Riemannian manifolds, in Riemannian Computing in Computer Vision, A. Srivastava and P. K. Turaga, eds., Springer International Publishing, 2015, pp. 45–67.
[54] K. R. Kim, I. L. Dryden, and H. Le, Smoothing splines on Riemannian manifolds, with applications to 3D shape space. arXiv:1801.04978v2, 2018.
[55] H. Karcher, Riemannian center of mass and mollifier smoothing, Communications on Pure and Applied Mathematics, 30 (1977), pp. 509–541.
[56] H. J. Kim, N. Adluru, B. B. Bendlin, S. C. Johnson, B. C. Vemuri, and V. Singh, Canonical correlation analysis on SPD(n) manifolds, in Riemannian Computing in Computer Vision, A. Srivastava and P. K. Turaga, eds., Springer International Publishing, 2015, pp. 69–100.
[57] S. Kobayashi and K. Nomizu, Foundations of Differential Geometry, vol. I of Interscience Tracts in Pure and Applied Mathematics no. 15, John Wiley & Sons, New York – London – Sidney, 1963.
[58] ——, Foundations of Differential Geometry, vol. II of Interscience Tracts in Pure and Applied Mathematics no. 15, John Wiley & Sons, New York – London – Sidney, 1969.
[59] K. A. Krakowski, L. Machado, F. Silva Leite, and J. Batista, Solving interpolation problems on Stiefel manifolds using quasi-geodesics, in Pré-Publicações do Departamento de Matemática, no. 15–36, Universidade de Coimbra, 2015.
[60] W. Kühnel, Differential Geometry: Curves – Surfaces – Manifolds, Student Mathematical Library, American Mathematical Society, 2006.
[61] S. Lang, Fundamentals of Differential Geometry, Graduate Texts in Mathematics, Springer New York, 2001.
[62] J. M. Lee, Riemannian Manifolds: an Introduction to Curvature, Springer-Verlag, New York – Berlin – Heidelberg, 1997.
[63] ——, Introduction to Smooth Manifolds, Graduate Texts in Mathematics, Springer New York, 2012.
[64] E. Massart and P.-A. Absil, Quotient geometry with simple geodesics for the manifold of fixed-rank positive-semidefinite matrices, Tech. Rep. UCL-INMA-2018.06, University of Louvain, 2018.
[65] E. Massart, P.-Y. Gousenbourger, T. S. Nguyen, T. Stykel, and P.-A. Absil, Interpolation on the manifold of fixed-rank positive-semidefinite matrices for parametric model order reduction: preliminary results, Tech. Rep. UCL-INMA-2018.13, University of Louvain, 2018.
[66] H. Q. Minh and V. Murino, Algorithmic Advances in Riemannian Geometry and Applications: For Machine Learning, Computer Vision, Statistics, and Optimization, Advances in Computer Vision and Pattern Recognition, Springer International Publishing, Cham, 2016.
[67] ——, From covariance matrices to covariance operators: Data representation from finite to infinite-dimensional settings, in Algorithmic Advances in Riemannian Geometry and Applications: For Machine Learning, Computer Vision, Statistics, and Optimization, H. Q. Minh and V. Murino, eds., Springer International Publishing, Cham, 2016, pp. 115–143.
[68] M. Moakher, A differential geometric approach to the geometric mean of symmetric positive-definite matrices, SIAM Journal on Matrix Analysis and Applications, 26 (2005), pp. 735–747.
[69] M. Moakher and P. G. Batchelor, Symmetric positive-definite matrices: From geometry to applications and visualization, in Visualization and Processing of Tensor Fields, J. Weickert and H. Hagen, eds., Mathematics and Visualization, Springer, Berlin – Heidelberg, 2006, pp. 285–298.
[70] M. Moakher and M. Zeraï, The Riemannian geometry of the space of positive-definite

matrices and its applications to the regularization of positive-definite matrix-valued data, Journal of Mathematical Imaging and Vision, 40 (2011), pp. 171–187.
[71] M. Morzyński, W. Stankiewicz, B. R. Noack, R. King, F. Thiele, and G. Tadmor, Continuous mode interpolation for control-oriented models of fluid flow, in Active Flow Control, R. King, ed., Springer, Berlin – Heidelberg, 2007, pp. 260–278.
[72] E. Nava-Yazdani and K. Polthier, De Casteljau's algorithm on manifolds, Computer Aided Geometric Design, 30 (2013), pp. 722–732.
[73] T. S. Nguyen, A real time procedure for affinely dependent parametric model order reduction using interpolation on Grassmann manifolds, International Journal for Numerical Methods in Engineering, 93 (2013), pp. 818–833.
[74] L. Noakes, G. Heinzinger, and B. Paden, Cubic splines on curved spaces, IMA Journal of Mathematical Control and Information, 6 (1989), pp. 465–473.
[75] M. Ohlberger and F. Schindler, Error control for the localized reduced basis multi-scale method with adaptive on-line enrichment, SIAM Journal on Scientific Computing, 37 (2015), pp. A2865–A2895.
[76] H. Panzer, J. Mohring, R. Eid, and B. Lohmann, Parametric model order reduction by matrix interpolation, Automatisierungstechnik, 58 (2010), pp. 475–484.
[77] B. Peherstorfer, D. Butnaru, K. Willcox, and H.-J. Bungartz, Localized discrete empirical interpolation method, SIAM Journal on Scientific Computing, 36 (2014), pp. A168–A192.
[78] X. Pennec, Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements, Journal of Mathematical Imaging and Vision, 25 (2006), p. 127.
[79] X. Pennec, P. Fillard, and N. Ayache, A Riemannian framework for tensor computing, International Journal of Computer Vision, 66 (2006), pp. 41–66.
[80] T. Popiel and L. Noakes, Bézier curves and C2 interpolation in Riemannian manifolds, Journal of Approximation Theory, 148 (2007), pp. 111–127.
[81] I. U. Rahman, I. Drori, V. C. Stodden, D. L. Donoho, and P. Schröder, Multiscale representations for manifold-valued data, SIAM Journal on Multiscale Modeling and Simulation, 4 (2005), pp. 1201–1232.
[82] Q. Rentmeesters, Algorithms for data fitting on some common homogeneous spaces, PhD thesis, Université Catholique de Louvain, Louvain, Belgium, 2013.
[83] Q. Rentmeesters and P.-A. Absil, Algorithms comparison for Karcher mean computation of rotation matrices and diffusion tensors, in Proceedings of the 19th European Signal Processing Conference (EUSIPCO 2011), Barcelona, Spain, Aug. 29 – Sept. 2 2011.
[84] M. Rewienski and J. White, Model order reduction for nonlinear dynamical systems based on trajectory piecewise-linear approximations, Linear Algebra and its Applications, 415 (2006), pp. 426–454. Special Issue on Order Reduction of Large-Scale Systems.
[85] W. Rossmann, Lie Groups: An Introduction Through Linear Groups, Oxford Graduate Texts in Mathematics, Oxford University Press, 2006.
[86] N. T. S., P.-Y. Gousenbourger, E. Massart, and P.-A. Absil, Online balanced truncation for linear time-varying systems using continuously differentiable interpolation on Grassmann manifold, Tech. Rep. UCL-INMA-2019.01, University of Louvain, 2019.
[87] C. Samir, P.-A. Absil, A. Srivastava, and E. Klassen, A gradient-descent method for curve fitting on Riemannian manifolds, Foundations of Computational Mathematics, 12 (2012), pp. 49–73.
[88] C. Samir and I. Adouani, C1 interpolating Bézier path on Riemannian manifolds, with applications to 3D shape space, Applied Mathematics and Computation, 348 (2019), pp. 371–384.
[89] O. Sander, Interpolation und Simulation mit nichtlinearen Daten, GAMM Rundbriefe, 1 (2015), pp. 6–12.
[90] ——, Geodesic finite elements of higher order, IMA Journal of Numerical Analysis, 36 (2016), pp. 238–266.
[91] S. Sargsyan, S. L. Brunton, and J. N. Kutz, Online interpolation point refinement for reduced-order models using a genetic algorithm, SIAM Journal on Scientific Computing, 40 (2018), pp. B283–B304.
[92] A. Srivastava and P. K. Turaga, Riemannian Computing in Computer Vision, Springer International Publishing, 2015.
[93] F. Steinke, M. Hein, J. Peters, and B. Schoelkopf, Manifold-valued thin-plate splines with applications in computer graphics, Computer Graphics Forum, (2008).
[94] G. Tadmor, O. Lehmann, B. R. Noack, and M. Morzyński, Galerkin models enhancements for flow control, in Reduced-Order Modelling for Flow Control, B. R. Noack, M. Morzyński, and G. Tadmor, eds., Springer, Vienna, 2011, pp. 151–252.

[95] P. K. Turaga, A. Veeraraghavan, and R. Chellappa, Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision, in 2008 IEEE Conference on Computer Vision and Pattern Recognition, June 2008, pp. 1–8.
[96] B. Vandereycken, P.-A. Absil, and S. Vandewalle, A Riemannian geometry with complete geodesics for the set of positive semidefinite matrices of fixed rank, IMA Journal of Numerical Analysis, 33 (2012), pp. 481–514.
[97] G. Weickum, M. S. Eldred, and K. Maute, Multi-point extended reduced order modeling for design optimization and uncertainty analysis, in Proceedings of the 2nd AIAA Multidisciplinary Design Optimization Specialist Conference, no. AIAA 2006-2145, Newport, RI, May 1–4 2006.
[98] Y.-C. Wong, Differential geometry of Grassmann manifolds, Proceedings of the National Academy of Sciences of the United States of America, 57 (1967), pp. 589–594.
[99] R. Zimmermann, Gradient-enhanced surrogate modeling based on proper orthogonal decomposition, Journal of Computational and Applied Mathematics, 237 (2013), pp. 403–418.
[100] ——, A locally parametrized reduced order model for the linear frequency domain approach to time-accurate computational fluid dynamics, SIAM Journal on Scientific Computing, 36 (2014), pp. B508–B537.
[101] ——, Local parametrization of subspaces on matrix manifolds via derivative information, in Numerical Mathematics and Advanced Applications ENUMATH 2015, B. Karasözen, M. Manguoğlu, M. Tezer-Sezgin, S. Göktepe, and Ö. Uğur, eds., Springer International Publishing, Cham, 2016, pp. 379–387.
[102] ——, A matrix-algebraic algorithm for the Riemannian logarithm on the Stiefel manifold under the canonical metric, SIAM Journal on Matrix Analysis and Applications, 38 (2017), pp. 322–342.
[103] ——, Hermite interpolation and data processing errors on Riemannian matrix manifolds. arXiv:1908.05875, 2019.
[104] R. Zimmermann and K. Debrabant, Parametric model reduction via interpolating orthonormal bases, in Numerical Mathematics and Advanced Applications ENUMATH 2017, F. A. Radu, K. Kumar, I. Berre, D. N. Nordbotten, and I. S. Pop, eds., Springer International Publishing, Cham, 2018.
[105] R. Zimmermann and K. Willcox, An accelerated greedy missing point estimation procedure, SIAM Journal on Scientific Computing, 38 (2016), pp. A2827–A2850.
