
Chapter 8

Mathematical Techniques

This chapter contains a review of certain basic mathematical concepts and techniques that are central to metric prediction generation. A mastery of these fundamentals will enable the reader not only to understand the principles underlying the prediction algorithms, but also to extend them to meet future requirements.

8.1. Scalars, Vectors, Matrices, and Tensors

While the reader may have encountered the concepts of scalars, vectors, and matrices in college courses and subsequent job experience, the concept of a tensor is likely an unfamiliar one. Each of these concepts is, in a sense, an extension of the preceding one. Each carries a notion of value and of operations by which values are transformed and combined. Each is used to represent increasingly complex structures that seem less complex in the higher-order form.

Tensors were introduced in the Spacetime chapter to represent a subject that may seem unfathomably complex to the untrained mind. However, to one versed in the theory of relativity, tensors are mathematically the simplest representation that has so far been found to describe its principles.

8.1.1. The Hierarchy of Algebras

As a precursor to the coming material, it is perhaps useful to review some elementary algebraic concepts. The material in this subsection describes the succession of algebraic structures from sets to fields; it may be skipped if the reader is already aware of this hierarchy.

The basic theory of algebra begins with set theory. Simply put, a set is a collection of objects together with a Boolean membership function, denoted $\in$, that determines whether or not any given object is one of those in the collection. It need not be a physical collection of objects, but may be an abstraction, defined by description. A set describes a mathematical relation, namely, that all members are related by set membership. A set possesses the attribute of size, or order, or cardinality, denoted $\aleph$ (the Hebrew aleph character), that designates the number of elements in the set, which may be either finite or infinite. A set that is contained within another set is said to be a subset. Set operations include the equality, subset, and superset Booleans $=$, $\subset$, $\supset$, along with the combinatorial operations union and intersection, $\cup$, $\cap$.

Along with logic and the predicate calculus, set theory is one of the axiomatic foundations of mathematics. The reader is presumed to have a basic understanding of sets, so that further description of set properties and theory is unnecessary here.

Functions defined on the objects of sets are commonly called maps, mappings, or morphisms. They are the means by which the objects in one set may be associated with those of another. They are single valued; that is, each object in the domain set corresponds to exactly one object in the range set. A map may be one-to-one, where each object of the domain maps to a unique object of the range, or many-to-one, where more than one domain object corresponds to a given object of the range. Associations that are one-to-many are not allowed¹ in this category.

The cardinality of the set of natural numbers (1, 2, 3, …) is called a “countable” or “enumerable” or “denumerable” infinity, and is denoted $\aleph_0$. Any set that has a one-to-one correspondence with the natural numbers is also said to be denumerable. Set theory teaches that the set of integers, as well as the set of rational numbers, is also of cardinality $\aleph_0$. However, the set of real numbers $\mathbb{R}$ belongs to a higher-order, non-denumerable infinity of cardinality $\aleph_1$. The complex numbers $\mathbb{C}$ are also of this order of infinity.²

¹ The broader context is that of a relation, which does admit one-to-many correspondences.
² There are an infinite number of higher-cardinality infinity classifications, but these are of no consequence here.

Operations other than membership may be defined on the elements of a set, such as negation (a unary operation), addition (a binary operation), or the modulus (a ternary operation), that map (i.e., associate) members of a set with other objects. When the mapping of an object always produces an object within the set, the set is said to be closed under that operation.

One of the simplest algebraic structures is the magma, which is defined as a set having a single, closed binary operation. If the operation, here denoted “$\circ$”, is such that $(a \circ b) \circ c = a \circ (b \circ c)$ for each triplet of elements taken from the set, it is said to be associative. An associative magma is called a semigroup. A semigroup is thus a set having a single closed associative binary operator.

If a semigroup possesses an element, here denoted “$i$”, such that $a \circ i = i \circ a = a$ for every element $a$, then that element is called the operation’s identity, and a semigroup with identity is called a monoid. A monoid is therefore a set with a single closed associative binary operator and an identity.

If a given monoid has the property that for each element $a$ there is an element $b$ such that $a \circ b = b \circ a = i$, then $b$ is said to be the inverse of $a$ under $\circ$, and vice versa. A monoid with inverse elements is called a group. A group is thus a set with a single closed associative binary operator with identity and inverses.

If the operation is also commutative, i.e., $a \circ b = b \circ a$, then the group is said to be a commutative group, or Abelian group (named for the mathematician Niels Henrik Abel).

The set of integers (positive, negative, and zero) forms an Abelian group under addition. The identity element in this case is zero, and inverses are the negatives, written in the familiar notation of subtraction. The set of rational numbers (ratios of integers with nonzero denominators) is also an Abelian group under addition, with the additive identity again equal to zero.

However, the set of integers does not form a group under multiplication, as the inverse of an integer is in general a fraction, not an integer. The nonzero rational numbers do form an Abelian group under multiplication, with the multiplicative identity equal to one, and inverse elements obtained by inverting the fraction.

If a set has two binary operators, here denoted “$+$” and “$\times$”, and is an Abelian group under the former and a monoid under the latter, with the additional distributive property that $a \times (b + c) = a \times b + a \times c$ and $(a + b) \times c = a \times c + b \times c$ for all elements, then the set is said to be a ring. A ring thus has two binary operators, is a commutative group under one (referred to as the additive operator) and a monoid under the other (referred to as the multiplicative operator), and multiplication is distributive with respect to addition. Commutativity of products is not required; however, if it holds, the ring is said to be commutative.

Rings are not required to have multiplicative inverses. A ring in which every nonzero element has a multiplicative inverse is called a division ring. A commutative division ring is called a field, and is the mathematical entity under which addition, subtraction, multiplication, and division are all well defined. A field is a commutative group under the additive operator and under the multiplicative operator (excluding zero), with multiplication distributive with respect to addition.

Examples of non-finite fields include the rational numbers, the real numbers, and the complex numbers. Examples of finite fields include the integers modulo a prime number and extensions of these, known as Galois fields after the French mathematician Évariste Galois. Galois, who did not use the term "field" in his own work, is honored as the first mathematician to link group theory and field theory.

Algebraic structures are summarized in the following table in increasing order of structure. The latter three entries are discussed in coming subsections of the chapter.

Algebraic Entity    Operators, Properties
set                 membership ($\in$), cardinality ($\aleph$), equality, subset/superset ($\subset$, $\supset$), union ($\cup$), intersection ($\cap$)
magma               single closed binary operation
semigroup           operation is associative
monoid              identity element
group               each element has an inverse
Abelian group       operation is commutative
ring                two operations, $+$ and $\times$; Abelian group under $+$; $\times$ distributes over $+$
division ring       group under $\times$
field               $\times$ is commutative
scalar              field element
vector              ordered set of scalars
manifold            Euclidean neighborhoods

8.1.2. Scalars

The concept of a scalar is familiar to almost all who have completed high school algebra. It is merely a single number, or a symbol that represents a number, used to describe some attribute of a physical or abstract entity. The number may be real, imaginary, or complex. It may also be a member of any other field, if applicable to the problem at hand.

8.1.3. Vectors and Vector Spaces

For the purposes of this work, a vector is defined simply as an ordered list (or tuple) of scalars, all from the same field $\mathbb{F}$. Each of the scalars is called a coordinate. Vectors thus may be used to represent multidimensional quantities having a common scale of measure.

A vector space over a field $\mathbb{F}$ is defined abstractly as a set of $n$-tuples over $\mathbb{F}$, called vectors, and two binary operations, here called addition and scalar multiplication, such that (1) the set is an Abelian group under the addition operator, (2) distributivity holds for scalar multiplication over sums of vectors and for sums of scalars multiplying a vector, and (3) scalar multiplication has an identity element. For all MPG applications, vector spaces will have finite dimension, and $\mathbb{F}$ is usually the field of real numbers $\mathbb{R}$.

Vector symbols are denoted here in boldface characters. A vector expressed as a coordinate list is commonly written in vertical form

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{8-1}$$

where the components $x_i \in \mathbb{F}$. This form is also called a column vector. The horizontal form $\mathbf{x}^T = (x_1, x_2, \ldots, x_n)$ is a row vector, called the transpose of $\mathbf{x}$. As discussed later in this chapter, row vectors form a dual vector space to column vectors. These conventions conform in notation to those commonly used in textbooks on matrices and linear algebra.

Mathematicians use an index convention to distinguish row and column vector components, viz., superscript indices $x^i$ for column vector components and subscript indices $x_i$ for row vector components. This convention will be elaborated upon later in the chapter. Until then, subscript notation will be used in both cases to avoid confusion.

The size of the coordinate list is called its dimension. The entire set of vectors over the given field is called the coordinate space over $\mathbb{F}$, whose dimension is the number of coordinates. However, the dimension of the coordinate space may be different from the dimension of the vector space being represented in a given application. For example, vectors confined to a plane in 3-dimensional space are members of a 2-dimensional vector space. Dimensionality of the application vector space will be treated a little later in this section.

8.1.3.1. Vector Addition

Combination of vectors under the additive operator of $\mathbb{F}$ is defined componentwise by

$$\mathbf{x} \pm \mathbf{y} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \pm \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 \pm y_1 \\ x_2 \pm y_2 \\ \vdots \\ x_n \pm y_n \end{bmatrix} \tag{8-2}$$

By common convention, subtraction is defined as addition using the additive inverses of field values.

8.1.3.2. Scalar Multiplication

Given a vector $\mathbf{x}$ over $\mathbb{F}$ and a scalar $h$ also in $\mathbb{F}$, the scalar product is defined by

$$h\,\mathbf{x} = h\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} h x_1 \\ h x_2 \\ \vdots \\ h x_n \end{bmatrix} \tag{8-3}$$
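As an illustration (not part of the original text), the following minimal Python/NumPy sketch evaluates Eqs. (8-2) and (8-3) for a pair of arbitrary 3-vectors; the numeric values are illustrative only.

```python
import numpy as np

# Two vectors over the real field, written as 1-D arrays (column vectors of Eq. 8-1).
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 0.5])

# Componentwise addition and subtraction, Eq. (8-2).
print(x + y)      # [ 5.   1.   3.5]
print(x - y)      # [-3.   3.   2.5]

# Scalar multiplication, Eq. (8-3).
h = 2.5
print(h * x)      # [ 2.5  5.   7.5]
```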

8.1.3.3. Vector Space Dimension

Each vector space possesses non-unique sets of vectors $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$ by which all other points may be generated by linear means. That is, every vector $\mathbf{s}$ may be written in the form

$$\mathbf{s} = x_1 \mathbf{b}_1 + x_2 \mathbf{b}_2 + \cdots + x_n \mathbf{b}_n \tag{8-4}$$

with $x_i \in \mathbb{F}$. The least $n$ for which this is possible is the vector space dimension, and the basis vectors are said to be linearly independent and to span the vector space. Without loss of generality, all of the basis vectors may be assumed to have unit length. Further, there exists a method, called the Gram–Schmidt procedure, by which any given set of basis vectors may be transformed into an orthonormal set of mutually orthogonal, unit-length vectors using linear algebra, discussed later. Further details on this procedure are readily found in textbooks or by internet search.
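Because the Gram–Schmidt procedure is only named in the text, the sketch below shows one way it can be coded in Python/NumPy. It uses the classical (not the numerically preferred modified) form, and the function name and test vectors are illustrative assumptions.

```python
import numpy as np

def gram_schmidt(basis):
    """Return an orthonormal set spanning the same space as `basis`.

    `basis` is a sequence of linearly independent vectors (1-D arrays).
    Classical Gram-Schmidt: subtract projections onto the vectors already
    orthonormalized, then normalize the remainder.
    """
    ortho = []
    for b in basis:
        v = np.asarray(b, dtype=float)
        for q in ortho:
            v = v - np.dot(q, v) * q      # remove the component along q
        ortho.append(v / np.linalg.norm(v))
    return np.array(ortho)

# Example: three independent but non-orthogonal vectors in R^3.
Q = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
print(np.round(Q @ Q.T, 12))   # identity matrix, confirming orthonormality
```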

It is perhaps tempting to use the term vector for vector-like objects whose coordinates cannot be written as linear combinations of an orthonormal set. For example, the reader will appreciate that a point in space may be represented in a number of ways, each described by a particular coordinate system, or set of basis elements. The coordinates of these “vectors” combine differently, however, according to the coordinate representation. For example, the rule for adding two “vectors” expressed in spherical coordinates is not the same as addition in a vector space. Ultimately, however, the two results represent the same point in space.

The treatment here, then, will assume that the coordinate system being used obeys the laws of vector arithmetic. Translations to and from other systems can be made according to the transformation formulas between the two systems.

Many of the rules of vector arithmetic and calculus are stated in orthonormal-basis, or Cartesian, form. The advantage of this form is simplicity in defining and using the tools of linear algebra, to be introduced later in the chapter. The rules of Cartesian vector arithmetic are summarized in the subsections below.

In three dimensions, the unit vectors defining the Cartesian coordinate axes are commonly denoted

$$\hat{\mathbf{i}} = (1, 0, 0) \qquad \hat{\mathbf{j}} = (0, 1, 0) \qquad \hat{\mathbf{k}} = (0, 0, 1) \tag{8-5}$$

A 3-vector, then, can represent a directed line segment, characterized by its magnitude, or length, and its direction. These vectors are commonly used to represent physical entities such as forces and distances.

8.1.3.4. Inner Product, or Dot Product

The general form of the dot product of two vectors $\mathbf{x} = [x_1, x_2, \ldots, x_n]$ and $\mathbf{y} = [y_1, y_2, \ldots, y_n]$ relative to the set of linear basis elements $\{\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n\}$ is

$$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i\, y_j\, (\mathbf{b}_i \cdot \mathbf{b}_j) \tag{8-6}$$

which depends on the dot products among basis elements. If $\mathbf{b}_i = \mathbf{Q}\,\hat{\mathbf{u}}_i$ is the linear transformation that maps orthonormal basis vectors to the given basis set, then $\mathbf{b}_i \cdot \mathbf{b}_j = \hat{\mathbf{u}}_i^T \mathbf{Q}^T \mathbf{Q}\, \hat{\mathbf{u}}_j$. The form of this transformation and the rules implicit in the combination of terms are discussed a little later on.

For orthonormal basis elements, the inner product is defined by

$$\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T\mathbf{y} = \langle \mathbf{x}, \mathbf{y} \rangle = (x_1, x_2, \ldots, x_n)\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = x_i y_i \tag{8-7}$$

The first forms in this set of relationships display different notations for the inner product; the expanded sum shows how it is calculated using the operations of $\mathbb{F}$; and the final form is Einstein’s notation, wherein terms having repeated indices are summed.

In particular, the magnitude of a vector is found using the orthonormal-basis dot product

$$x^2 = \|\mathbf{x}\|^2 = \mathbf{x} \cdot \mathbf{x} = x_1^2 + x_2^2 + \cdots + x_n^2 = x_i x_i \tag{8-8}$$

The magnitude, being a scalar, is denoted using non-boldface type. Vectors having unit magnitude are called unit vectors or pointing vectors. Any vector of nonzero magnitude may be transformed into a unit vector having the same direction using scalar multiplication by the inverse magnitude. For example, if $\mathbf{s}$ is a vector whose magnitude $s$ is nonzero, then

$$\hat{\mathbf{s}} = \frac{1}{s}\,\mathbf{s} \tag{8-9}$$

is a unit vector. The circumflex (^) over the vector symbol is commonly used to denote the unitized version of the vector.
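A short NumPy illustration of Eqs. (8-7) through (8-9); the sample vectors are arbitrary.

```python
import numpy as np

x = np.array([3.0, 0.0, 4.0])
y = np.array([1.0, 2.0, 2.0])

dot = np.dot(x, y)              # inner product, Eq. (8-7)
mag = np.sqrt(np.dot(x, x))     # magnitude, Eq. (8-8); same as np.linalg.norm(x)
x_hat = x / mag                 # unitized vector, Eq. (8-9)

print(dot)                      # 11.0
print(mag)                      # 5.0
print(x_hat)                    # [0.6 0.  0.8]
print(np.linalg.norm(x_hat))    # 1.0
```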

In Euclidean space (defined more formally later), there is a strong relationship between the dot product, vector lengths, and the included angle. In two or three dimensions,

$$\mathbf{x} \cdot \mathbf{y} = x\, y \cos\theta \tag{8-10}$$

For higher dimensions, this relationship defines the notion of an angle between vectors. It also defines the concept of orthogonality of vectors: any two vectors having zero dot product are oriented at right angles to each other, and are perpendicular.

8.1.3.5. Outer Product, or Vector Product, or Cross Product

The vector outer product is defined only on 3-vectors. In Cartesian coordinates the product is

$$\mathbf{x} \times \mathbf{y} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix}
= (x_2 y_3 - x_3 y_2)\,\mathbf{i} + (x_3 y_1 - x_1 y_3)\,\mathbf{j} + (x_1 y_2 - x_2 y_1)\,\mathbf{k} \tag{8-11}$$

The generalized form of the outer product is a form of tensor and is not elaborated upon here. The interested reader will find ample information on this subject using an internet search.

The cross product above introduces a symbolic form that is expanded using the ordinary method of determinant expansion. The vectors in the top row are the coordinate unit vectors introduced earlier.

Specifically, it is to be noted that

$$\mathbf{i} \times \mathbf{i} = \mathbf{j} \times \mathbf{j} = \mathbf{k} \times \mathbf{k} = \mathbf{0} \qquad
\mathbf{i} \times \mathbf{j} = \mathbf{k}, \quad \mathbf{j} \times \mathbf{k} = \mathbf{i}, \quad \mathbf{k} \times \mathbf{i} = \mathbf{j} \tag{8-12}$$
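The unit-vector relations of Eq. (8-12) and the determinant expansion of Eq. (8-11) can be checked numerically; a brief NumPy sketch follows, with arbitrary sample vectors.

```python
import numpy as np

i, j, k = np.eye(3)             # Cartesian unit vectors of Eq. (8-5)

# Eq. (8-12): cyclic products of the unit vectors.
print(np.cross(i, j))           # [0. 0. 1.]  ->  k
print(np.cross(j, k))           # [1. 0. 0.]  ->  i
print(np.cross(k, i))           # [0. 1. 0.]  ->  j
print(np.cross(i, i))           # [0. 0. 0.]

# Eq. (8-11) for two arbitrary 3-vectors.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(np.cross(x, y))           # [-3.  6. -3.]
print(np.cross(y, x))           # [ 3. -6.  3.]  (sign reverses when the order is swapped)
```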

By its definition, the cross product of two 3-vectors is thus another 3-vector. However, this operation is not commutative, for Eq.(8-11) implies that

$$\mathbf{x} \times \mathbf{y} = -\,\mathbf{y} \times \mathbf{x} \tag{8-13}$$

Even so, the cross product is distributive over vector addition,

$$\mathbf{x} \times (\mathbf{y} + \mathbf{z}) = \mathbf{x} \times \mathbf{y} + \mathbf{x} \times \mathbf{z} \qquad
(\mathbf{x} + \mathbf{y}) \times \mathbf{z} = \mathbf{x} \times \mathbf{z} + \mathbf{y} \times \mathbf{z} \tag{8-14}$$

The vector product also has a geometric interpretation relating vector magnitudes and included angle,

$$\mathbf{x} \times \mathbf{y} = \hat{\mathbf{n}}\,(x\, y \sin\theta) \tag{8-15}$$

The quantity in parentheses is a scalar, which multiplies a unit vector, here denoted $\hat{\mathbf{n}}$, that is perpendicular to both $\mathbf{x}$ and $\mathbf{y}$ and is oriented in the direction that a right-handed screw advances when $\mathbf{x}$ is rotated to align with $\mathbf{y}$.

Because of this perpendicularity, there is no 3-vector, call it $\mathbf{u}$, with the property that $\mathbf{x} \times \mathbf{u} = \mathbf{x}$ for all $\mathbf{x}$. The 3-vectors therefore fail to form a monoid with respect to the vector product, and consequently the vector algebra lacks the niceties enjoyed by rings and fields.

8.1.3.6. Triple Scalar Product

The triple scalar product is the scalar found by combining the inner and outer products of three vectors, $\mathbf{x} \cdot (\mathbf{y} \times \mathbf{z})$. This product takes the value

$$\mathbf{x} \cdot (\mathbf{y} \times \mathbf{z}) = x\, y\, z \sin\theta \cos\phi = \text{volume} \tag{8-16}$$

Here, $\theta$ is the angle between $\mathbf{y}$ and $\mathbf{z}$, and $\phi$ is the angle between $\mathbf{x}$ and the normal to the $\mathbf{y}$–$\mathbf{z}$ plane. The volume is that of the parallelepiped formed by the coterminous sides $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$. Application of the rules above results in the determinant form

$$\mathbf{x} \cdot (\mathbf{y} \times \mathbf{z}) = (\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z} = \begin{vmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{vmatrix} \tag{8-17}$$

Because of this symmetry, the product is sometimes written as $(\mathbf{x}\,\mathbf{y}\,\mathbf{z})$, since there can be no confusion as to where the dot and cross belong. Also,

$$(\mathbf{x}\,\mathbf{y}\,\mathbf{z}) = (\mathbf{z}\,\mathbf{x}\,\mathbf{y}) = (\mathbf{y}\,\mathbf{z}\,\mathbf{x})
= -(\mathbf{y}\,\mathbf{x}\,\mathbf{z}) = -(\mathbf{z}\,\mathbf{y}\,\mathbf{x}) = -(\mathbf{x}\,\mathbf{z}\,\mathbf{y}) \tag{8-18}$$
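A NumPy check that the triple scalar product equals the 3-by-3 determinant of Eq. (8-17) and obeys the cyclic symmetry of Eq. (8-18); the vectors are arbitrary.

```python
import numpy as np

x = np.array([1.0, 0.0, 2.0])
y = np.array([0.0, 3.0, 1.0])
z = np.array([2.0, 1.0, 0.0])

triple = np.dot(x, np.cross(y, z))            # x . (y x z)
det    = np.linalg.det(np.array([x, y, z]))   # determinant form, Eq. (8-17)
cyclic = np.dot(z, np.cross(x, y))            # (z x y), a cyclic permutation

print(triple)     # -13.0  (also the signed parallelepiped volume)
print(det)        # -13.0
print(cyclic)     # -13.0, unchanged under cyclic permutation (Eq. 8-18)
```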

8.1.3.7. Triple Vector Product

The triple vector product $\mathbf{x} \times (\mathbf{y} \times \mathbf{z})$ plays an important role in the development of vector algebra and its applications. The result is a vector, since it is the product of the vectors $\mathbf{x}$ and $(\mathbf{y} \times \mathbf{z})$. It is perpendicular to the vector $\mathbf{y} \times \mathbf{z}$, and therefore lies in the plane of $\mathbf{y}$ and $\mathbf{z}$. Similarly, $(\mathbf{x} \times \mathbf{y}) \times \mathbf{z}$ is perpendicular to $\mathbf{x} \times \mathbf{y}$ and therefore lies in the plane of $\mathbf{x}$ and $\mathbf{y}$. The product values are

$$\mathbf{x} \times (\mathbf{y} \times \mathbf{z}) = (\mathbf{x} \cdot \mathbf{z})\,\mathbf{y} - (\mathbf{x} \cdot \mathbf{y})\,\mathbf{z} \qquad
(\mathbf{x} \times \mathbf{y}) \times \mathbf{z} = (\mathbf{x} \cdot \mathbf{z})\,\mathbf{y} - (\mathbf{y} \cdot \mathbf{z})\,\mathbf{x} \tag{8-19}$$

The reader will note that the triple cross product is thus not associative, so the set of 3-vectors over $\mathbb{F}$ is not even a semigroup under $\times$; it is a mere magma.

8.1.4. Matrices and Linear Algebra

Vector spaces are the basic object of study in the branch of mathematics called linear algebra, and thus vector spaces are often called linear spaces. Linear algebra is that branch of mathematics concerned with the study of vectors and vector spaces, linear transformations (or linear maps) among vector spaces, and systems of linear equations. It is used widely throughout mathematics and also has extensive applications in the natural and social sciences.

The history of modern linear algebra traces back to the 1840s, and it has been richly studied since. Its principal period of development was the twentieth century. Those interested in the development timeline will find fascinating accounts of its progress using internet search.

In mathematics, a matrix is a rectangular table of values from a ring. Matrices are used to describe and solve linear equations, to keep track of coefficients in linear transformations, and to record data that depend on multiple parameters. They can be added, multiplied, and decomposed in various ways, making them a key concept in linear algebra. In metric prediction generation, the entries of a matrix are usually real numbers.

The matrix form $\mathbf{A} = [a_{i,j}]\ (i = 1, \ldots, n;\ j = 1, \ldots, m)$ symbolically signifies that the matrix is a set of $n\,m$ elements from a designated set, arranged in tabular form with $n$ rows and $m$ columns, in which the value $a_{i,j}$ appears in the $i$th row and $j$th column. The table is referred to as an $n \times m$ (pronounced “$n$ by $m$”) matrix; $n$ is called the row dimension, and $m$ is the column dimension. A matrix with equal row and column dimensions is square.

Matrices are normally denoted by boldface capitals, whereas vectors are denoted using boldface lower case.

A vector is often represented as an $n \times 1$ matrix, and its transpose by the corresponding $1 \times n$ matrix. Individual columns of a matrix are thus called column vectors, and individual rows are called row vectors.

For a matrix $\mathbf{A} = [a_{ij}]$, the matrix formed by interchanging rows and columns is called its transpose, and is denoted with a T superscript. The element indexed as $a_{ij}$ in row-column format becomes the element $a_{ji}$ of the transpose,

$$\mathbf{A}^T = [a_{ji}] \tag{8-20}$$

Thus, the transpose of an $n \times m$ matrix is an $m \times n$ matrix. The transpose of a product of matrices is equal to the product of the transposed matrices in reverse order:

$$(\mathbf{A}_1 \mathbf{A}_2 \cdots \mathbf{A}_{k-1} \mathbf{A}_k)^T = \mathbf{A}_k^T \mathbf{A}_{k-1}^T \cdots \mathbf{A}_2^T \mathbf{A}_1^T \tag{8-21}$$

A matrix that is equal to its transpose is said to be symmetric. A matrix that equals the negative of its transpose is skew symmetric.

8.1.4.1. Operations and Conformability

A matrix is said to be conformable if its dimensions are appropriate for a given operation. Thus, a given matrix may be conformable in some usages, and non-conformable in others.

For conformability of addition, the two matrices must have the same dimensions, for the sum of two matrices is defined by termwise addition,

$$[a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}] \tag{8-22}$$

For conformability of multiplication, the column dimension $m$ of the first factor must be the same as the row dimension of the second, for the matrix product is defined by row-column inner products,

$$[a_{ij}][b_{jk}] = \left[\sum_{j=1}^{m} a_{ij}\, b_{jk}\right] \tag{8-23}$$

Thus, the result of multiplying an $n \times m$ matrix by an $m \times p$ matrix is an $n \times p$ matrix.

It is relatively easy to see that matrix addition is commutative, while matrix multiplication is not.
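A NumPy sketch of the conformability rules just described, the reverse-order transpose rule of Eq. (8-21), and the non-commutativity of matrix multiplication; the matrices are arbitrary.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])             # 3 x 2

C = A @ B                              # (2x3)(3x2) -> 2x2, Eq. (8-23)
D = B @ A                              # (3x2)(2x3) -> 3x3; AB and BA differ even in shape

# Transpose of a product is the product of transposes in reverse order, Eq. (8-21).
print(np.allclose((A @ B).T, B.T @ A.T))   # True

# Matrix addition requires identical dimensions, Eq. (8-22).
print(A + 2 * A)                           # termwise, still 2 x 3
print(C.shape, D.shape)                    # (2, 2) (3, 3)
```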

8.1.4.2. Identity Matrix and Inverses

The identity matrix of dimension $n$ is defined as the $n \times n$ matrix

$$\mathbf{I}_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \tag{8-24}$$

The identity matrix is an example of a diagonal matrix, which is one all of whose off-diagonal elements are zero,

$$\mathrm{diag}(a_1, a_2, \ldots, a_n) = \begin{bmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{bmatrix} \tag{8-25}$$

Diagonal matrices are of particular importance because the linear transformation they correspond to is very simple.

If matrix $\mathbf{A}$ is of dimension $n \times m$, $\mathbf{B}$ is of dimension $m \times n$, and their product is the identity,

$$\mathbf{A}\mathbf{B} = \mathbf{I} \tag{8-26}$$

then $\mathbf{A}$ is said to be the left inverse of $\mathbf{B}$, and $\mathbf{B}$ is the right inverse of $\mathbf{A}$. Inverses can only exist if $n \le m$, and only under certain conditions, addressed in the next subsection.

If $n = m$ and the inverses exist, then it so happens that the two commute,

$$\mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A} = \mathbf{I} \tag{8-27}$$

so $\mathbf{A}$ is the inverse of $\mathbf{B}$, and vice versa. The inverse of $\mathbf{A}$ is denoted $\mathbf{A}^{-1}$.

If no $\mathbf{B}$ exists that produces Eq. (8-27), then $\mathbf{A}$ is said to be singular. If one does exist, it is unique and $\mathbf{A}$ is nonsingular.

Methods for finding the inverse of a given matrix may be found in texts and other literature. Moreover, they are codified in computerized routines that hide this complexity from users. The SPICE function INVERT computes the inverse of a $3 \times 3$ matrix. A general method for matrix inversion appears in [Press 1986].

8.1.4.3. Row Rank and Column Rank

For any given $n \times m$ matrix, the maximum number of linearly independent row vectors taken from the matrix is called the row rank of the matrix. Similarly, the maximum number of linearly independent column vectors is called the column rank. Naturally, the row rank cannot exceed $n$ and the column rank cannot exceed $m$, but it is a result of matrix theory that the row rank and column rank are always equal. The rank of a matrix is thus this common value.

8.1.4.4. Determinants and Singularity

A matrix is nonsingular if, and only if, it is square and its rank equals its dimension. One of the most important indicators of singularity is the determinant of the matrix, denoted $\det(\mathbf{A})$ or $|\mathbf{A}|$. In fact, a square matrix is singular if and only if its determinant is zero.

Determinants may be found recursively using Laplace’s formula, as follows. Let $\mathbf{M}_{i,j}$ denote the matrix (minor) obtained from $\mathbf{A}$ by deleting the $i$-th row and $j$-th column, and let $c_{i,j} = (-1)^{i+j}\,|\mathbf{M}_{i,j}|$ be the signed determinant of this minor, or cofactor. Then the determinant of $\mathbf{A}$ is a scalar equal to the sum of the products of the elements in any row and the cofactors of those elements,

$$|\mathbf{A}| = \sum_{j=1}^{n} a_{i,j}\, c_{i,j} \tag{8-28}$$

The choice of the particular row $i$ is immaterial.

This formula is efficient for matrices of low order, but becomes extremely inefficient as a general-purpose method, as it involves summing $n!$ products of $n$ values for an $n \times n$ matrix. There are, fortunately, general methods of order $O(n^3)$ that may be found in texts on matrix theory or linear algebra. The SPICE DET routine uses the Laplace formula to compute the determinant of $3 \times 3$ matrices. An $O(n^3)$ computation method is found in [Press 1986].
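The recursive cofactor expansion of Eq. (8-28) is easy to code directly, and doing so makes its factorial cost evident. The sketch below (illustrative only, expanding along the first row) agrees with numpy.linalg.det for small matrices.

```python
import numpy as np

def det_laplace(A):
    """Determinant by cofactor expansion along row 0 (Eq. 8-28).

    Exponential cost -- for real work use an O(n^3) method such as
    LU decomposition (what numpy.linalg.det does internally).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        cofactor = (-1) ** j * det_laplace(minor)
        total += A[0, j] * cofactor
    return total

M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(det_laplace(M), np.linalg.det(M))   # both 8.0
```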

Some of the salient properties of the determinant are

$$\begin{aligned}
\det(\mathbf{A}) &= \det(\mathbf{A}^T) \\
\det(b\,\mathbf{A}) &= b^n \det(\mathbf{A}) \\
\det(\mathbf{A}\mathbf{B}) &= \det(\mathbf{A})\det(\mathbf{B}) \\
\det\begin{bmatrix} \mathbf{A} & \mathbf{C} \\ \mathbf{0} & \mathbf{B} \end{bmatrix} &= \det(\mathbf{A})\det(\mathbf{B}) \\
\det(\mathrm{diag}(a_1, a_2, \ldots, a_n)) &= a_1 a_2 \cdots a_n
\end{aligned} \tag{8-29}$$

8.1.4.5. Solution of Matrix Equations and Linear Systems

A linear system of equations over a field $\mathbb{F}$ is a set of $n$ linear equations in $k$ variables (sometimes called unknowns) of the form

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1k} x_k &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2k} x_k &= b_2 \\
&\;\;\vdots \\
a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nk} x_k &= b_n
\end{aligned} \tag{8-30}$$

in which the $x_i$ are unknowns and the $a_{ij}$ and $b_i$ are known values in $\mathbb{F}$. This system may be rendered in matrix form

$$\mathbf{A}\mathbf{x} = \mathbf{b} \tag{8-31}$$

where $\mathbf{A}$ is the matrix of coefficients in $\mathbb{F}$, $\mathbf{x}$ is the column vector of variables, and $\mathbf{b}$ is the column vector of results in $\mathbb{F}$.

If $k < n$, then the system is, in general, overdetermined, and there will be no solution. If $k > n$, then the system is underdetermined, and there will be an infinity of solutions. If, however, $k = n$ and the matrix is nonsingular, then the system has a unique solution in the $n$ variables, which may be found using the method known as Cramer’s rule,

$$x_i = \frac{\begin{vmatrix}
a_{11} & \cdots & a_{1(i-1)} & b_1 & a_{1(i+1)} & \cdots & a_{1n} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{n1} & \cdots & a_{n(i-1)} & b_n & a_{n(i+1)} & \cdots & a_{nn}
\end{vmatrix}}{|\mathbf{A}|} \tag{8-32}$$

in which the numerator determinant is formed by replacing the $i$-th column of $\mathbf{A}$ by $\mathbf{b}$.

Alternatively, the solution may be found using the matrix inverse,

$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b} \tag{8-33}$$

The matrix inverse method is usually the most efficient method of solution, since inverses are readily found using computerized routines available in numeric solution packages.
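The following NumPy sketch solves a small system (Eq. 8-31) three ways: Cramer's rule (Eq. 8-32), the explicit inverse (Eq. 8-33), and the library solver; the coefficients are arbitrary.

```python
import numpy as np

A = np.array([[ 3.0,  2.0, -1.0],
              [ 2.0, -2.0,  4.0],
              [-1.0,  0.5, -1.0]])
b = np.array([1.0, -2.0, 0.0])

# Cramer's rule, Eq. (8-32): replace column i of A by b.
detA = np.linalg.det(A)
x_cramer = np.empty(3)
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b
    x_cramer[i] = np.linalg.det(Ai) / detA

x_inverse = np.linalg.inv(A) @ b        # Eq. (8-33)
x_solve = np.linalg.solve(A, b)         # factorization-based solver

print(x_cramer, x_inverse, x_solve)     # all three give [ 1. -2. -2.]
```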

8.1.4.6. Eigenvalues and Eigenvectors

The eigenvalues of a matrix are a special set of scalar properties of the matrix that are extremely important in engineering and physics. They have also been referred to by various authors as characteristic roots, characteristic values, proper values, and latent roots. The term derives from the German “eigen,” meaning “proper.”

They are defined as follows. Let $\mathbf{A}$ be a square matrix of dimension $n$. If there exists an $n$-vector $\mathbf{x}$ and a scalar $\lambda$ such that

$$\mathbf{A}\mathbf{x} = \lambda\,\mathbf{x} \tag{8-34}$$

then $\lambda$ is said to be an eigenvalue of $\mathbf{A}$ whose corresponding (right) eigenvector is $\mathbf{x}$. An eigenvector is thus invariant in the sense that its direction remains unchanged under the linear transformation $\mathbf{A}$.

The eigenvalues and eigenvectors must satisfy the matrix equation

$$(\mathbf{A} - \lambda\,\mathbf{I})\,\mathbf{x} = \mathbf{0} \tag{8-35}$$

As a consequence of Cramer’s rule, a homogeneous system of linear equations can have a nontrivial solution only if its determinant vanishes. Eigenvalues must therefore be the roots of the polynomial equation of degree $n$ given by

$$p(\lambda) = \begin{vmatrix}
a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda
\end{vmatrix} = 0 \tag{8-36}$$

This equation is called the characteristic equation, and the left-hand side is called the characteristic polynomial of the matrix $\mathbf{A}$. For example, the characteristic equation of a $2 \times 2$ matrix is

$$p(\lambda) = \lambda^2 - (a_{11} + a_{22})\,\lambda + (a_{11} a_{22} - a_{12} a_{21}) = 0 \tag{8-37}$$

and its roots are

$$\lambda = \tfrac{1}{2}\left[(a_{11} + a_{22}) \pm \sqrt{4\,a_{12} a_{21} + (a_{11} - a_{22})^2}\,\right] \tag{8-38}$$

Eigenvalue solution is thus just polynomial root finding, and there may be no solutions in the field $\mathbb{F}$ itself. If the matrix elements are real, for example, the eigenvalues may be complex numbers that occur in conjugate pairs. It is known, however, that the eigenvalues of symmetric matrices are real. Moreover, any matrix of odd dimension over $\mathbb{R}$ must have at least one real eigenvalue.

While it is true that every square matrix of dimension $n$ has $n$ eigenvalues, some of them may be zero (if the matrix rank is less than $n$), and others may be repeated. Eigenvectors corresponding to distinct eigenvalues are linearly independent. The others are linearly dependent on the linearly independent ones, and are usually not returned by matrix subroutine libraries. Eigenvectors corresponding to complex eigenvalues are necessarily vectors of complex numbers.
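A brief NumPy illustration of Eqs. (8-34) through (8-38): the eigenvalues of an arbitrary symmetric 2-by-2 matrix are computed both from the quadratic formula of Eq. (8-38) and with the library routine, and the defining relation of Eq. (8-34) is checked.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Roots of the characteristic polynomial, Eq. (8-38).
tr = A[0, 0] + A[1, 1]
disc = np.sqrt(4.0 * A[0, 1] * A[1, 0] + (A[0, 0] - A[1, 1]) ** 2)
roots = (0.5 * (tr + disc), 0.5 * (tr - disc))

# Library computation of eigenvalues and (right) eigenvectors.
vals, vecs = np.linalg.eig(A)
print(sorted(roots), sorted(vals))          # the same two real eigenvalues

# Check A x = lambda x (Eq. 8-34) for each eigenpair.
for lam, x in zip(vals, vecs.T):
    print(np.allclose(A @ x, lam * x))      # True, True
```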

Oddly enough, the matrix itself satisfies its characteristic polynomial. This is the Cayley-Hamilton theorem,

$$p(\mathbf{A}) = \mathbf{0} \tag{8-39}$$

where in this case the right-hand side of the equation is the square matrix of dimension $n$ containing all zeros.

8.1.4.7. Similarity

If $\mathbf{A}$ and $\mathbf{B}$ are square matrices of the same dimension $n$ and the latter is nonsingular, then the matrix $\tilde{\mathbf{A}} = \mathbf{B}^{-1}\mathbf{A}\mathbf{B}$ is said to be similar to $\mathbf{A}$. Similar matrices share many invariants. For example, they have the same rank, determinant, eigenvalues, and trace.

A matrix $\mathbf{A}$ is diagonalizable if it is similar to a diagonal matrix. The diagonal elements in this case are the eigenvalues of $\mathbf{A}$. The method for finding the diagonal form thus involves finding eigenvalues and eigenvectors. A matrix may fail to be diagonalizable if its eigenvalues are not all distinct. The invertible matrix $\mathbf{B}$ in this case may contain complex numbers when eigenvalues are complex.

However, every matrix is similar to one that is almost diagonal, called the Jordan normal form,

$$\mathbf{B}^{-1}\mathbf{A}\mathbf{B} = \mathrm{diag}(\mathbf{J}_1, \mathbf{J}_2, \ldots, \mathbf{J}_k), \qquad
\mathbf{J}_i = \begin{bmatrix}
\lambda_i & 1 & 0 & \cdots & 0 \\
0 & \lambda_i & 1 & \cdots & 0 \\
0 & 0 & \lambda_i & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & 1 \\
0 & 0 & 0 & \cdots & \lambda_i
\end{bmatrix} \tag{8-40}$$

The number $k$ of Jordan blocks along the diagonal equals the number of unique eigenvalues, and the size of each block is the dimension of the subspace having that eigenvalue. The sizes of the blocks sum to the matrix dimension. The $\mathbf{B}$ that produces this transformation may contain complex elements if the eigenvalues are complex numbers.

Matrices in Jordan form are almost as simple as diagonal ones, often reducing the complexity of a problem by orders of magnitude in computation.

8.1.4.8. Trace of a Matrix

The trace of a square matrix $\mathbf{A}$, denoted $\mathrm{tr}(\mathbf{A})$, is defined as the sum of its diagonal elements,

$$\mathrm{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii} \tag{8-41}$$

The trace of a matrix has a number of useful properties, among which are

$$\begin{aligned}
\mathrm{tr}(\mathbf{A}) &= \mathrm{tr}(\mathbf{A}^T) \\
\mathrm{tr}(\mathbf{A} + \mathbf{B}) &= \mathrm{tr}(\mathbf{A}) + \mathrm{tr}(\mathbf{B}) \\
\mathrm{tr}(b\,\mathbf{A}) &= b\,\mathrm{tr}(\mathbf{A}) \\
\mathrm{tr}(\mathbf{A}) &= \mathrm{tr}(\mathbf{B}\mathbf{A}\mathbf{B}^{-1}) \quad (\det(\mathbf{B}) \ne 0) \\
\mathrm{tr}(\mathbf{A}\mathbf{B}) &= \mathrm{tr}(\mathbf{B}\mathbf{A}) \\
\mathrm{tr}(\mathbf{A}) &= \sum_{i=1}^{n} \lambda_i \\
\mathrm{tr}(\mathbf{A}^k) &= \sum_{i=1}^{n} \lambda_i^k
\end{aligned} \tag{8-42}$$

when $\mathbf{A}$ and $\mathbf{B}$ are square matrices of the same dimension. The penultimate of these relationships points out that the trace is the sum of the eigenvalues of the matrix.

8.1.4.9. Rotations

In matrix theory, an orthonormal matrix is a real square matrix of given dimension whose transpose is its inverse,

$$\mathbf{A}^T\mathbf{A} = \mathbf{A}\mathbf{A}^T = \mathbf{I} \tag{8-43}$$

The determinant of an orthonormal matrix, by Eq. (8-29), is thus $\pm 1$. As a result of this definition, the row vectors are mutually orthogonal with unit length, and the same is true of the column vectors.

The product of two orthonormal matrices is also orthonormal, as may be seen from

$$(\mathbf{A}\mathbf{B})^T(\mathbf{A}\mathbf{B}) = (\mathbf{B}^T\mathbf{A}^T)(\mathbf{A}\mathbf{B}) = \mathbf{B}^T(\mathbf{A}^T\mathbf{A})\mathbf{B} = \mathbf{I} \tag{8-44}$$

A rotation matrix is an orthonormal matrix whose determinant is unity. It corresponds to a geometric rotation about a fixed axis in an $n$-dimensional Euclidean space. In the context of 3-dimensional space, rotations are almost always used as Cartesian coordinate transformations, which require conventions for angular sense (viz., counterclockwise positive) and handedness (viz., right-handed coordinate system).

Further, care is required in interpreting the sign of the rotation angle, because a rotation of a vector by an angle $\theta$ can be viewed as rotating the coordinate system in which the vector is embedded by the angle $-\theta$. Three-dimensional rotation matrices here are thus termed coordinate rotations or vector rotations, depending on whether they are used to rotate a coordinate system or to rotate a vector within a given coordinate system.

A coordinate rotation matrix $\mathbf{R}$ has at least one of its eigenvalues equal to unity, and the product of all eigenvalues is unity. If all are real, then they are (1, 1, 1) or (1, -1, -1), in some order. If two of the eigenvalues are complex, then the product of the complex pair is unity.

A rotation matrix keeps at least one vector fixed, $\mathbf{R}\mathbf{x} = \mathbf{x}$, where $\mathbf{x}$ is the eigenvector corresponding to the unit eigenvalue. If there is only one real eigenvalue, this vector is unique and is called the axis of rotation.

Since the determinant of any 3 by 3 matrix (see, e.g., Eq. (8-17)) is the dot product of the third column (or row) with the cross product of the first and second, and since this product for a rotation matrix is unity, the third column (or row) must be the cross product of the first two. Therefore, the 3-space spanned by the column (or row) vectors is right handed.

Rotations preserve inner products (and thus lengths),

$$(\mathbf{R}\mathbf{u}) \cdot (\mathbf{R}\mathbf{v}) = (\mathbf{R}\mathbf{u})^T(\mathbf{R}\mathbf{v}) = \mathbf{u}^T\mathbf{R}^T\mathbf{R}\,\mathbf{v} = \mathbf{u} \cdot \mathbf{v} \tag{8-45}$$

Coordinate rotations also preserve outer products,

$$\mathbf{R}(\mathbf{u} \times \mathbf{v}) = (\mathbf{R}\mathbf{u}) \times (\mathbf{R}\mathbf{v}) \tag{8-46}$$

A rotation corresponds to an axis of rotation and an angle by which coordinates are rotated. Given a coordinate rotation matrix, the axis and angle may be found; vice versa, given the axis and angle, the matrix can be found, as will now be discussed.

The matrix below corresponds to a coordinate rotation by an angle $\theta$ (or a vector rotation by an angle $-\theta$) about the x-axis,

$$[\theta]_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{bmatrix} \tag{8-47}$$

More will be said about this notation in the next subsection. As shown in Eq. (8-42), the trace of a matrix is unchanged under a reorientation of the coordinate system (i.e., an orthonormal similarity transformation). Therefore, the matrix of any rotation by an angle $\theta$ about an arbitrary axis will have its trace equal to

$$\mathrm{tr}(\mathbf{R}) = 1 + 2\cos\theta \tag{8-48}$$

The angle of rotation is thus related to the matrix by

$$\theta = \cos^{-1}\!\left[\frac{\mathrm{tr}(\mathbf{R}) - 1}{2}\right] \tag{8-49}$$

However, the sign of the angle is not recovered using this method. For this reason, the angle is usually presumed to lie in the interval $[0, \pi]$.
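A quick numerical check of Eqs. (8-47) through (8-49): build the coordinate rotation about the x-axis and recover the (unsigned) rotation angle from the trace. The helper name is illustrative.

```python
import numpy as np

def rot_x(theta):
    """Coordinate rotation [theta]_1 of Eq. (8-47)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,   s],
                     [0.0,  -s,   c]])

theta = np.radians(40.0)
R = rot_x(theta)

print(np.trace(R), 1.0 + 2.0 * np.cos(theta))              # equal, Eq. (8-48)
print(np.degrees(np.arccos((np.trace(R) - 1.0) / 2.0)))    # 40.0, Eq. (8-49)
```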

The unit-length rotation axis $\hat{\mathbf{u}} = (u_1, u_2, u_3)^T$ and rotation angle $\theta$ by which an arbitrary vector $\mathbf{v}$ is to be rotated about $\hat{\mathbf{u}}$ can be related to the corresponding vector rotation matrix $\mathbf{R}$ in the following way. The vector $\mathbf{v}$ can be decomposed into a component in the direction of $\hat{\mathbf{u}}$ and a component perpendicular to it. The length of the component along $\hat{\mathbf{u}}$ is clearly $\mathbf{v} \cdot \hat{\mathbf{u}}$, so the perpendicular component is $\mathbf{v} - (\mathbf{v} \cdot \hat{\mathbf{u}})\,\hat{\mathbf{u}}$. But, as the reader will note from Eq. (8-19), the perpendicular component is also given by

$$\hat{\mathbf{u}} \times (\mathbf{v} \times \hat{\mathbf{u}}) = (\hat{\mathbf{u}} \cdot \hat{\mathbf{u}})\,\mathbf{v} - (\hat{\mathbf{u}} \cdot \mathbf{v})\,\hat{\mathbf{u}} = \mathbf{v} - (\hat{\mathbf{u}} \cdot \mathbf{v})\,\hat{\mathbf{u}} \tag{8-50}$$

Next, the skew-symmetric matrix $\mathbf{U}$ defined by

$$\mathbf{U} = \begin{bmatrix} 0 & -u_3 & u_2 \\ u_3 & 0 & -u_1 \\ -u_2 & u_1 & 0 \end{bmatrix} \tag{8-51}$$

has the property that, for any vector $\mathbf{x} = (x_1, x_2, x_3)^T$, the matrix product is the same as the vector cross product,

$$\mathbf{U}\mathbf{x} = \hat{\mathbf{u}} \times \mathbf{x} = \begin{bmatrix} u_2 x_3 - u_3 x_2 \\ u_3 x_1 - u_1 x_3 \\ u_1 x_2 - u_2 x_1 \end{bmatrix} \tag{8-52}$$

and therefore

$$\hat{\mathbf{u}} \times (\mathbf{v} \times \hat{\mathbf{u}}) = -\,\hat{\mathbf{u}} \times (\hat{\mathbf{u}} \times \mathbf{v}) = -\,\mathbf{U}(\mathbf{U}\mathbf{v}) = -\,\mathbf{U}^2\mathbf{v} \tag{8-53}$$

Further, the vectors $\hat{\mathbf{u}}$, $\hat{\mathbf{u}} \times \mathbf{v}$, and $\hat{\mathbf{u}} \times (\mathbf{v} \times \hat{\mathbf{u}})$ are mutually perpendicular, as may be verified by taking dot products, and $\|\hat{\mathbf{u}} \times \mathbf{v}\| = \|\hat{\mathbf{u}} \times (\mathbf{v} \times \hat{\mathbf{u}})\|$. As a result, the rotation of $\mathbf{v}$ about $\hat{\mathbf{u}}$ by an angle $\theta$ will require that $\mathbf{R}$ be of the form

$$\begin{aligned}
\mathbf{R}\mathbf{v} &= (\mathbf{v} \cdot \hat{\mathbf{u}})\,\hat{\mathbf{u}} + \cos\theta\,[\hat{\mathbf{u}} \times (\mathbf{v} \times \hat{\mathbf{u}})] + \sin\theta\,(\hat{\mathbf{u}} \times \mathbf{v}) \\
&= \mathbf{v} + (\cos\theta - 1)\,[\hat{\mathbf{u}} \times (\mathbf{v} \times \hat{\mathbf{u}})] + \sin\theta\,(\hat{\mathbf{u}} \times \mathbf{v}) \\
&= \mathbf{v} + (1 - \cos\theta)\,\mathbf{U}^2\mathbf{v} + \sin\theta\,\mathbf{U}\mathbf{v} \\
&= \left[\mathbf{I} + (1 - \cos\theta)\,\mathbf{U}^2 + \sin\theta\,\mathbf{U}\right]\mathbf{v}
\end{aligned} \tag{8-54}$$

But since this is true of every $\mathbf{v}$, it follows that the desired rotation matrix is

$$\mathbf{R} = \mathbf{I} + (1 - \cos\theta)\,\mathbf{U}^2 + \sin\theta\,\mathbf{U} \tag{8-55}$$

Thus, given $\hat{\mathbf{u}}$ and $\theta$, $\mathbf{R}$ can be calculated directly, as is done by the SPICE toolkit function AXISAR.
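Eq. (8-55) translates directly into code. The sketch below is not the SPICE AXISAR routine, only an equivalent NumPy construction; the axis and angle values are arbitrary.

```python
import numpy as np

def axis_angle_to_matrix(u_hat, theta):
    """Vector-rotation matrix of Eq. (8-55) for unit axis u_hat and angle theta."""
    u1, u2, u3 = u_hat
    U = np.array([[0.0, -u3,  u2],
                  [ u3, 0.0, -u1],
                  [-u2,  u1, 0.0]])          # skew matrix of Eq. (8-51)
    return np.eye(3) + (1.0 - np.cos(theta)) * (U @ U) + np.sin(theta) * U

u_hat = np.array([0.0, 0.0, 1.0])            # rotate about the z-axis
R = axis_angle_to_matrix(u_hat, np.radians(90.0))

print(np.round(R @ np.array([1.0, 0.0, 0.0]), 12))        # [0. 1. 0.]: x-axis -> y-axis
print(np.allclose(R.T @ R, np.eye(3)), np.linalg.det(R))  # True, 1.0
```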

The reader may verify that $\mathbf{U}^2$ is a symmetric matrix,

$$\mathbf{U}^2 = \begin{bmatrix}
-(u_2^2 + u_3^2) & u_1 u_2 & u_1 u_3 \\
u_1 u_2 & -(u_1^2 + u_3^2) & u_2 u_3 \\
u_1 u_3 & u_2 u_3 & -(u_1^2 + u_2^2)
\end{bmatrix} \tag{8-56}$$

so that the difference

$$\mathbf{R} - \mathbf{R}^T = (2\sin\theta)\,\mathbf{U} \tag{8-57}$$

contains the sine of the desired angle and the elements of $\hat{\mathbf{u}}$. If the angle is not zero or $\pi$, then

$$\mathbf{s} = \begin{bmatrix} r_{32} - r_{23} \\ r_{13} - r_{31} \\ r_{21} - r_{12} \end{bmatrix}, \qquad
s = \sqrt{\mathbf{s} \cdot \mathbf{s}} = 2\sin\theta, \qquad
\hat{\mathbf{u}} = \frac{\mathbf{s}}{s} = \hat{\mathbf{s}}, \qquad
\theta = \sin^{-1}(s/2) \tag{8-58}$$

It is again necessary to presume that $\theta$ lies in the interval $[0, \pi]$ in order to produce a unique solution for the rotation axis.

If the angle is zero, then, of course, $\mathbf{R} = \mathbf{I}$; if it is $\pi$, then the rotation axis elements satisfy

$$\mathbf{U}^2 = \tfrac{1}{2}\,(\mathbf{R} - \mathbf{I}) \tag{8-59}$$

The rotation axis coordinates are retrievable from this equation, but the process is not pretty. The general solution for finding the rotation axis and angle from the rotation matrix is found in the SPICE toolkit function RAXISA.
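For the non-degenerate case ($\theta$ not 0 or $\pi$), Eqs. (8-57) and (8-58) give the axis and angle directly; a NumPy sketch follows. The full treatment of the degenerate cases is what RAXISA provides; this fragment does not handle them, and its helper name is illustrative.

```python
import numpy as np

def axis_angle_from_matrix(R):
    """Recover (u_hat, theta) from a rotation matrix via Eqs. (8-57)-(8-58).

    Assumes theta is strictly between 0 and pi (non-degenerate case).
    """
    s_vec = np.array([R[2, 1] - R[1, 2],
                      R[0, 2] - R[2, 0],
                      R[1, 0] - R[0, 1]])
    s = np.linalg.norm(s_vec)                 # s = 2 sin(theta)
    u_hat = s_vec / s
    theta = np.arcsin(s / 2.0)
    return u_hat, theta

# Round-trip test with a known axis and angle.
c, s = np.cos(np.radians(30.0)), np.sin(np.radians(30.0))
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])   # 30 deg about z (Eq. 8-55 form)
u_hat, theta = axis_angle_from_matrix(R)
print(u_hat, np.degrees(theta))               # [0. 0. 1.] 30.0
```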

The SPICE toolkit also has a number of other routines that compute a spectrum of quantities associated with rotations.

8.1.4.10. Euler Angles

If the Cartesian axes are designated 1, 2, 3, then the notation $[\theta]_i$ signifies the matrix that rotates a coordinate system by an angle $\theta$ about axis $i$. It also rotates vectors by an angle $-\theta$ about the same axis. The right-hand rule applies, so a counterclockwise rotation is in the positive direction. The SPICE function ROTATE generates this matrix. The three coordinate axis rotations are

$$[\theta]_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{bmatrix} \qquad
[\theta]_2 = \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix} \qquad
[\theta]_3 = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{8-60}$$

Given a coordinate rotation matrix $\mathbf{R}$, it is sometimes useful to factor it into the form $[\theta_1]_a\,[\theta_2]_b\,[\theta_3]_c$. It is not obvious that this can always be done, but if the $b$ axis is not the same as the $a$ or $c$ axis, then it can. The three angles in this decomposition are called the Euler angles of the coordinate rotation.

The composition of rotations $[\theta_a]_a\,[\theta_b]_b\,[\theta_c]_c$ is sometimes referred to as an a-b-c rotation with respect to the given Euler angles. For example, $[\theta_1]_3\,[\theta_2]_1\,[\theta_3]_3$ is a “3-1-3” rotation. The individual rotations are applied to a vector in the order c, then b, then a. Given the Euler angles and axes, the resulting composite rotation matrix may be found using the SPICE function EUL2M.

The 3-1-3 is a particularly useful rotation in MPG applications because it is a convenient representation of the transformation between inertial and body-fixed frames. For example, given the right ascension $\alpha$ and declination $\delta$ of a body’s polar axis with respect to the inertial frame, together with the diurnal angle $w$ of its prime meridian at the given instant, the transformation is

$$\mathbf{R} = [w]_3\,[\pi/2 - \delta]_1\,[\pi/2 + \alpha]_3 \tag{8-61}$$
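A NumPy sketch of the 3-1-3 composition of Eq. (8-61), built from the elementary coordinate rotations of Eq. (8-60); it parallels what EUL2M does, but the pole angles used here are hypothetical values chosen only for illustration.

```python
import numpy as np

def rot1(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, s], [0, -s, c]], dtype=float)

def rot3(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]], dtype=float)

# Illustrative pole right ascension, declination, and prime-meridian angle (radians).
alpha, delta, w = np.radians([317.68, 52.89, 176.63])

# Inertial-to-body-fixed transformation, Eq. (8-61).
R = rot3(w) @ rot1(np.pi / 2 - delta) @ rot3(np.pi / 2 + alpha)

print(np.allclose(R @ R.T, np.eye(3)), round(np.linalg.det(R), 12))  # True, 1.0
```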

But while it is true that all rotation matrices can be decomposed into a set of Euler angles, this decomposition may not be unique. Given a rotation matrix and the set of axis designations, in which the second Euler angle axis differs from the first and third, the SPICE function M2EUL does produce a valid set of Euler angles.

While it is easy to see that the composition of Euler angles produces a rotation, the factorization of a given rotation into its Euler angle components requires some manipulation. There are 12 different formulas for a-b-c rotations in which b differs from a and c. By appropriate relabeling of coordinate axes, maintaining a right-hand system with each, this number of essentially different factorizations drops to only two, viz., those equivalent to 3-1-3 and to 1-2-3. M2EUL takes these factors into account in its computations.

The factorization method merely extracts the Euler angles from the terms of the composite rotation matrix. For the 3-1-3 case, the rotation $\mathbf{R} = [r_{ij}]$ is

$$\mathbf{R} = [\theta_a]_3\,[\theta_b]_1\,[\theta_c]_3 = \begin{bmatrix}
c_a c_c - c_b s_a s_c & c_b c_c s_a + c_a s_c & s_a s_b \\
-c_c s_a - c_a c_b s_c & c_a c_b c_c - s_a s_c & c_a s_b \\
s_b s_c & -c_c s_b & c_b
\end{bmatrix} \tag{8-62}$$

in which the following associations have been made:

$$\begin{aligned}
c_a &= \cos\theta_a & s_a &= \sin\theta_a \\
c_b &= \cos\theta_b & s_b &= \sin\theta_b \\
c_c &= \cos\theta_c & s_c &= \sin\theta_c
\end{aligned} \tag{8-63}$$

From this the Euler angles can now be recovered, with the convention that all angles lie in $[0, \pi]$. It is immediate that

$$\theta_b = \cos^{-1} r_{33} \tag{8-64}$$

Further, if this angle is neither $0$ nor $\pi$, then the ratio of the remaining right-column terms gives

$$\frac{r_{13}}{r_{23}} = \tan\theta_a, \qquad \theta_a = \arctan(r_{13},\, r_{23}) \tag{8-65}$$

Similarly, the ratio of the remaining bottom-row terms produces

$$\frac{r_{31}}{r_{32}} = -\tan\theta_c, \qquad \theta_c = \arctan(r_{31},\, -r_{32}) \tag{8-66}$$

To obtain correct answers, it is necessary to preserve the signs of the sine and cosine terms presented to the 2-argument arc tangent function, as above.

If $\theta_b$ is zero or $\pi$, then there is a degenerate case: $\mathbf{R}$ is the composition of two rotations about the 3-axis, so the factorization is not unique, but the sum of the two angles is. In this case, it is usual to set $\theta_c = 0$ and solve for $\theta_a$,

$$\mathbf{R} = [\theta_a]_3\,[\theta_b]_1\,[0]_3 = \begin{bmatrix}
c_a & c_b s_a & s_a s_b \\
-s_a & c_a c_b & c_a s_b \\
0 & -s_b & c_b
\end{bmatrix} \tag{8-67}$$

from which the angle $\theta_a$ may be recovered as

$$\frac{r_{21}}{r_{11}} = -\tan\theta_a, \qquad \theta_a = \arctan(-r_{21},\, r_{11}) \tag{8-68}$$

28 Explanatory Supplement to Metric Prediction Generation

The reader interested in how the other transformations are handled may consult the source code of M2EUL .
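The extraction steps of Eqs. (8-64) through (8-68) can be written compactly; the sketch below handles only the 3-1-3 case and is not a substitute for the general M2EUL routine. The helper names are illustrative.

```python
import numpy as np

def m2eul_313(R, eps=1e-12):
    """Factor R = [a]_3 [b]_1 [c]_3; returns (theta_a, theta_b, theta_c) in radians."""
    theta_b = np.arccos(np.clip(R[2, 2], -1.0, 1.0))            # Eq. (8-64)
    if np.sin(theta_b) > eps:                                    # general case
        theta_a = np.arctan2(R[0, 2], R[1, 2])                   # Eq. (8-65)
        theta_c = np.arctan2(R[2, 0], -R[2, 1])                  # Eq. (8-66)
    else:                                                        # degenerate: b = 0 or pi
        theta_c = 0.0
        theta_a = np.arctan2(-R[1, 0], R[0, 0])                  # Eq. (8-68)
    return theta_a, theta_b, theta_c

# Self-contained round-trip check using the elementary rotations of Eq. (8-60).
def rot1(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, s], [0, -s, c]], dtype=float)

def rot3(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]], dtype=float)

a, b, c = np.radians([40.0, 25.0, 70.0])
R = rot3(a) @ rot1(b) @ rot3(c)
print(np.degrees(m2eul_313(R)))      # [40. 25. 70.]
```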

The next subsection treats rotations of velocity vectors, which require the derivative of the rotation matrix. Fortunately, when a rotation is expressed in Euler angle form, the derivative involves only the individual angular rates, as

$$\frac{d\mathbf{R}}{dt} = \dot{\mathbf{R}}
= \frac{d[\theta_a]_a}{dt}\,[\theta_b]_b\,[\theta_c]_c
+ [\theta_a]_a\,\frac{d[\theta_b]_b}{dt}\,[\theta_c]_c
+ [\theta_a]_a\,[\theta_b]_b\,\frac{d[\theta_c]_c}{dt} \tag{8-69}$$

The SPICE toolkit contains functions, such as DROTAT and EUL2XF, that accommodate derivatives of Euler angles.

8.1.4.11. State Vectors and Transformations

It is frequently the case in MPG applications that not only the positions of objects, but also their velocities, are used in producing metric predictions. The 6-vector in Cartesian space formed by joining corresponding position and velocity values with respect to the same coordinate frame is called a state vector. Thus, if $\mathbf{r} = \mathbf{r}(t) = (r_1(t), r_2(t), r_3(t))^T$ is a time-dependent position vector and $\mathbf{v} = \mathbf{v}(t) = \dot{\mathbf{r}}(t) = (\dot{r}_1(t), \dot{r}_2(t), \dot{r}_3(t))^T = (v_1, v_2, v_3)^T$ is its time derivative, then the state vector that contains the values of both is

$$\mathbf{s} = [r_1, r_2, r_3, v_1, v_2, v_3]^T \tag{8-70}$$

Many of the SPICE toolkit functions, such as SPKEZ, return state vectors; others, such as SXFORM, return state vector transformation matrices; and still others operate on these in various ways. If $\mathbf{R} = \mathbf{R}(t)$ is a time-dependent position rotation matrix, then the rotated position is simply $\tilde{\mathbf{r}} = \mathbf{R}\mathbf{r}$, but its rotated velocity is

$$\tilde{\mathbf{v}} = \frac{d}{dt}(\mathbf{R}\mathbf{r}) = \mathbf{R}\dot{\mathbf{r}} + \dot{\mathbf{R}}\mathbf{r} = \mathbf{R}\mathbf{v} + \dot{\mathbf{R}}\mathbf{r} \tag{8-71}$$

The state transformation matrix thus takes the form

$$\mathbf{S} = \begin{bmatrix} \mathbf{R} & \mathbf{0} \\ \dot{\mathbf{R}} & \mathbf{R} \end{bmatrix}, \qquad \tilde{\mathbf{s}} = \mathbf{S}\,\mathbf{s} \tag{8-72}$$

However, the inverse transformation matrix is not the transpose of $\mathbf{S}$, but instead

$$\mathbf{S}^{-1} = \begin{bmatrix} \mathbf{R}^T & \mathbf{0} \\ \dot{\mathbf{R}}^T & \mathbf{R}^T \end{bmatrix} \tag{8-73}$$

This is true because the inverse mapping computes $\mathbf{R}^{-1} = \mathbf{R}^T$, $\mathbf{r} = \mathbf{R}^T\tilde{\mathbf{r}}$, and $\mathbf{v} = \mathbf{R}^T\tilde{\mathbf{v}} + \dot{\mathbf{R}}^T\tilde{\mathbf{r}}$. However, if one multiplies the two matrices $\mathbf{S}^{-1}\mathbf{S}$ to verify that they are indeed inverses, the diagonal blocks are the identity, as expected, and the upper right-hand block is zero, as expected, but the term $\dot{\mathbf{R}}^T\mathbf{R} + \mathbf{R}^T\dot{\mathbf{R}}$ appears in the lower left-hand block of the product matrix. This term is also an all-zeros matrix, because it is the derivative of $\mathbf{R}^T\mathbf{R} = \mathbf{I}$, the identity matrix, a constant.
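A NumPy sketch of Eqs. (8-72) and (8-73) for a frame rotating uniformly about the z-axis; the spin rate, epoch, and state values are illustrative. It builds the 6-by-6 block matrix from R and its derivative and verifies that the stated inverse really is the inverse.

```python
import numpy as np

omega = 7.2921e-5           # rad/s, an illustrative spin rate
t = 1000.0                  # s

c, s = np.cos(omega * t), np.sin(omega * t)
R  = np.array([[ c,  s, 0.0], [-s,  c, 0.0], [0.0, 0.0, 1.0]])         # [theta]_3, theta = omega t
Rd = omega * np.array([[-s, c, 0.0], [-c, -s, 0.0], [0.0, 0.0, 0.0]])  # dR/dt

# 6x6 state transformation matrix, Eq. (8-72).
S = np.block([[R, np.zeros((3, 3))],
              [Rd, R]])

state = np.array([7000.0, 100.0, 1300.0, 1.0, 7.5, 0.5])   # position (km), velocity (km/s)
print(S @ state)                                           # transformed state vector

# Inverse of Eq. (8-73): transpose the blocks, not the whole matrix.
S_inv = np.block([[R.T, np.zeros((3, 3))],
                  [Rd.T, R.T]])
print(np.allclose(S_inv @ S, np.eye(6)))    # True
```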

States used within the MPG are commonly computed from other states extracted from ephemerides of solar system bodies, each of which may be referenced to a natural frame for that body. Observed states typically difference the state of a given target body with the state of a given observer.

Transformations on state vectors depend on a number of effects related to the frames of reference in which the states are observed. The most obvious of these is the rotation from one coordinate frame to the other. If both states are inertial, then the rotation is independent of time. If non-inertial, then a time dependency will be present.

Other transformations are necessitated by the forms of aberration that may be present in an observed state, caused by the finite velocity of light in combination with gravity and the motions of the observer and the observed object. The effects that may be present are (1) stellar aberration, due to the special relativistic effects of the motion of the observer, (2) planetary aberration, due to the finite velocity of light, (3) gravitational deflection, due to general relativistic effects in the intervening space, (4) gravitational retardation of time, due to these same general relativistic effects, (5) refraction by an atmosphere, and (6) delay due to the ionosphere.

These effects are considered more fully in chapters that discuss the specific MPG algorithms in which they are a factor. However, it is worth noting here that the velocity components of state vectors returned by SPICE functions are not fully corrected for planetary aberration, even when that option is specified. Rather, the velocity is obtained using the geometric state of the target at the light-time corrected epoch. That is, if the position component is the vector difference between an observer at time $t_{\mathrm{obs}}$ and a target body at a time differing by the light time between the two, or $\mathbf{r} = \mathbf{r}_{\mathrm{obs}}(t_{\mathrm{obs}}) - \mathbf{r}_{\mathrm{tgt}}(t_{\mathrm{obs}} \pm LT)$, then the velocity component returned by SPICE is $\mathbf{v}_{\mathrm{SPICE}} = \dot{\mathbf{r}}_{\mathrm{obs}}(t_{\mathrm{obs}}) - \dot{\mathbf{r}}_{\mathrm{tgt}}(t_{\mathrm{obs}} \pm LT)$. The true time derivative, however, is

$$\begin{aligned}
\mathbf{v}_{\mathrm{true}} &= \dot{\mathbf{r}}_{\mathrm{obs}}(t_{\mathrm{obs}}) - \dot{\mathbf{r}}_{\mathrm{tgt}}(t_{\mathrm{obs}} \pm LT)\left[1 \pm \frac{d\,LT}{dt_{\mathrm{obs}}}\right] \\
&= \mathbf{v}_{\mathrm{SPICE}} \mp \dot{\mathbf{r}}_{\mathrm{tgt}}(t_{\mathrm{obs}} \pm LT)\,\frac{d\,LT}{dt_{\mathrm{obs}}}
\end{aligned} \tag{8-74}$$

Since the light-time derivative is essentially the velocity divided by the speed of light, the size of the velocity correction term is approximately $v_{\mathrm{tgt}}\, v / c$, which may be a noticeable effect, depending on the application.

8.1.5. Quaternions

Quaternions were first described by the Irish mathematician Sir William Rowan Hamilton in 1843, who was attempting to extend the concept of complex numbers to three dimensions. He was unable to do so, but was able to define a four-dimensional structure with useful properties. Quaternions have since been superseded in most applications by vectors, but they still find use in some applications, notably electrodynamics, general relativity, and operations involving rotations in 3-space.

The SPICELIB toolkit uses them, for example, as compact representations for rotations. But they are not computationally more efficient than the rotation matrices they represent to the user. It is because they are used by SPICE that they are introduced here.

Quaternions are represented by 4-tuples on which a special kind of arithmetic is defined. Simply put, quaternions are an extension of the complex numbers of the form³ $a + b\,\mathbf{i} + c\,\mathbf{j} + d\,\mathbf{k}$, where $a$, $b$, $c$, and $d$ are real numbers and $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ are basis elements of unit length such that $\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{i}\mathbf{j}\mathbf{k} = -1$. The set of all quaternions is commonly given the symbol $\mathbb{H}$, in honor of Hamilton.

³ This is the original Hamilton form. Several other conventions are found in the open literature, the most frequent of which places the scalar at the end of the 4-tuple representation of the quaternion.

8.1.5.1. Quaternion Arithmetic

From the above, one may observe that the basis products are given by

$$\mathbf{i}\mathbf{j} = -\mathbf{j}\mathbf{i} = \mathbf{k} \qquad
\mathbf{j}\mathbf{k} = -\mathbf{k}\mathbf{j} = \mathbf{i} \qquad
\mathbf{k}\mathbf{i} = -\mathbf{i}\mathbf{k} = \mathbf{j} \tag{8-75}$$

Thus, unlike multiplication of real or complex numbers, multiplication of quaternions is not commutative; generally speaking, $q_1 q_2 \ne q_2 q_1$. In fact, multiplication of non-identical⁴ basis elements above follows the same rules as cross products of the corresponding basis vectors of Cartesian space.

Addition of quaternions is accomplished by adding corresponding coefficients of basis elements. Negation replaces each basis coefficient in the quaternion by its negative. Subtraction is the addition of one quaternion, the minuend, to the negative of another, the subtrahend. Multiplication is carried out in the usual way, using the multiplication table above on products of basis elements. The distributive law holds both for left and right multiplication of a sum by a quaternion. Because of these traits, $\mathbb{H}$ is a (noncommutative) division ring. Multiplication is associative, and every nonzero element has a unique multiplicative inverse. It is known that the quaternions, the complex numbers, and the real numbers are the only associative division algebras over the real numbers.

Quaternions are frequently represented as a scalar part plus a vector part,

$$q = a + x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k} = a + \mathbf{v} \tag{8-76}$$

Concerning products, given two quaternions $q_1 = a_1 + \mathbf{v}_1$ and $q_2 = a_2 + \mathbf{v}_2$, the basis product rules introduced earlier translate the product $q_1 q_2$ into another quaternion, given by

$$q_1 q_2 = a_1 a_2 - \mathbf{v}_1 \cdot \mathbf{v}_2 + a_1 \mathbf{v}_2 + a_2 \mathbf{v}_1 + \mathbf{v}_1 \times \mathbf{v}_2 \tag{8-77}$$

In this form, the dot and cross products of the quaternions’ vector parts observe the same rules of vector algebra given earlier in this chapter. This form is known as the Grassmann product, after the German mathematician Hermann Grassmann, and is sometimes used to define the product, rather than the definition in terms of basis element products given earlier. As defined, the product requires 16 multiplications and 11 additions.

⁴ Multiplication of identical basis elements, however, produces −1, and not zero, as a cross product would.
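The Grassmann product of Eq. (8-77) is easily coded with a (scalar, vector) representation; a minimal NumPy sketch follows, using the Hamilton convention of the text (scalar part first). The function name is illustrative.

```python
import numpy as np

def q_mul(q1, q2):
    """Grassmann product of quaternions, Eq. (8-77).  q = (a, v) with v a 3-vector."""
    a1, v1 = q1
    a2, v2 = q2
    a = a1 * a2 - np.dot(v1, v2)
    v = a1 * v2 + a2 * v1 + np.cross(v1, v2)
    return a, v

# Basis elements i and j as quaternions with zero scalar part.
i = (0.0, np.array([1.0, 0.0, 0.0]))
j = (0.0, np.array([0.0, 1.0, 0.0]))

print(q_mul(i, j))    # (0.0, [0. 0. 1.])   ->  ij = k, Eq. (8-75)
print(q_mul(j, i))    # (0.0, [ 0.  0. -1.]) ->  ji = -k (non-commutative)
print(q_mul(i, i))    # (-1.0, [0. 0. 0.])  ->  i^2 = -1
```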

8.1.5.2. Conjugation and Norm

The conjugate of a quaternion $q$, denoted $q^*$, has the same scalar part, but a negated vector part,

$$q^* = a - \mathbf{v} \tag{8-78}$$

The conjugate of a quaternion product is the product of the conjugates in reverse order,

$$(q_1 q_2)^* = q_2^*\, q_1^* \tag{8-79}$$

The norm of a quaternion $q$ is defined as

$$\|q\| = q\,q^* = q^*\,q = a^2 + \mathbf{v} \cdot \mathbf{v} = a^2 + x^2 + y^2 + z^2 \tag{8-80}$$

A quaternion whose norm is unity is called a unit quaternion.

8.1.5.3. Representation of Rotations

In the SPICE toolkit, quaternions are used to represent rotations in the following way. Let $\hat{\mathbf{u}}$ be a unit vector and $\theta$ be an angle in $[0, \pi]$ by which vectors are to be rotated about the $\hat{\mathbf{u}}$ axis. Then the representation of this rotation $\mathbf{R}$ in quaternion form is taken to be

$$q = \cos(\theta/2) + \sin(\theta/2)\,(u_x\,\mathbf{i} + u_y\,\mathbf{j} + u_z\,\mathbf{k}) = \cos(\theta/2) + \sin(\theta/2)\,\hat{\mathbf{u}} \tag{8-81}$$

Note that this quaternion has unit norm.

The quaternion arithmetic that actually performs the rotation of a vector $\mathbf{v}$ by an angle $\theta$ about $\hat{\mathbf{u}}$ can be shown to be

$$\mathbf{R}\mathbf{v} = q\,\mathbf{v}\,q^* \tag{8-82}$$

in which $\mathbf{v}$ and $\mathbf{R}\mathbf{v}$ are treated as the vector parts of quaternions whose scalar parts are zero.
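Eq. (8-82) can be exercised with the same Grassmann-product helper sketched earlier; the example rotates the x-axis by 90 degrees about z. The angle, axis, and helper names are illustrative.

```python
import numpy as np

def q_mul(q1, q2):
    """Grassmann product, Eq. (8-77); quaternions as (scalar, 3-vector) pairs."""
    a1, v1 = q1
    a2, v2 = q2
    return (a1 * a2 - np.dot(v1, v2),
            a1 * v2 + a2 * v1 + np.cross(v1, v2))

def q_conj(q):
    a, v = q
    return (a, -v)                               # Eq. (8-78)

def rotate(v, axis, theta):
    """Rotate 3-vector v by angle theta about a unit axis, via q v q* (Eq. 8-82)."""
    q = (np.cos(theta / 2.0), np.sin(theta / 2.0) * np.asarray(axis, float))  # Eq. (8-81)
    _, v_rot = q_mul(q_mul(q, (0.0, np.asarray(v, float))), q_conj(q))
    return v_rot

v_rot = rotate([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], np.radians(90.0))
print(np.round(v_rot, 12))      # [0. 1. 0.]
```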

Similarly, if $q_1$ and $q_2$ represent rotations $\mathbf{R}_1$ and $\mathbf{R}_2$, then it can likewise be shown that $q_2 q_1$ is the quaternion that represents the rotation $\mathbf{R}_2\mathbf{R}_1$,

$$\mathbf{R}_2\mathbf{R}_1\mathbf{v} = q_2\,(q_1\,\mathbf{v}\,q_1^*)\,q_2^* = (q_2 q_1)\,\mathbf{v}\,(q_2 q_1)^* \tag{8-83}$$

The SPICE toolkit contains functions for conversion between the matrix and quaternion forms of rotations. Further information may be found in the SPICE required reading on rotations [Acton2005-Rotate].

8.1.6. Vector Functions and Calculus

Generally speaking, the components of vectors may be functions of abstract or physical parameters that vary with respect to various entities of interest. In metric prediction, the variables are often position in space and time. The most general space-time dependent 3-vector over $\mathbb{F}$ is one of the form

$$\mathbf{u} = \mathbf{u}(x, y, z, t) = f(x, y, z, t)\,\mathbf{i} + g(x, y, z, t)\,\mathbf{j} + h(x, y, z, t)\,\mathbf{k} \tag{8-84}$$

The functions $f$, $g$, and $h$ represent components along each of the three coordinate axes. Vectors of this type are often referred to as a vector field, even though they do not form a field in the mathematical sense introduced earlier in this chapter. For this reason, vector fields here will be distinguished by the term v-fields. V-fields are vector functions of the space and time coordinates.

When vectors are not functions of time, they are said to form a steady v-field , and, when constant, said to be uniform v-fields .

Vector calculus is the extension of ordinary calculus over $\mathbb{F}$ to v-fields. For any position $(x, y, z)$ and any time $t$, the value of $\mathbf{u}$ is a 3-vector. If the position or time changes by an incremental amount, there will be a corresponding incremental change in the vector,

$$d\mathbf{u} = df\,\mathbf{i} + dg\,\mathbf{j} + dh\,\mathbf{k} \tag{8-85}$$

whose component increments are found using the total differential formula of ordinary calculus over $\mathbb{F}$,

$$\begin{aligned}
df &= \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy + \frac{\partial f}{\partial z}dz + \frac{\partial f}{\partial t}dt \\
dg &= \frac{\partial g}{\partial x}dx + \frac{\partial g}{\partial y}dy + \frac{\partial g}{\partial z}dz + \frac{\partial g}{\partial t}dt \\
dh &= \frac{\partial h}{\partial x}dx + \frac{\partial h}{\partial y}dy + \frac{\partial h}{\partial z}dz + \frac{\partial h}{\partial t}dt
\end{aligned} \tag{8-86}$$

In particular, if $\mathbf{r} = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}$ is a position vector, then the velocity and acceleration vectors are given by

$$\mathbf{v} = \frac{d\mathbf{r}}{dt} = \frac{dx}{dt}\,\mathbf{i} + \frac{dy}{dt}\,\mathbf{j} + \frac{dz}{dt}\,\mathbf{k}, \qquad
\mathbf{a} = \frac{d\mathbf{v}}{dt} = \frac{d^2\mathbf{r}}{dt^2} = \frac{d^2x}{dt^2}\,\mathbf{i} + \frac{d^2y}{dt^2}\,\mathbf{j} + \frac{d^2z}{dt^2}\,\mathbf{k} \tag{8-87}$$

If the $(x, y, z)$ components are functions of a parameter $s$, then the locus of $\mathbf{r}$ as $s$ varies over some domain defines a space curve. Specifically, if $s$ denotes the length of the arc from some reference point on the curve to $(x, y, z)$, then the derivative of $\mathbf{r}$ with respect to $s$,

$$\frac{d\mathbf{r}}{ds} = \frac{dx}{ds}\,\mathbf{i} + \frac{dy}{ds}\,\mathbf{j} + \frac{dz}{ds}\,\mathbf{k} \tag{8-88}$$

is of unit length, because

$$\frac{d\mathbf{r}}{ds} \cdot \frac{d\mathbf{r}}{ds} = \left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2 + \left(\frac{dz}{ds}\right)^2 = \frac{dx^2 + dy^2 + dz^2}{ds^2} = 1 \tag{8-89}$$

Since $d\mathbf{r}$ lies along the curve, the derivative represents the unit tangent vector to the curve at $(x, y, z)$.

8.1.6.1. Differentiation Rules

The following rules show how vector products conform to the rules of ordinary calculus.

$$\begin{aligned}
\frac{d(\mathbf{u} \cdot \mathbf{v})}{dt} &= \mathbf{u} \cdot \frac{d\mathbf{v}}{dt} + \frac{d\mathbf{u}}{dt} \cdot \mathbf{v} \\
\frac{d(\mathbf{u} \times \mathbf{v})}{dt} &= \mathbf{u} \times \frac{d\mathbf{v}}{dt} + \frac{d\mathbf{u}}{dt} \times \mathbf{v} \\
\frac{d(\varphi\,\mathbf{u})}{dt} &= \varphi\,\frac{d\mathbf{u}}{dt} + \frac{d\varphi}{dt}\,\mathbf{u}
\end{aligned} \tag{8-90}$$

The square of a vector’s magnitude is $u^2 = \mathbf{u} \cdot \mathbf{u}$, so the magnitude and vector derivatives are related by

$$u\,\frac{du}{dt} = \mathbf{u} \cdot \frac{d\mathbf{u}}{dt}, \qquad
\frac{du}{dt} = \frac{\mathbf{u}}{u} \cdot \frac{d\mathbf{u}}{dt} = \hat{\mathbf{u}} \cdot \frac{d\mathbf{u}}{dt} \tag{8-91}$$

where $\hat{\mathbf{u}}$ is the unit vector along $\mathbf{u}$. This result is not trivial, since $\|d\mathbf{u}\| \ne d\|\mathbf{u}\|$. As a further result of this relationship, if the magnitude $u$ is constant, then either $d\mathbf{u}/dt = \mathbf{0}$ or else $d\mathbf{u}/dt$ is perpendicular to $\mathbf{u}$.

8.1.6.2. The $\nabla$ Operator

The vector operator $\nabla$ (called del or nabla⁵) is defined to be

$$\nabla = \mathbf{i}\,\frac{\partial}{\partial x} + \mathbf{j}\,\frac{\partial}{\partial y} + \mathbf{k}\,\frac{\partial}{\partial z} \tag{8-92}$$

Note that $\nabla$ is just an operator, just as $d/dx$ is an operator in differential calculus. Thus,

$$\nabla\varphi = \left(\mathbf{i}\,\frac{\partial}{\partial x} + \mathbf{j}\,\frac{\partial}{\partial y} + \mathbf{k}\,\frac{\partial}{\partial z}\right)\varphi
= \frac{\partial\varphi}{\partial x}\,\mathbf{i} + \frac{\partial\varphi}{\partial y}\,\mathbf{j} + \frac{\partial\varphi}{\partial z}\,\mathbf{k} \tag{8-93}$$

$\nabla$ acts as both a vector and a differential operator, and thus is called a vector operator, because its components $(\partial/\partial x, \partial/\partial y, \partial/\partial z)$ are differential operators.

⁵ The term means “Assyrian harp,” and appeared profusely in early literature. It has given way in recent times to del, which is preferred in modern usage.

If $\mathbf{r} = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}$ denotes a position vector and $d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}$ denotes an increment, then $\mathbf{r} + d\mathbf{r}$ defines a neighborhood about $\mathbf{r}$. The scalar differential $d\varphi$ is then given by the expression

$$d\varphi = d\mathbf{r} \cdot \nabla\varphi = \frac{\partial\varphi}{\partial x}dx + \frac{\partial\varphi}{\partial y}dy + \frac{\partial\varphi}{\partial z}dz \tag{8-94}$$

The extension of this equation to a v-field $\mathbf{u}$ is

$$d\mathbf{u} = (d\mathbf{r} \cdot \nabla)\,\mathbf{u} = \frac{\partial\mathbf{u}}{\partial x}dx + \frac{\partial\mathbf{u}}{\partial y}dy + \frac{\partial\mathbf{u}}{\partial z}dz \tag{8-95}$$

If the v-field has a temporal dependence, then the total differential becomes

$$d\mathbf{u} = (d\mathbf{r} \cdot \nabla)\,\mathbf{u} + \frac{\partial\mathbf{u}}{\partial t}dt
= \frac{\partial\mathbf{u}}{\partial x}dx + \frac{\partial\mathbf{u}}{\partial y}dy + \frac{\partial\mathbf{u}}{\partial z}dz + \frac{\partial\mathbf{u}}{\partial t}dt \tag{8-96}$$

8.1.6.3. The Gradient
Let $\varphi(x, y, z)$ be any continuous, differentiable scalar space function. Then the gradient of this function is defined to be $\nabla\varphi(x, y, z)$, sometimes denoted $\mathrm{grad}(\varphi)$. The gradient of a scalar function is thus a vector. Because of its relationship to ordinary calculus, it is straightforward to show that, for scalar functions $A$ and $B$,

$$\nabla(AB) = A\,\nabla B + B\,\nabla A \tag{8-97}$$

Viewed geometrically, the equation $\varphi(x, y, z) = \varphi(x_0, y_0, z_0)$ defines a set of points $(x, y, z)$ in 3-dimensional space that lie on a surface that (obviously) contains the given point $\mathbf{r}_0 = (x_0, y_0, z_0)$. For all points with $d\mathbf{r}$ on this constant surface, $d\varphi = 0$. Eq.(8-94) then implies that the vectors $d\mathbf{r}$ and $\nabla\varphi$ are perpendicular to one another. But since this $d\mathbf{r}$ lies on the surface, it follows that the vector $\nabla\varphi$ must necessarily be normal to that surface.

If $d\mathbf{r}$ is allowed to be oriented in any direction in a neighborhood of a point $\mathbf{r}$ on the surface, then Eq.(8-94) will take its maximum positive value when $d\mathbf{r}$ lies in the same direction as $\nabla\varphi$. From this, it may be concluded that the gradient $\nabla\varphi$ lies in the direction of maximum increase of the function $\varphi(x, y, z)$. It is from this property that the gradient derives its name.
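The two geometric properties above are easy to check numerically. The following is a minimal sketch (not MPG code) using a hypothetical scalar field and a simple central-difference estimate of the gradient; the function names and the field chosen are illustrative assumptions only.

```python
import numpy as np

def grad_num(phi, p, h=1e-6):
    """Central-difference estimate of the gradient of a scalar field phi at point p."""
    p = np.asarray(p, float)
    g = np.zeros(3)
    for i in range(3):
        dp = np.zeros(3); dp[i] = h
        g[i] = (phi(p + dp) - phi(p - dp)) / (2*h)
    return g

phi = lambda p: p[0]**2 + 2*p[1]**2 + 3*p[2]**2      # level surfaces are ellipsoids
p0 = np.array([1.0, 1.0, 1.0])
g = grad_num(phi, p0)

# A small step dr tangent to the level surface (d(phi) = 0) is orthogonal to grad(phi):
dr = np.cross(g, [0.0, 0.0, 1.0]); dr /= np.linalg.norm(dr)
print(np.dot(g, dr))                                  # ~0: grad is normal to the surface
print(phi(p0 + 1e-3*g/np.linalg.norm(g)) - phi(p0))   # > 0: phi increases along grad
```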

8.1.6.4. The Divergence of a Vector

The divergence of any vector $\mathbf{F} = F_x\,\mathbf{i} + F_y\,\mathbf{j} + F_z\,\mathbf{k}$ is defined to be the dot product differential

$$\mathrm{div}\,\mathbf{F} = \nabla\cdot\mathbf{F} = \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z} \tag{8-98}$$

This function is thus the sum of the changes in each of the coordinate directions of the v-field $\mathbf{F}$ at the point of evaluation. It therefore represents a scalar amount by which the v-field is increasing at that point. It is this physical interpretation that gives the divergence its name.

If $\varphi(x, y, z)$ is any scalar space function, then the divergence of the product can be shown to follow the rule

$$\nabla\cdot(\varphi\mathbf{F}) = \varphi\,\nabla\cdot\mathbf{F} + \mathbf{F}\cdot\nabla\varphi \tag{8-99}$$

This result resembles the product derivative rules of ordinary differential calculus, dot and cross products, and gradients.

One immediate computation of note is that

$$\nabla\cdot\mathbf{r} = 3 \tag{8-100}$$

An important consequence of this and the divergence product rule is that the divergence of an inverse-square v-field vanishes,

$$\nabla\cdot\frac{\mathbf{r}}{r^3} = 0 \tag{8-101}$$

The reader may verify this result by applying Eqs. (8-99), (8-100), and (8-97) to (8-101).

Another important function is the divergence of the gradient, called the Laplacian, which is variously denoted by

$$\mathrm{lap}\,\varphi = \nabla\cdot(\nabla\varphi) = (\nabla\cdot\nabla)\varphi = \nabla^2\varphi
= \frac{\partial^2\varphi}{\partial x^2} + \frac{\partial^2\varphi}{\partial y^2} + \frac{\partial^2\varphi}{\partial z^2} \tag{8-102}$$

8.1.6.5. The Curl of a Vector

The curl of a vector $\mathbf{F} = F_x\,\mathbf{i} + F_y\,\mathbf{j} + F_z\,\mathbf{k}$ is defined mathematically as the cross product differential

$$\mathrm{curl}\,\mathbf{F} = \nabla\times\mathbf{F} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k}\\[2pt]
\dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z}\\[4pt]
F_x & F_y & F_z
\end{vmatrix}
= \mathbf{i}\left(\frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z}\right)
+ \mathbf{j}\left(\frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x}\right)
+ \mathbf{k}\left(\frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}\right) \tag{8-103}$$

The curl of a v-field is again a v-field in which the resultant component along each of the coordinate axes is the difference between the partial derivatives of the components along the other two axes taken with respect to their perpendicular direction.

In applications where $\mathbf{F}$ is a measure of the magnitude and direction of a flow field, the curl measures the rate and direction of circulation in the flow at a given point. It is perhaps for this reason that James C. Maxwell coined the name curl for this operator in his works on electromagnetic phenomena.

The reader will note that as a consequence of this definition

$$\nabla\times\mathbf{r} = \mathbf{0} \tag{8-104}$$

Straightforward application of the definition and scalar calculus produces the following curl product formula:

$$\nabla\times(\varphi\mathbf{F}) = \varphi\,\nabla\times\mathbf{F} + \nabla\varphi\times\mathbf{F} \tag{8-105}$$

If $\varphi(x, y, z)$ has continuous second derivatives with respect to all variables, then it can be shown that the curl of the gradient is zero,

$$\mathrm{curl}\;\mathrm{grad}\,\varphi = \nabla\times\nabla\varphi = \mathbf{0} \tag{8-106}$$

Similarly, it can be shown that the divergence of the curl is likewise zero,

$$\mathrm{div}\;\mathrm{curl}\,\mathbf{F} = \nabla\cdot(\nabla\times\mathbf{F}) = 0 \tag{8-107}$$

These relationships, along with other vector differential identities, are summarized in the next subsection.

8.1.6.6. Recap of Vector Calculus Formulas The principal differential vector calculus identities are relisted below.

$$\nabla = \mathbf{i}\,\frac{\partial}{\partial x} + \mathbf{j}\,\frac{\partial}{\partial y} + \mathbf{k}\,\frac{\partial}{\partial z} \tag{8-108}$$

$$\nabla\cdot\mathbf{F} = \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z} \tag{8-109}$$

$$\nabla\cdot(\nabla\varphi) = (\nabla\cdot\nabla)\varphi = \nabla^2\varphi = \frac{\partial^2\varphi}{\partial x^2} + \frac{\partial^2\varphi}{\partial y^2} + \frac{\partial^2\varphi}{\partial z^2} \tag{8-110}$$

$$\mathrm{curl}\,\mathbf{F} = \nabla\times\mathbf{F} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k}\\
\partial/\partial x & \partial/\partial y & \partial/\partial z\\
F_x & F_y & F_z
\end{vmatrix} \tag{8-111}$$

$$\nabla(AB) = A\,\nabla B + B\,\nabla A \tag{8-112}$$

$$\nabla\cdot(\varphi\mathbf{F}) = \varphi\,\nabla\cdot\mathbf{F} + \mathbf{F}\cdot\nabla\varphi \tag{8-113}$$

$$\nabla\times(\varphi\mathbf{F}) = \varphi\,\nabla\times\mathbf{F} + \nabla\varphi\times\mathbf{F} \tag{8-114}$$

$$\nabla\times\nabla\varphi = \mathbf{0} \tag{8-115}$$

$$\nabla\cdot(\nabla\times\mathbf{F}) = 0 \tag{8-116}$$

$$\nabla\cdot(\mathbf{F}\times\mathbf{G}) = \mathbf{G}\cdot(\nabla\times\mathbf{F}) - \mathbf{F}\cdot(\nabla\times\mathbf{G}) \tag{8-117}$$

$$\nabla\times(\mathbf{F}\times\mathbf{G}) = (\mathbf{G}\cdot\nabla)\mathbf{F} - (\mathbf{F}\cdot\nabla)\mathbf{G} - (\nabla\cdot\mathbf{F})\mathbf{G} + (\nabla\cdot\mathbf{G})\mathbf{F} \tag{8-118}$$

$$\nabla\times(\nabla\times\mathbf{F}) = \nabla(\nabla\cdot\mathbf{F}) - \nabla^2\mathbf{F} \tag{8-119}$$

$$\nabla(\mathbf{F}\cdot\mathbf{G}) = \mathbf{F}\times(\nabla\times\mathbf{G}) + \mathbf{G}\times(\nabla\times\mathbf{F}) + (\mathbf{F}\cdot\nabla)\mathbf{G} + (\mathbf{G}\cdot\nabla)\mathbf{F} \tag{8-120}$$

$$\nabla(\nabla\varphi\cdot\nabla\rho) = (\nabla\varphi\cdot\nabla)\nabla\rho + (\nabla\rho\cdot\nabla)\nabla\varphi \tag{8-121}$$

$$(\mathbf{B}\cdot\nabla)\mathbf{r} = \mathbf{B} \tag{8-122}$$

$$\nabla\cdot\mathbf{r} = 3 \tag{8-123}$$

$$\nabla\times\mathbf{r} = \mathbf{0} \tag{8-124}$$

$$d\mathbf{F} = (d\mathbf{r}\cdot\nabla)\mathbf{F} + \frac{\partial\mathbf{F}}{\partial t}dt \tag{8-125}$$

$$d\varphi = d\mathbf{r}\cdot\nabla\varphi + \frac{\partial\varphi}{\partial t}dt \tag{8-126}$$

$$\nabla\cdot\frac{\mathbf{r}}{r^3} = 0 \tag{8-127}$$
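Several of these identities can be spot-checked symbolically. The sketch below, using sympy with arbitrary (assumed, illustrative) fields $\varphi$ and $\mathbf{F}$, verifies Eqs. (8-113), (8-115), and (8-116); the helper names `grad`, `div`, and `curl` are local conveniences, not a library API.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def grad(phi): return sp.Matrix([sp.diff(phi, v) for v in (x, y, z)])
def div(F):    return sum(sp.diff(F[i], v) for i, v in enumerate((x, y, z)))
def curl(F):
    return sp.Matrix([sp.diff(F[2], y) - sp.diff(F[1], z),
                      sp.diff(F[0], z) - sp.diff(F[2], x),
                      sp.diff(F[1], x) - sp.diff(F[0], y)])

phi = x**2*y + sp.sin(z)                       # arbitrary scalar field
F   = sp.Matrix([x*y, y*z**2, sp.exp(x)*z])    # arbitrary v-field

print(curl(grad(phi)))                                               # (8-115): zero vector
print(sp.simplify(div(curl(F))))                                     # (8-116): 0
print(sp.simplify(div(phi*F) - (phi*div(F) + F.dot(grad(phi)))))     # (8-113): 0
```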

8.1.7. Differential Geometry of Space Curves

8.1.7.1. Analytic Space Curves
Let $t$ be a scalar parameter ranging over a continuous interval of values, and let $x(t)$, $y(t)$, and $z(t)$ be functions of $t$ that have continuous derivatives of all orders and that can be expanded in a Taylor series in the neighborhood of any given $t$. Then the locus of points described by the position vector

$$\mathbf{r}(t) = x(t)\,\mathbf{i} + y(t)\,\mathbf{j} + z(t)\,\mathbf{k} \tag{8-128}$$

over the $t$ domain defines a three-dimensional curve in Euclidean space. It was noted earlier that the derivative along the curve is the tangent, and, in particular,

$$\mathbf{T} = \frac{d\mathbf{r}}{ds} \tag{8-129}$$

is a unit vector tangent to the curve that is positive in the direction of increasing distance $s$ along the curve.

Note that the derivative whose increment is given by Eq.(8-95) takes the form

$$\frac{d\mathbf{F}}{ds} = \left(\frac{d\mathbf{r}}{ds}\cdot\nabla\right)\mathbf{F} = (\mathbf{T}\cdot\nabla)\mathbf{F} \tag{8-130}$$

Since $\mathbf{T}$ is a unit vector, its derivative along the curve is perpendicular to $\mathbf{T}$ and its magnitude is the rate at which the unit tangent is changing direction at the given point. The principal normal $\mathbf{N}$ to the curve is thus defined by

$$\mathbf{N} = \frac{1}{\kappa}\frac{d\mathbf{T}}{ds}, \qquad \kappa = \left|\frac{d\mathbf{T}}{ds}\right| \tag{8-131}$$

The parameter $\kappa$ is known as the curvature at that point. Its reciprocal $\rho = 1/\kappa$ is the radius of curvature.

A third unit vector, normal to both of these, called the binormal, can then be defined,

$$\mathbf{B} = \mathbf{T}\times\mathbf{N} \tag{8-132}$$

8.1.7.2. Local Coordinate System
These three unit vectors establish a right-handed local coordinate system in which all vectors associated with the curve can be written as a linear combination of these fundamental basis vectors. In particular, the derivative of $\mathbf{B}\cdot\mathbf{T} = 0$ leads to the result that

$$\frac{d\mathbf{B}}{ds} = \tau\,\mathbf{N} \tag{8-133}$$

where, by definition, $\tau$ is the magnitude of the derivative, called the torsion of the curve.

Further, differentiation of $\mathbf{N} = \mathbf{B}\times\mathbf{T}$ leads to an expression for $d\mathbf{N}/ds$, which must lie in the plane defined by $\mathbf{T}$ and $\mathbf{B}$. The equations for the derivatives of the three basis vectors are known as the Frenet-Serret formulas, which are

$$\frac{d\mathbf{T}}{ds} = \kappa\,\mathbf{N}, \qquad
\frac{d\mathbf{N}}{ds} = -(\kappa\,\mathbf{T} + \tau\,\mathbf{B}), \qquad
\frac{d\mathbf{B}}{ds} = \tau\,\mathbf{N} \tag{8-134}$$

Successive derivatives of the basis vectors are thus functions of $\mathbf{T}$, $\mathbf{N}$, $\mathbf{B}$, and the derivatives of curvature and torsion.
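As a concrete illustration, the curvature and torsion of a circular helix can be computed symbolically. The sketch below uses the standard arbitrary-parameter formulas for $\kappa$ and $\tau$ (equivalent to the arc-length definitions above); the helix and the symbol names are assumptions for illustration, and the sign of $\tau$ depends on the torsion convention adopted.

```python
import sympy as sp

t = sp.symbols('t')
a, b = sp.symbols('a b', positive=True)          # helix radius and pitch (assumed)
r = sp.Matrix([a*sp.cos(t), a*sp.sin(t), b*t])

rp, rpp, rppp = r.diff(t), r.diff(t, 2), r.diff(t, 3)
speed = sp.sqrt(rp.dot(rp))                      # ds/dt

kappa = sp.simplify(rp.cross(rpp).norm() / speed**3)                    # curvature
tau   = sp.simplify(rp.cross(rpp).dot(rppp) / rp.cross(rpp).norm()**2)  # torsion (sign by convention)

print(kappa, tau)    # a/(a**2 + b**2), b/(a**2 + b**2) for the circular helix
```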

8.1.7.3. Fundamental Planes
The plane containing the tangent and principal normal is called the osculating plane. It is defined by the point $\mathbf{r}$ on the curve and the normal vector $\mathbf{B}$. Any point $\mathbf{P}$ in the osculating plane satisfies the condition

$$(\mathbf{P} - \mathbf{r})\cdot\mathbf{B} = 0 \tag{8-135}$$

If the motion is torsion-free, it is confined to the (constant) osculating plane.

The normal plane to the curve at $\mathbf{r}$ is defined as that perpendicular to the tangent vector $\mathbf{T}$. Points in this plane thus satisfy the condition

$$(\mathbf{P} - \mathbf{r})\cdot\mathbf{T} = 0 \tag{8-136}$$

The third fundamental plane is the rectifying plane, which is normal to $\mathbf{N}$. Points in it satisfy

$$(\mathbf{P} - \mathbf{r})\cdot\mathbf{N} = 0 \tag{8-137}$$

8.1.7.4. Intrinsic Parameters
The curvature and torsion of a curve depend on the point $\mathbf{r}$ of the curve and, consequently, the arc distance parameter $s$. Thus, they may be written as $\kappa(s)$ and $\tau(s)$, and, in this form, are called the intrinsic parameters of the curve. Any two curves having the same intrinsic parameters are identical except possibly for their orientation in space. If the orientation at $\mathbf{r}$ is such that the vectors $\mathbf{T}$, $\mathbf{N}$, and $\mathbf{B}$ of each curve coincide, then, by analyticity, the two curves coincide.

8.1.7.5. Application to Wave Propagation
The MPG contains an application called RAYPATH that predicts the bending and lengthening of communication paths that involve the effects of planetary atmospheres. This subsection contains the fundamental mathematical basis on which the path effects may be predicted.

If $\psi(x, y, z)$ represents the spatial form of a wave propagating in space through a medium whose index of refraction⁶ is $\nu(x, y, z)$, then it is known that the propagation path is governed by a relationship, known as the eikonal equation, which is

$$\nabla\psi\cdot\nabla\psi = \nu^2 \tag{8-138}$$

Since it is known that the wave is traveling in a direction tangent to the path, this equation may be “square-rooted,” to yield

$$\nabla\psi = \nu\,\mathbf{T} \tag{8-139}$$

Setting $\mathbf{F} = \nabla\psi$ and $\mathbf{T} = \nabla\psi/\nu$ in Eq.(8-130) produces

$$\frac{d\,\nabla\psi}{ds} = (\mathbf{T}\cdot\nabla)\nabla\psi = \frac{1}{\nu}\,(\nabla\psi\cdot\nabla)\nabla\psi \tag{8-140}$$

Further, Eq.(8-121) with $\varphi = \rho = \psi$ permits the right-hand side above to become

$$\frac{d\,\nabla\psi}{ds} = \frac{1}{2\nu}\,\nabla\nu^2 = \nabla\nu \tag{8-141}$$

With Eq.(8-139) and application of the product rule, this is

$$\frac{d(\nu\mathbf{T})}{ds} = \nu\,\frac{d\mathbf{T}}{ds} + \mathbf{T}\,\frac{d\nu}{ds}
= \nu\kappa\,\mathbf{N} + (\nabla\nu\cdot\mathbf{T})\,\mathbf{T} = \nabla\nu \tag{8-142}$$

Rearrangement of terms yields the refraction equation

$$\frac{d\mathbf{T}}{ds} = \kappa\,\mathbf{N} = \frac{1}{\nu}\left[\nabla\nu - (\mathbf{T}\cdot\nabla\nu)\,\mathbf{T}\right] \tag{8-143}$$

This equation relates the index of refraction, its gradient, and the ray path tangent vector to the instantaneous curvature in the direction of the principal normal. Further, it provides a first-order differential equation for the tangent vector, which traces out the ray path when integrated.

⁶ The index of refraction is normally given the symbol n in the literature, which might here be confused with the use of that symbol for the principal normal to the curve. For this reason, the symbol ν is used here.

Modeling the refractive index and solving the refraction equation it engenders are beyond the scope of this chapter on mathematical methods. The interested reader may consult the RAYPATH code documentation for an approximate solution for the case of a radially oriented spherically symmetric refractive index.
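Purely for illustration, the sketch below integrates Eq.(8-143) together with $d\mathbf{r}/ds = \mathbf{T}$ with a simple fixed-step scheme. The exponential refractivity model and its parameters `N0` and `H` are invented assumptions, not the RAYPATH model, and no attempt is made at the accuracy controls a production ray tracer would require.

```python
import numpy as np

# Hypothetical refractivity model: nu = 1 + N0*exp(-z/H)   (illustrative only)
N0, H = 3.0e-4, 8.0e3

def nu(p):       return 1.0 + N0*np.exp(-p[2]/H)
def grad_nu(p):  return np.array([0.0, 0.0, -N0/H*np.exp(-p[2]/H)])

def trace(r0, T0, ds=100.0, n_steps=2000):
    """Step dr/ds = T and Eq.(8-143): dT/ds = (1/nu)[grad(nu) - (T.grad(nu)) T]."""
    r, T = np.array(r0, float), np.array(T0, float)
    T /= np.linalg.norm(T)
    for _ in range(n_steps):
        g = grad_nu(r)
        dT = (g - np.dot(T, g)*T) / nu(r)
        r = r + ds*T
        T = T + ds*dT
        T /= np.linalg.norm(T)        # keep the tangent a unit vector
    return r, T

r_end, T_end = trace(r0=[0.0, 0.0, 0.0], T0=[1.0, 0.0, 0.1])
print(r_end, T_end)                   # endpoint and final ray direction
```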

8.1.8. Dual Vector Spaces and Linear Functionals Earlier sections of this chapter have dealt with “ordinary vectors” defined over a field (usually real or complex numbers). They are used to represent coordinate spaces and other n-tuples that obey the rules of linear algebra. Matrices over the field provide the means for transforming between coordinate systems or systems having different bases for the space they describe.

This part of the chapter now elaborates on some subtleties of linear algebra and some changes in notation that emphasize those subtleties. These changes will hopefully prepare the reader to more easily make the transition into tensor forms, discussed a little later.

In general, functions may be so defined as to map vectors into values, such as $f(\mathbf{v}) = a$, where $a$ is a member of a scalar field. The concept may also be extended to include functions, such as $\mathbf{f}(\mathbf{v}) = \mathbf{a}$, which produce vectors of arbitrary, possibly different, dimension.

The set of all functions that may operate on a given vector space under a given set of restrictions is called a functional space. The study of the algebraic properties of functionals and their applications is called functional analysis, and is the foundation of the so-called calculus of variations, which is beyond the scope of this chapter. The treatment here narrows the area of interest to linear functionals.

8.1.8.1. Linear Functionals
The set of functions of interest here is that which produces only scalar values when applied to a given vector space $V$. It is further limited to contain only functions that obey the usual laws of addition and scalar multiplication. Such a set of functions is called a linear functional space. A linear functional space is thus one such that, for any two functionals $f$ and $g$ in the space,

$$(f + g)(\mathbf{v}) = f(\mathbf{v}) + g(\mathbf{v}), \qquad (kf)(\mathbf{v}) = k\,f(\mathbf{v}) \tag{8-144}$$

for all $\mathbf{v}$ in the vector space and $k$ in the field of scalars. If the space of linear functionals maps vectors to scalars (i.e., a single dimension), then the functionals are called one-forms,

$$f(\mathbf{v}) = k \quad \text{for } \mathbf{v}\in\mathbb{R}^n,\; k\in\mathbb{R} \tag{8-145}$$

One-forms are functionals that assign scalar values to vectors.

8.1.8.2. The Dual Space of Functionals
For every vector space over $\mathbb{R}^n$, call it $V$, there is a corresponding vector space of one-forms, denoted $V^{*}$, having the same dimension as $V$, called its dual vector space (or just dual space). If the basis of $V$ is denoted by $(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n)$, then the basis functionals of $V^{*}$, denoted using superscripts $(\mathbf{e}^1, \mathbf{e}^2, \ldots, \mathbf{e}^n)$, where superscripts are indices, not powers, are such that $\mathbf{e}^i(\mathbf{e}_j) = \delta^i_j$, the Kronecker delta being unity only when its indices are equal, and zero otherwise.

Each member $f$ of $V^{*}$ can thus be expressed as a linear combination of the basis one-forms,

$$f = f_1\mathbf{e}^1 + f_2\mathbf{e}^2 + \cdots + f_n\mathbf{e}^n = f_i\,\mathbf{e}^i \tag{8-146}$$

The abbreviated form on the right above makes use of Einstein’s summation notation, where repeated indices indicate a summation over all values of that index.

As treated earlier, the members of $V$ are commonly represented as n-tuples of values over the field $\mathbb{R}$. But, as seen in Eq.(8-146), the members of $V^{*}$ may also be represented as n-tuples of values over the field $\mathbb{R}$. What is the difference between the two spaces?

In $V$ the values represent things like distances along an assumed set of basis vectors, and in $V^{*}$ they represent the amplitudes applied to the dual space basis functionals. Clearly, these two concepts are going to get mixed up unless their notations as n-tuples are somehow differentiated.

For this reason, n-tuples in $V$ are called vectors, distinguished by writing their components in superscript form $\mathbf{a} = [a^1, a^2, \ldots, a^n]^{T}$, and displaying the tuple as an $n\times 1$ matrix, or column vector. A vector may also be written as a linear combination of its basis vectors,

$$\mathbf{a} = \begin{bmatrix} a^1\\ a^2\\ \vdots\\ a^n \end{bmatrix}
= a^1\mathbf{e}_1 + a^2\mathbf{e}_2 + \cdots + a^n\mathbf{e}_n \tag{8-147}$$

Note that, for any $\mathbf{a}$ in $V$, the application of a basis one-form $\mathbf{e}^i$ of $V^{*}$ just selects out the i-th component of $\mathbf{a}$,

$$\mathbf{e}^i(\mathbf{a}) = a^j\,\delta^i_j = a^i \tag{8-148}$$

The coefficients $a^i = \mathbf{e}^i(\mathbf{a})$ thus represent distances along each of the basis vectors $\mathbf{e}_i$, whose combination as in Eq.(8-147) forms the vector. Because they have direction, they are sometimes referred to as tangent vectors.

Members of $V^{*}$ are then called cotangent vectors, or simply covectors. They are distinguished by writing their components in subscript form $f = [f_1, f_2, \ldots, f_n]$, and displaying the tuple as a $1\times n$ matrix, or row vector. A covector may also be written in its basis functional form

$$f = [f_1, f_2, \ldots, f_n] = f_1\mathbf{e}^1 + f_2\mathbf{e}^2 + \cdots + f_n\mathbf{e}^n \tag{8-149}$$

The coefficients $f_i = f(\mathbf{e}_i)$ now represent the projections of $f$ on each of the basis vectors $\mathbf{e}_i$, since application to an $\mathbf{e}_i$ discards components that are orthogonal to that $\mathbf{e}_i$. In summary, the distinctions between vectors and covectors are:

Tangent vectors possess length and direction. Their components are denoted using superscript notation, as $a^i$, and their values are the weights by which their basis vectors $\mathbf{e}_i$ add in length and direction. Their basis vectors express the coordinate axes of the vector space.

Covectors are functionals that map vectors onto scalars. Their components are denoted using subscript notation, as $f_i$, and their values are projections of $f$ onto each of the coordinate axes of the vector space.

The terms “vector” and “covector” applied to a given n-tuple are but two different conventions for interpreting the components of the object with respect to the given coordinate system. In fact, both forms are valid expressions with respect to the given coordinate system.

8.1.8.3. Functional Application and the Inner Product
The application of $f$ to $\mathbf{a}$, denoted $f(\mathbf{a}) = \langle f, \mathbf{a}\rangle = f\cdot\mathbf{a}$, is evaluated using the linear functional combination rules found in Eq.(8-144),

$$f(\mathbf{a}) = \langle f, \mathbf{a}\rangle
= \sum_{i=1}^{n}\sum_{j=1}^{n} f_j\,a^i\,\mathbf{e}^j(\mathbf{e}_i)
= \sum_{i=1}^{n} f_i\,a^i = f_i\,a^i
= [f_1\; f_2\; \cdots\; f_n]\begin{bmatrix} a^1\\ a^2\\ \vdots\\ a^n \end{bmatrix} \tag{8-150}$$

The character of the dual space is that it consists of all linear functionals on $V$, such that a one-form in $V^{*}$ maps a vector in $V$ to the inner product of the coefficient tuples of the objects in each space, regardless of what the vector basis elements are. This is an important characteristic of the dual space that does not extend to inner products of ordinary vectors, as defined earlier in Eq.(8-6).
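Numerically, the application of a one-form is just the component-wise product of a row tuple with a column tuple. The sketch below (illustrative values only) shows both the basis one-form selection of Eq.(8-148) and the pairing of Eq.(8-150).

```python
import numpy as np

# Basis one-forms e^i, written as the rows of the identity, select components:
E = np.eye(3)
a = np.array([2.0, -1.0, 3.0])           # vector components a^i
print(E[0] @ a, E[1] @ a, E[2] @ a)      # e^i(a) = a^i, Eq.(8-148)

f = np.array([0.5, 4.0, 1.0])            # covector components f_i
print(f @ a)                             # <f, a> = f_i a^i, Eq.(8-150)
```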

However, aside from the placement of indices, it is not possible from Eq.(8-150) to ascertain whether the sum derives from $f\cdot\mathbf{a}$ or from $\mathbf{a}\cdot f$; that is, to ascertain which n-tuple is the vector and which is the covector.

Whereas the dot product of vectors has a geometric connotation, i.e., the product of lengths times the cosine of the included n-space angle, the one-form functional mapping $\langle f, \mathbf{a}\rangle = k$ has a different interpretation. The set of all $f$ that map $\mathbf{a}$ to the constant $k$ lie in a plane in $V^{*}$, with $k/|\mathbf{a}|$ being the minimum distance measure of the plane from the origin.

8.1.8.4. Basis One-Forms of the Dual Space
For given basis vectors $(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n)$ of $V$, what are the corresponding basis one-forms $(\mathbf{e}^1, \mathbf{e}^2, \ldots, \mathbf{e}^n)$ of the dual space $V^{*}$, and are they unique? Envision, if you will, in $V^{*}$ the n-dimensional parallelepiped formed by the intersections of the planes defined by $\langle f, \mathbf{e}_i\rangle = 0$ and $\langle f, \mathbf{e}_i\rangle = 1$ for each of the $\mathbf{e}_i$. This figure must exist, because the basis is presumed to span the entire space. The number of vertices will be $2^n$ because there are two planes for each of the $n$ basis vectors. There will be exactly $n$ vertices among these that, for $i = 1, 2, \ldots, n$, have $\langle f, \mathbf{e}_i\rangle = 1$ and $\langle f, \mathbf{e}_j\rangle = 0$ for all $j \neq i$. So, for each index $i$, label the corresponding vertex as the basis one-form $\mathbf{e}^i$. This process assigns a unique one-form to each basis vector.

8.1.8.5. Changing the Vector Space Bases
Since the dual space consists of linear mappings of vectors, it finds wide application in describing various linear transformations among vector bases. A vector space is a geometrical entity that is, in principle, independent of the particular coordinate system chosen to describe it. That is, if there is another set of basis vectors $(\bar{\mathbf{e}}_1, \bar{\mathbf{e}}_2, \ldots, \bar{\mathbf{e}}_n)$ for $V$, then these must be linear combinations of $(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n)$,

$$\bar{\mathbf{e}}_i = \Lambda_i^{\,j}\,\mathbf{e}_j \tag{8-151}$$

The coefficients in the above form may be displayed in matrix form $[\Lambda_i^{\,j}]$, as introduced earlier in Eq.(8-20). However, the notation here is somewhat different, because now there is a sense, by placement of indices, of the roles of the elements: the subscripts denote values belonging to the same transformed basis vector, while the superscripts identify elements that are functional projections on the original basis vectors. The matrix, of course, must be nonsingular in order for the alternate basis $(\bar{\mathbf{e}}_1, \bar{\mathbf{e}}_2, \ldots, \bar{\mathbf{e}}_n)$ to span $V$. There may be changes of bases in the dual space, as well. For example, the new basis elements $(\bar{\mathbf{e}}^1, \bar{\mathbf{e}}^2, \ldots, \bar{\mathbf{e}}^n)$ must be linear combinations of the old elements $(\mathbf{e}^1, \mathbf{e}^2, \ldots, \mathbf{e}^n)$ of the form

$$\bar{\mathbf{e}}^i = M^i_{\,j}\,\mathbf{e}^j \tag{8-152}$$

Here it may be noted that the order of indices in the transformation is transposed. The ways that vector and covector basis elements transform are subtly different, as discussed next.

8.1.9. Manifolds
Vectors are commonly used to describe some aspects of physical space, in which direction and magnitude are concepts significant to the characterization of the space for a given purpose. In astrodynamics, for example, vectors characterize the locations and velocities of solar system objects relative to a specified frame of reference.

In Euclidean space, there are concepts of distances and angles and invariance of these under certain operations in the space, such as translations and rotations of coordinates. Euclidean space is one in which ordinary vector algebra holds. It is a vector space.

But, as Einstein and others have shown, there are spaces that are not Euclidean, but appear to be distorted by relative motion and gravity. On a small enough scale, some of these appear to be locally Euclidean, at least as far as distances and angles in the local space are concerned.

A manifold is an abstract mathematical space in which every point has a neighborhood that resembles Euclidean space, but in which the global structure may be more complicated. That is, given any precision requirement, there exists a region about the given point in which the deviations from Euclidean geometry are less than the given precision.

Abstract mathematical spaces are characterized by continuity, connectedness, convergence, and dimension. Manifolds add the requirement that localized behavior, in the small, becomes Euclidean. For example, each small locale in a one-dimensional manifold appears to be a line segment, if examined closely enough. Examples of one-dimensional manifolds include lines and circles. Each small locale in a two-dimensional manifold resembles a disk; examples include the surfaces of a sphere or torus.

Manifolds are useful because they permit more complicated global theories to incorporate simpler localized behaviors. Such is the case, for example, whereby Newton’s laws may be embedded within the wider context of general relativity.

Additional characteristics may also be imposed on manifolds. Examples include differentiable manifolds, which allow the operations of calculus; Riemannian manifolds, which permit the characterization of distances and angles; and pseudo-Riemannian manifolds, which are used to model spacetime and general relativity.

The principal mathematical tools relating to manifolds are linear (vector) algebra and calculus, and their extensions into multiple dimensions, called tensor analysis, which is discussed in a later section. By operating on very small increments of distances, the linear theory can be brought to bear on globally nonlinear problems.

8.1.9.1. Vector Fields on Manifolds
Recall from an earlier section of this chapter that a vector field was defined as an association of a vector with each point in a Euclidean space. By extension, a vector field on a manifold defines the direction and magnitude of a vector (or tensor) at each point of the manifold. In Euclidean space, it is common practice to visualize a vector as extending from one point to another. However, on a manifold, the Euclidean character is restricted to small neighborhoods about a point, and so the distinction must be made. The wind velocities at all points in a volume of the atmosphere illustrate this vector field concept.

If $x^i$ denote coordinates on a manifold, so that $x = (x^1, x^2, \ldots, x^n)$ is a point on the manifold, then the differential position $dx = [dx^1, dx^2, \ldots, dx^n]^{T}$ represents a small interval in the neighborhood of $x$. For a manifold, Euclid’s laws are presumed to apply at this differential level of detail.

Further, if the coordinate axes of the manifold are the curves $x^i(s)$, where $s$ is a parameter, the distance along the curve from the origin, then the $dx^i$ represent increments along the coordinate axes.

8.1.9.2. Covariance and Contravariance
The terms covariance and contravariance are assigned to manifold entities to designate how their coordinates behave under a continuous change of coordinates. Covariance and contravariance are key attributes of tensors, discussed later. Here the discussion focuses on vectors and covectors.

Suppose that there is a system of smooth continuous coordinates for the manifold whose coordinates are given by $\bar x^i = \bar x^i(x^1, x^2, \ldots, x^n)$. Then the new coordinates can be written using the total differential rule as

$$d\bar x^j = \sum_{i=1}^{n}\frac{\partial \bar x^j}{\partial x^i}\,dx^i \tag{8-153}$$

A coordinate transformation on a manifold is thus determined by the Jacobian matrix of the partial derivatives of the new coordinates with respect to the old ones,

$$\left[\frac{\partial \bar x^j}{\partial x^i}\right] = [J^j_{\,i}], \qquad d\bar x^j = J^j_{\,i}\,dx^i \tag{8-154}$$

This type of transformation is called contravariant. The differential vectors $dx$ and $d\bar x$ are called contravariant vectors. Vectors, such as the basis elements themselves, are given subscripts when necessary to enumerate them. The n-tuple components of a vector, however, are written in superscript notation, and are said to transform contravariantly. In matrix notation, the contravariant matrix transformation is a left multiplier of the vector.

Now consider the scalar point function $\varphi(x^1, x^2, \ldots, x^n)$ and form the gradient n-tuple

$$\nabla_x\varphi = \left[\frac{\partial\varphi}{\partial x^1},\ \frac{\partial\varphi}{\partial x^2},\ \ldots,\ \frac{\partial\varphi}{\partial x^n}\right]
= [\varphi_1, \varphi_2, \ldots, \varphi_n] \tag{8-155}$$

Now let this be transformed to the new coordinate set above, this time by applying the multiple-variable chain rule,

$$\bar\varphi_i = \frac{\partial\varphi}{\partial \bar x^i}
= \frac{\partial\varphi}{\partial x^j}\,\frac{\partial x^j}{\partial \bar x^i}
= \varphi_j\,\frac{\partial x^j}{\partial \bar x^i} \tag{8-156}$$

A transformation that exercises the chain rule on a manifold thus uses the Jacobian matrix containing the partial derivatives of the old coordinates with respect to the new ones,

$$\left[\frac{\partial x^j}{\partial \bar x^i}\right] = [K^j_{\,i}], \qquad \bar\varphi_i = \varphi_j\,K^j_{\,i} \tag{8-157}$$

This type of transformation is called covariant. The n-tuples $\varphi = [\varphi_1, \varphi_2, \ldots, \varphi_n]$ and $\bar\varphi = [\bar\varphi_1, \bar\varphi_2, \ldots, \bar\varphi_n]$ are called covariant vectors. Covectors, such as the basis one-forms themselves, are given superscripts when necessary to enumerate them. The n-tuple components of covectors, however, are written in subscript notation and are said to transform covariantly. In matrix notation, the covariant matrix transformation is a right multiplier of the covector.

The two matrix notations are transposes of each other. The two transformation matrices agree if and only if they are orthogonal matrices, in which case contravariant vectors and covariant vectors transform identically.

The basis vectors $\mathbf{e}_i$ themselves transform covariantly (and hence subscript notation applies to them) because they represent the directions of the coordinate axes, which are tangent vectors to the n-dimensional curves that form the axes. If $x^i(s)$ is the i-th coordinate axis curve, with parameter $s$ being the length along the curve, then

$$\mathbf{e}_i = \frac{d\mathbf{x}}{ds} = \frac{\partial \mathbf{x}}{\partial x^i} \tag{8-158}$$

If a new coordinate axis is defined by an n-dimensional space curve $\bar x$, then the chain rule gives

$$\bar{\mathbf{e}} = \frac{d\mathbf{x}}{d\bar x}
= \frac{\partial \mathbf{x}}{\partial x^j}\,\frac{\partial x^j}{\partial \bar x}
= \mathbf{e}_j\,\frac{\partial x^j}{\partial \bar x} \tag{8-159}$$

which is a covariant transformation.

8.1.9.3. Invariance of Inner Product Under Coordinate Change
Suppose $\mathbf{a}$ is a vector that maps to $\bar{\mathbf{a}}$ under a contravariant change of coordinates, and suppose $f$ is a covector that maps to $\bar f$ covariantly under the same change. Then the inner product after the change will be

$$\bar f\cdot\bar{\mathbf{a}} = \bar f_i\,\bar a^i
= f_j\,\frac{\partial x^j}{\partial \bar x^i}\,\frac{\partial \bar x^i}{\partial x^h}\,a^h
= f_j\,\delta^j_h\,a^h = f_j\,a^j = f\cdot\mathbf{a} \tag{8-160}$$

Hence, the dot product is a scalar invariant under a coordinate transformation.
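The invariance is easy to demonstrate for a linear change of coordinates, where the Jacobian is constant. The sketch below uses a random nonsingular matrix as an assumed change of coordinates; contravariant components are multiplied on the left by $J$, covariant components on the right by $K = J^{-1}$, and the pairing is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))      # assumed nonsingular linear change x_bar = A x
J = A                            # Jacobian d(x_bar)/dx for a linear change
K = np.linalg.inv(A)             # dx/d(x_bar)

a = rng.normal(size=3)           # contravariant components a^i
f = rng.normal(size=3)           # covariant components f_i (e.g., a gradient)

a_bar = J @ a                    # contravariant rule, Eq.(8-154)
f_bar = f @ K                    # covariant rule, Eq.(8-157), right multiplier

print(np.allclose(f @ a, f_bar @ a_bar))   # True: Eq.(8-160)
```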

8.1.9.4. Metric Spaces
A metric space is a mathematical space in which, for every two points in the space, there is a nonnegative real number, called their distance, that is symmetric and satisfies the triangle inequality. The method that defines the distance between points is called the metric. If $m(x, y)$ is a metric function, then

$$\begin{aligned}
m(x, y) &\ge 0\\
m(x, y) &= 0 \iff x = y\\
m(x, y) &= m(y, x)\\
m(x, y) &\le m(x, w) + m(w, y) \quad \text{for all } w
\end{aligned} \tag{8-161}$$

Recall that in Cartesian Euclidean space, the squared distance between two points is the sum of the squares of the differences along the coordinate axes. Under a linear change of coordinates, the distance computation becomes a quadratic form,

$$m(x, y) = g_{ij}\,(x^i - y^i)(x^j - y^j) \tag{8-162}$$

This same metric is used on manifolds, where the differences of vectors are now replaced by a (contravariant) vector of differentials,

$$m(dx) = g_{ij}\,dx^i\,dx^j \tag{8-163}$$

The matrix $g_{ij}$ is called the metric tensor. The matrix is always symmetric, so there are really only $n(n+1)/2$ independent elements for an n-dimensional manifold. Further, the matrix is positive definite; that is, the distance is always positive unless $dx$ is zero. Any space characterized by such a metric is called a Riemannian space, after Georg Friedrich Bernhard Riemann, who studied such spaces in his celebrated 1854 habilitation lecture on the foundations of geometry.

This formulation expresses the character of the intrinsic metrical relations of the manifold in terms of the given coordinate system. In a different coordinate sys- tem, the metric tensor would be different.

Under a change of coordinates of the sort discussed earlier in this section, the differential vectors transform contravariantly, so substitution produces

$$\bar g_{ij}\,d\bar x^i\,d\bar x^j = g_{mn}\,dx^m\,dx^n
= g_{mn}\,\frac{\partial x^m}{\partial \bar x^i}\,\frac{\partial x^n}{\partial \bar x^j}\,d\bar x^i\,d\bar x^j \tag{8-164}$$

From this it follows that the transformed metric tensor is

$$\bar g_{ij} = g_{mn}\,\frac{\partial x^m}{\partial \bar x^i}\,\frac{\partial x^n}{\partial \bar x^j} \tag{8-165}$$

Notice that each component of the new metric array is a linear combination of the old metric components, and the coefficients in this transformation are the partials of the old coordinates with respect to the new. The metric tensor for contravariant displacements is thus covariant.

Similarly, it may be shown that if displacements are expressed as covectors, then the metric quadratic form is contravariant,

$$m(dx) = g^{ij}\,dx_i\,dx_j \tag{8-166}$$

The two metric tensors are reciprocals of each other,

$$g^{jh}\,g_{hi} = \delta^j_i \tag{8-167}$$

The $g^{ij}$ are the signed minors of the $g_{ij}$ divided by the determinant of the $g_{ij}$. The metric is the star of Riemannian manifolds. It describes everything about the geometry of the space. It permits the measurement of angles and distances. It characterizes spaces that are not flat but have “curvature.” A precise meaning of what curvature is (aside from being non-flat) is discussed later in the section on tensors.

8.1.9.5. Inner Products on a Manifold
Earlier, the inner product of a covector and vector was defined. In extension, the inner product, or dot product, of two contravariant vectors $u = [u^i]^{T}$ and $v = [v^i]^{T}$ is defined by the metric form

$$u\cdot v = \langle u, v\rangle = g_{ij}\,u^i\,v^j \tag{8-168}$$

The product $u\cdot u$ is thus the distance function applied to $u$, giving the square of the length of $u$.

For each such contravariant vector $u^i$ there is an associated covector, denoted by $u_j$, defined by the transformation

$$u_j = g_{ji}\,u^i \tag{8-169}$$

The inner product of this associated covector with a contravariant vector $v^j$ results in

$$u_j\,v^j = (g_{ji}\,u^i)\,v^j = g_{ij}\,u^i\,v^j = \langle u, v\rangle = u\cdot v \tag{8-170}$$

That is, the inner product of a vector with the associated covector of another vector is the same as the inner product of the two vectors.

In a similar manner, the associated contravariant vector of a given covector $u_j$ can be defined by

$$u^i = g^{ij}\,u_j \tag{8-171}$$

The dot product of two contravariant vectors can hence be written in terms of the associated covectors as

$$u\cdot v = g_{ij}\,(g^{ih}u_h)(g^{jk}v_k) = \delta^j_h\,g^{jk}\,u_h\,v_k = g^{hk}\,u_h\,v_k \tag{8-172}$$

Therefore the three forms of the dot product of vectors and covectors are identical,

$$u\cdot v = g_{ij}\,u^i v^j = u_j\,v^j = u^i\,v_i \tag{8-173}$$

The processes of transforming between contravariant and covariant vectors given in Eqs.(8-169) and (8-171) are called “raising and lowering indices,” because they convert superscripted n-tuples to subscripted ones, and vice versa. One converts between contravariant and covariant versions of a given vector simply by applying the metric tensor in covariant or contravariant form as a linear transformation. Summation always involves summation on an upper and a lower index.
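A small sketch makes the bookkeeping concrete. The metric below is that of the plane in polar coordinates, $g = \mathrm{diag}(1, r^2)$, evaluated at an assumed radius; the variable names are illustrative only.

```python
import numpy as np

r = 2.0
g = np.diag([1.0, r**2])        # metric of the plane in polar coordinates (dr, dtheta)
g_inv = np.linalg.inv(g)

u = np.array([0.3, 0.1])        # contravariant components u^i
v = np.array([-0.2, 0.4])

u_low = g @ u                   # lowering the index, Eq.(8-169): u_j = g_ji u^i
u_up  = g_inv @ u_low           # raising it back, Eq.(8-171)

print(np.allclose(u_up, u))     # True
print(u_low @ v, u @ g @ v)     # the forms of the dot product agree, Eq.(8-173)
```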

8.1.10. Tensors Because it appears so profusely in general relativity theory, a short treatment of tensor analysis is included here. A treatise on tensors is not appropriate, but a definition of what they are and how they are used will perhaps help clear some of the haze that often surrounds the term. 56 Explanatory Supplement to Metric Prediction Generation

8.1.10.1. A Little History
The term tensor was introduced by William Rowan Hamilton in 1846 to describe the metric character of a certain linear algebraic system. It was used in its current meaning in 1899 by Woldemar Voigt.

Tensor calculus was developed around 1890 by Gregorio Ricci-Curbastro under the title Absolute Differential Calculus, and was published in 1900 in a classic text bearing this title by his student Tullio Levi-Civita, in Italian (translations followed). In the 20th century, the subject became known as tensor analysis.

Albert Einstein learned tensor analysis “with great difficulty,” he related, from the geometer Marcel Grossmann, or perhaps from Levi-Civita himself. He formulated his theory of general relativity completely in the language of tensors.

8.1.10.2. Approaches Conceptually, there should be nothing mysterious about tensor analysis. After all, it is just linear algebra and ordinary calculus applied to multidimensional spaces. It is an algebra that relates how things in one coordinate system change when viewed in another coordinate system. That is what relativity is about, and that is why Einstein used it.

It seeks to find things, like distance and angle measures, that are invariant when the coordinate system or coordinate locations change, and attributes, such as the curvature of space, that describe the differences between Newton’s laws of motion and those of Einstein which supplant them. And it attempts to do these things in a completely abstract and general way, so that the range of applications extends to all metric manifolds.

But the coupling of abstractness, generality, precision, multidimensionality, and mathematical notation can become fairly mentally daunting to one with only a casual interest in the subject. The exposition here is only meant for those who desire an understanding of the basis of the theory and an appreciation of why it appears to be so complicated. Those whose needs are for a more detailed knowledge of tensors will find it necessary to consult appropriate textbooks.

Tensors are used to describe differential quantities and transformations from one set of coordinates to another. Since relativity theories treat invariants in spacetime, it is natural that tensors, which relate to invariants under transformations of coordinates, are brought to bear in the analyses.

There are a number of approaches to working with tensors, each boasting some advantage to a class of users. The classical approach views tensors first as multidimensional arrays that are n-dimensional generalizations of scalars. The tensor components are viewed as the elements of the array. This idea is then extended to include arrays whose elements are functions or differentials, after the more traditional ideas of vector spaces have provided the elementary motivation.

A more modern approach views tensors as expressions of abstract entities with manipulation rules that are extensions of linear maps to multi-linear algebra. This treatment attempts to generalize the tensor concept for advanced studies and applications. This approach has not become fully popular, however, due to its lack of geometrical interpretation. The presentation in this chapter follows the more classical approach.

8.1.10.3. Generalized Mixed Tensors
As was seen in the preceding section of this chapter, subscripts and superscripts are used to number coordinate components and to signify the transformational characteristics of entities. Let $x^1, x^2, \ldots, x^n$ represent a set of generalized coordinate axes, and let $\bar x^1, \bar x^2, \ldots, \bar x^n$ represent another such set. Then if a function $T = T(x^1, x^2, \ldots, x^n)$ transforms into the function $\bar T = \bar T(\bar x^1, \bar x^2, \ldots, \bar x^n)$ under the rule

$$\bar T^{\,\alpha_1\cdots\alpha_r}_{\;\;\beta_1\cdots\beta_s}
= \left|\frac{\partial x}{\partial \bar x}\right|^{N}
T^{\,a_1\cdots a_r}_{\;\;b_1\cdots b_s}\,
\frac{\partial \bar x^{\alpha_1}}{\partial x^{a_1}}\cdots\frac{\partial \bar x^{\alpha_r}}{\partial x^{a_r}}\,
\frac{\partial x^{b_1}}{\partial \bar x^{\beta_1}}\cdots\frac{\partial x^{b_s}}{\partial \bar x^{\beta_s}} \tag{8-174}$$

then $T$ is a tensor. The terms above adhere to Einstein’s summation convention, wherein repeated indices in an expression are summed over all values of each repeated index. The tensor definition above therefore represents a summation of $n^{r+s}$ terms, and the tensor is said to be of rank $r + s$. The tensor rank is sometimes called the tensor order. The numbers $r$ and $s$, respectively, are called the contravariant and covariant ranks, and the tensor is said to have valence $(r, s)$.

The exponent $N$ of the Jacobian $\left|\partial x/\partial \bar x\right|$ is called the weight of the tensor. If $N = 0$, the tensor field is said to be absolute. If $s = 0$, the tensor is called purely contravariant, and if $r = 0$, it is purely covariant. Otherwise, it is called a mixed tensor. Tensors are said to be of the same kind if they have the same number of contravariant indices (superscripts) and covariant indices (subscripts) and are of the same weight. Contravariant tensors are those in which the differential terms in the transformation appear as partials of the new coordinates with respect to the old; covariant tensors are those in which this contribution is inversely related.

As points of information, the tangent vector along a spacetime curve at a given point in time is an absolute contravariant tensor of order 1, whereas the covector $\nabla\varphi$ representing the gradient of a scalar function is an absolute covariant tensor of order 1. The metric tensor $g_{ij}$ introduced earlier is an absolute covariant tensor of rank 2. The Kronecker delta $\delta^i_j$ is a mixed absolute tensor having contravariant and covariant ranks of 1.

In short, vectors are (1, 0) tensors; covectors (one-forms) are (0, 1) tensors; the metric $g_{ij}$ is a (0, 2) tensor; and $\delta^i_j$ is a (1, 1) tensor. Tensors may be combined to produce other tensors in a number of ways, some of which are enumerated below, without proof.

(Sum law) The sum of two tensors of the same kind is a tensor of this kind.

(Outer product law) The outer product of two tensors is a tensor whose weight is the sum of the weights of the two tensors, and whose valence is the sum of the valences of the two tensors.

(Contraction law) The contraction of a tensor having nonzero contravariant and covariant ranks is a tensor. Contraction is the process of equating a contravariant index to a covariant index and summing over this index, as

$$T^{\,il}_{\;\;ijk} = T^{\,l}_{\;\;jk} \tag{8-175}$$

(Quotient law) If $a^i\,b_{jk}$ is a tensor for all contravariant vectors $a^i$, then $b_{jk}$ is a tensor.

(Combination rule) Combining a tensor of valence $(r, s)$ with $k \le r$ covariant vectors and $l \le s$ contravariant vectors (by contraction) produces a tensor of valence $(r - k,\ s - l)$.

8.1.10.4. Tensors in Riemannian Space
Perhaps the most significant application of tensor analysis is to the study of problems in a Riemannian metric space. Tensors in this case deal with distances, angles, and motions governed by the metric tensor $g_{ij}$ of that space, as introduced earlier. One of the biggest jobs here is learning how the elements of the metric indicate the curvature of the space, and the contributions of this characteristic to distances, angles, and motions.

In elementary vector analysis, in Euclidean space, it was taken for granted that vectors could be moved about freely. As long as there was no change in the magnitude or the direction of a vector, its basepoint could be moved to any arbitrary location and added to another vector whose endpoint was at this location.

This freedom of moving vectors about without consequences is no longer valid in the study of vectors on curved spaces. Vectors are defined at and confined to their basepoints. Basic operations on vectors involve only other vectors based at the same point.

To move a vector from one point to another, then, it is necessary to specify how this is to be done. A connection is a mathematical prescription for moving vectors based at one point of a space to infinitesimally nearby points, which, in the cases of interest here, does so using linear transformations. As discussed earlier in differential geometry, there are many paths by which vectors may be transported from one point to another.

Parallel transport, or parallel translation, is an operation much like moving a tangent vector along a given curve from point $x$ to point $x + dx$ without rotating or stretching it along the way. It is an operation whereby a given tangent vector $v$ at $x$ is transformed (moved) into a tangent vector $\bar v$ at $x + dx$ that is

Linear: $\bar v$ is related to $v$ by a linear transformation,

Compatible with the metric: if another vector $w$ is transported to $x + dx$ in this fashion, producing $\bar w$, then $\langle\bar v, \bar w\rangle = \langle v, w\rangle$.

Torsion-free: the torsion along the path, defined in much the same way as given in Eq.(8-133), is zero.

Fortunately, tensor theory has a theorem that says that, for a given metric $g$, there is a unique connection for parallel transfer, called the Levi-Civita connection. The definition of the connection makes use of geodesic paths in the manifold and the so-called Christoffel symbols, which are discussed below.

8.1.10.5. Geodesics and the Christoffel Symbols
A geodesic is a curve in Riemannian space whose tangent vector is parallel transported along itself. It is that curve of least length between two points. In spacetime, it is a completely unaccelerated path.

Let an arbitrary geodesic path in Riemannian space be given by $x^i = x^i(t)$, where $t$ is some parameter common to each coordinate. Then the geodesics are known to satisfy the so-called Euler-Lagrange equations, given by ([Irving1959], page 366)

$$\frac{d}{dt}\left(\frac{\partial f}{\partial \dot x^i}\right) - \frac{\partial f}{\partial x^i} = 0,
\qquad f = g_{ij}\,\dot x^i\,\dot x^j = \left(\frac{ds}{dt}\right)^2 \tag{8-176}$$

for each coordinate, where here $\dot x^i = dx^i/dt$. Specifically, $x^i$ and $\dot x^i$ are treated as independent variables insofar as partial derivatives are concerned, even though the latter is the t-derivative of the former.

Through insight and manipulation, Eq.(8-176) may be transformed into the following system of geodesic differential equations

$$\frac{d^2x^i}{ds^2} + \Gamma^i_{jk}\,\frac{dx^j}{ds}\frac{dx^k}{ds} = 0 \tag{8-177}$$

The coefficients $\Gamma^i_{jk}$ are called the Christoffel symbols of the second kind, named for Elwin Bruno Christoffel, and are given by

$$\Gamma^i_{jk} = \frac{g^{il}}{2}\left(\frac{\partial g_{jl}}{\partial x^k} + \frac{\partial g_{lk}}{\partial x^j} - \frac{\partial g_{jk}}{\partial x^l}\right) \tag{8-178}$$

As indicated by their formulation above, they are straightforwardly computed from the metric tensor elements. Unfortunately, the calculations are usually tedious, lengthy, often complex, and require careful attention⁷ to detail. After all, there are 18 of them in 3 dimensions, and 40 in spacetime. Fortunately, there are computerized mathematical tensor packages now available to ease much of this burden.

⁷ Because of the effort required, the “offel” in the originator’s name is sometimes pronounced “aw-ful.”
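As an illustration of such computer assistance, the sketch below evaluates Eq.(8-178) symbolically for the metric of a sphere of radius $R$, $g = \mathrm{diag}(R^2, R^2\sin^2\theta)$. It is a minimal example, not one of the tensor packages mentioned above, and the helper name `christoffel` is a local convenience.

```python
import sympy as sp

theta, phi = sp.symbols('theta phi')
R = sp.symbols('R', positive=True)
x = [theta, phi]
g = sp.Matrix([[R**2, 0], [0, R**2*sp.sin(theta)**2]])   # metric of a 2-sphere
g_inv = g.inv()
n = 2

def christoffel(i, j, k):
    """Gamma^i_{jk} from Eq.(8-178)."""
    return sp.simplify(sum(g_inv[i, l]*(sp.diff(g[j, l], x[k]) + sp.diff(g[l, k], x[j])
                                        - sp.diff(g[j, k], x[l]))/2 for l in range(n)))

for i in range(n):
    for j in range(n):
        for k in range(n):
            val = christoffel(i, j, k)
            if val != 0:
                print(f"Gamma^{i}_({j}{k}) =", val)   # -sin*cos and cot(theta) terms
```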

The geodesics can, in principle, then be generated as solutions of Eq.(8-177). If the space is flat, the geodesics turn out to be straight lines.

Under a change of coordinates, the geodesic differential equations should take the same form as Eq.(8-177),

$$\frac{d^2\bar x^i}{ds^2} + \bar\Gamma^i_{jk}\,\frac{d\bar x^j}{ds}\frac{d\bar x^k}{ds} = 0 \tag{8-179}$$

However, by applying a coordinate transformation to Eq.(8-177) and associating the results with Eq.(8-179), it is found that the two sets of Christoffel symbols are related by

$$\bar\Gamma^i_{jk} = \Gamma^a_{bc}\,\frac{\partial \bar x^i}{\partial x^a}\frac{\partial x^b}{\partial \bar x^j}\frac{\partial x^c}{\partial \bar x^k}
+ \frac{\partial^2 x^a}{\partial \bar x^j\,\partial \bar x^k}\,\frac{\partial \bar x^i}{\partial x^a} \tag{8-180}$$

This is the law of transformation of the Christoffel symbols, from which it may be seen that Christoffel symbols are thus not tensors, because their form is not the same in all coordinate systems. But they are written in the same notation as tensors. They are linear transformations having both contravariant and covariant ranks.

Since the Christoffel symbols are not tensors, it turns out that it is possible, given that the symbols do not all vanish at a given point in one coordinate system, to find another coordinate system for which the transformed symbols do vanish at the image of the given point. In this new system, the geodesics of Eq.(8-177) are all (locally) straight lines. A Cartesian basis can be erected here, called a normal basis. This coordinate system is called a geodesic coordinate system.

The transformation that produces the geodesic coordinate system at a point $p^i$ may be shown to be given by

$$\bar x^i = (x^i - p^i) + \tfrac{1}{2}\,\Gamma^i_{jk}(p)\,(x^j - p^j)(x^k - p^k) \tag{8-181}$$

8.1.10.6. The Covariant Derivative
The usual derivative of a function is defined by subtracting two evaluations of the function at nearby points and dividing by the distance between the two points. However, this is not allowed in a curved space because two vectors are not allowed to be subtracted unless they have the same basepoint. Instead, a connection can be used to parallel transport a vector to a nearby point; then, the subtraction can be performed.

This generalization of differentiation involving parallel transport is known as covariant differentiation . Since it expresses the derivative along tangent vectors of a manifold, it is a covariant operation. It increases the covariant rank of the tensor it operates upon by one.

The ordinary derivative of a covector (an absolute covariant tensor) defined by the linear transformation

$$\bar A_i = A_j\,\frac{\partial x^j}{\partial \bar x^i} \tag{8-182}$$

with respect to one of the transformed axes is, by the chain rule,

$$\frac{\partial \bar A_i}{\partial \bar x^j}
= \frac{\partial A_h}{\partial x^l}\,\frac{\partial x^h}{\partial \bar x^i}\,\frac{\partial x^l}{\partial \bar x^j}
+ A_h\,\frac{\partial^2 x^h}{\partial \bar x^i\,\partial \bar x^j} \tag{8-183}$$

This form is not a tensor, because it has the extra second-derivative terms. However, the form

$$A_{i,j} = \frac{\partial A_i}{\partial x^j} - A_h\,\Gamma^h_{ij} \tag{8-184}$$

can be shown to be a tensor. It is called the covariant derivative of $A_i$ with respect to $x^j$. The comma (sometimes a semicolon) is used to denote covariant differentiation. The notation $\nabla_j$ is also used. In a Cartesian coordinate system, all the Christoffel symbols vanish, leaving the ordinary derivative.

The covariant derivative of an absolute contravariant vector $A^i$ is similarly defined,

$$A^i_{\ ,j} = \frac{\partial A^i}{\partial x^j} + A^h\,\Gamma^i_{hj} \tag{8-185}$$

The covariant derivative of a relative scalar $A$ of weight $N$ is defined as

$$A_{,j} = \frac{\partial A}{\partial x^j} - N\,A\,\Gamma^h_{jh} \tag{8-186}$$

The covariant derivative in each case is a tensor, so it is invariant under basis transformations.

This concept extends to tensors of higher rank by successive applications of the above three definitions. The formula for the general case is notationally very messy and not particularly enlightening. The interested reader will find the result in [Lass1950]. In essence, however, the formula begins with the partial derivative of the tensor. Then it adds a $\Gamma$ term for each contravariant index, subtracts a $\Gamma$ term for each covariant index, and subtracts a weight-multiplied contracted $\Gamma$ term.

8.1.10.7. Directional Derivatives
The directional derivative of a tensor $A$ intuitively represents the rate of change of the tensor at a given point $p$ in the direction of a given tangent vector $v$. The computation, intuitively, is

$$\lim_{\varepsilon\to 0}\frac{A(p + \varepsilon v) - A(p)}{\varepsilon} \tag{8-187}$$

For the reasons given above, the covariant derivative must be used to evaluate this limit. The result, the directional derivative of $A$ in the $v$ direction, denoted $\nabla_v A$, turns out to be

$$\nabla_v A = A_{,j}\,v^j \tag{8-188}$$

If $A$ is a tensor having valence $(r, s)$, then both its covariant derivative $A_{,j}$ and the vector $v$ are tensors, so their product is also a tensor having the same valence as does $A$. Often, $v$ is a unit vector, but the formula holds for any $v$, even the zero vector.

8.1.10.8. Intrinsic Derivatives
Let a curve in a Riemannian space be represented by $x^i(s)$, where $s$ is a parameter such as the length along the curve. The intrinsic derivative of $A$ is defined as the directional derivative of the tensor in the direction of the tangent vector to the curve. It is also called the absolute derivative of $A$,

$$\frac{\delta A}{\delta s} = A_{,j}\,\frac{dx^j}{ds} \tag{8-189}$$

For example, the intrinsic derivative of a covector $A_i$ takes the form

$$\frac{\delta A_i}{\delta s} = A_{i,j}\,\frac{dx^j}{ds}
= \frac{\partial A_i}{\partial x^j}\frac{dx^j}{ds} - \Gamma^h_{ij}\,A_h\,\frac{dx^j}{ds}
= \frac{dA_i}{ds} - A_h\,\Gamma^h_{ij}\,\frac{dx^j}{ds} \tag{8-190}$$

8.1.10.9. The Riemannian Tensor⁸
The Riemann tensor $R^i_{\ jkl}$ is a tensor of valence (1, 3) at each point in a Riemannian space. Thus, it transforms three given tangent vectors into a fourth tangent vector. It is defined so as to perform a parallel transport of a vector $v$ around an infinitesimal path defined by the two vectors $dx$ and $dy$ to produce the vector $v + \delta v$. The difference $\delta v$ is a measure of the curvature at the basepoint of the vector, given by

$$\delta v^i = R^i_{\ jkl}\,dx^j\,dy^k\,v^l \tag{8-191}$$

The derivation of this tensor is tortuous and intimidating, even to the expert. The formula for it is shown below.

$$R^i_{\ jkl} = \partial_k\Gamma^i_{lj} - \partial_l\Gamma^i_{kj}
+ \Gamma^i_{km}\Gamma^m_{lj} - \Gamma^i_{lm}\Gamma^m_{kj} \tag{8-192}$$

where $\partial_i = \partial/\partial x^i$. Often the tensor appears in pure covariant form, obtained by lowering the index,

$$R_{ijkl} = g_{im}\,R^m_{\ jkl} \tag{8-193}$$

⁸ It is also referred to in various references as the Riemann-Christoffel tensor or as the Riemannian curvature tensor.

This is a daunting formula that only its creator could have loved. Yet it depends only on the metric tensor elements $g_{ij}$, no matter how tedious the computations that are required to produce it. After all, there are $n^4$ components of the Riemann tensor in $n$ dimensions, and each requires $2n$ multiplications and additions among the Christoffel symbols and their derivatives, each of which requires another $3n$ multiplications and additions.

However, all of the tensor components derive from a mere $n(n+1)/2$ independent values $g_{ij}$. Symmetries reduce the number of independent $R_{ijkl}$ components to $n^2(n^2 - 1)/12$ in $n$ dimensions, which amounts to only one independent tensor component in two dimensions, six in three dimensions, and 20 in four dimensions.

There are further simplifications that can be made, but, in the end, computing the $R_{ijkl}$ is still laborious, tedious, and error prone if done manually. The job should be relegated to computers.

8.1.10.10. The Ricci Tensor and Scalar
The Ricci tensor, named for Gregorio Ricci-Curbastro, the founder of tensor analysis, is defined as the contraction of the Riemann tensor in the first and third indices,

$$R_{ij} = R^k_{\ ikj} \tag{8-194}$$

As a result of the symmetries inherent in the Riemann tensor, the Ricci tensor turns out to be symmetric in its indices, $R_{ij} = R_{ji}$.

In a neighborhood of any point $p$ in a Riemannian manifold, a set of local orthonormal coordinates may be defined along the geodesics through $p$ that emulates Euclidean space insofar as measuring distances and angles is concerned. In these coordinates, $g_{ij} = \delta_{ij}$, $\partial g_{ij}/\partial x^k = 0$, and $\Gamma^i_{jk} = 0$. In these coordinates a function $dV$, called the metric volume, measures the “volume” of a small differential of the space in the direction of a vector $dx$. A Taylor series of this function turns out to result in

$$dV = \left(1 - \frac{R_{ij}}{6}\,dx^i\,dx^j + O\!\left(|dx|^3\right)\right) dV_{\mathrm{Euclidean}} \tag{8-195}$$

That is, if the Ricci tensor is positive at $p$, then the volumes of small spaces in the $dx$ direction will be smaller than are those in Euclidean spaces, and the opposite is true if the Ricci tensor is negative. The Ricci tensor, then, is a measure of the orientation and magnitude of the contraction or expansion of the space, as compared to Euclidean space, in which there is none.

The Ricci scalar, also called the scalar curvature, is defined as the contracted Ricci tensor,

$$R = g^{ij}\,R_{ij} = R^i_{\ i} \tag{8-196}$$

Since this measure is a scalar at each point of a manifold, there is no orientation, as there was with the Ricci tensor. The significance of the scalar curvature, then, is the expansion or contraction of the manifold at a given point. It is contracting at $p$ if $R$ is positive, and expanding if $R$ is negative.
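Continuing the 2-sphere example from the Christoffel computation above, the sketch below builds Eq.(8-192), contracts it per Eq.(8-194), and forms the scalar curvature of Eq.(8-196). It is again only an illustrative script, with locally chosen names, not a tensor package.

```python
import sympy as sp

theta, phi = sp.symbols('theta phi')
Rs = sp.symbols('R', positive=True)               # sphere radius
x = [theta, phi]
g = sp.Matrix([[Rs**2, 0], [0, Rs**2*sp.sin(theta)**2]])
gi = g.inv()
n = 2

Gamma = [[[sum(gi[i, l]*(sp.diff(g[j, l], x[k]) + sp.diff(g[l, k], x[j])
               - sp.diff(g[j, k], x[l]))/2 for l in range(n))
           for k in range(n)] for j in range(n)] for i in range(n)]

def riemann(i, j, k, l):
    """R^i_{jkl} from Eq.(8-192)."""
    val = sp.diff(Gamma[i][l][j], x[k]) - sp.diff(Gamma[i][k][j], x[l])
    val += sum(Gamma[i][k][m]*Gamma[m][l][j] - Gamma[i][l][m]*Gamma[m][k][j] for m in range(n))
    return sp.simplify(val)

ricci = sp.Matrix(n, n, lambda i, j: sum(riemann(k, i, k, j) for k in range(n)))
scalar = sp.simplify(sum(gi[i, j]*ricci[i, j] for i in range(n) for j in range(n)))
print(ricci)      # R_ij = g_ij / R**2 for the sphere
print(scalar)     # 2/R**2
```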

The Ricci tensor and scalar curvature appear significantly in Einstein’s theory of general relativity. In fact, the Einstein tensor is defined as

$$G_{ij} = R_{ij} - \tfrac{1}{2}\,R\,g_{ij} \tag{8-197}$$

This tensor is discussed further in the Spacetime chapter of this Supplement.

8.2. Calculus of Finite Differences

Numerical analysis very often makes use of formulas involving differences of function values spaced over some type of grid. One such application is an interpolation method based on central differences known as Everett’s formula, as discussed in an earlier chapter on interpolation, where, although the MPG does not make direct use of the Everett interpolation formula, it nevertheless does make use of the Everett coefficient polynomials in its output prediction formats. Moreover, the use of difference methods was often useful in the development of MPG algorithms, in instances where approximations of derivative values were necessary to validate performance, but in which no analytic derivative was available.

8.2.1. Difference Operators
Let $f(x)$ denote a function of the variable $x$ and let $\delta x$ denote a finite increment of $x$. (The symbol $h$ is often used instead of $\delta x$.) Let the set of values $f_s \equiv f(x_0 + s\,\delta x)$ be calculated respectively at the locations $x_s \equiv x_0 + s\,\delta x$, where $x_0$ is some reference point, over some set of locations designated by $s$. The first-order central difference operation is defined as

$$\delta f_s = f(x_s + \delta x/2) - f(x_s - \delta x/2) = f_{s+1/2} - f_{s-1/2} \tag{8-198}$$

Higher-order central difference operations are then calculated by application of this formula to lower-order central differences, which, by induction, leads to the formula

$$\delta^n f_s = \sum_{r=0}^{n}(-1)^r\binom{n}{r}\,f_{s+n/2-r} \tag{8-199}$$

When the function being differenced is understood, the notation is often abbreviated to $\delta^k f_s = \delta^k_s$.

Note that central differences of odd order require evaluations of the function at half-spacings of the increment. A method by which odd-order central differences refer only to integral spacings of the increment will be addressed a little later in this section.

The forward difference operation is defined in a similar fashion, but using only future samples, as

$$\Delta f_s = f_{s+1} - f_s \tag{8-200}$$

To complete the set, the backward difference operation is defined using only past samples, as

$$\nabla f_s = f_s - f_{s-1} \tag{8-201}$$

The higher-order differences for these operations then are known to satisfy

$$\Delta^n f_s = \sum_{i=0}^{n}(-1)^i\binom{n}{i}\,f_{s+n-i}, \qquad
\nabla^n f_s = \sum_{i=0}^{n}(-1)^i\binom{n}{i}\,f_{s-i} \tag{8-202}$$
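A short numerical sketch of Eq.(8-199) follows; the helper name and the test function are illustrative assumptions. It shows the familiar fact that $\delta^n f_s \approx \delta x^n f^{(n)}(x_s)$, which is made precise in Section 8.2.3.

```python
import numpy as np
from math import comb

def central_diff(f, x0, n, h):
    """n-th order central difference at x0 with spacing h, Eq.(8-199)."""
    return sum((-1)**r * comb(n, r) * f(x0 + (n/2 - r)*h) for r in range(n + 1))

x0, h = 1.0, 0.05
d2 = central_diff(np.sin, x0, 2, h)          # delta^2 f
print(d2 / h**2, -np.sin(x0))                # ~ f''(x0), with O(h^2) error
```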

8.2.2. Operator Calculus
Each of the three symbols δ, Δ, and ∇ can be viewed as an operator which, when applied to a function, performs the operation specified in Eqs.(8-199) or (8-202). The order, or exponent, given to an operator signifies the number of times that operation is successively applied.

There are other operators that are used in combination with these to provide extended computations. The shift operator $E$ is defined to increase the argument by $\delta x$, as

$$E^k f_s = f_{s+k} \tag{8-203}$$

and the averaging operator $\mu$ is defined to compute the mean of adjacent samples, as

$$\mu f_s = \tfrac{1}{2}\left(f_{s+1/2} + f_{s-1/2}\right) \tag{8-204}$$

Notice that this latter operator, like δ, operates on half-spacings of the increment.

Bcp ;% B + .& c p .,/+ .,/ bcp;% B + B & c p +. (8-205) ©cp ;%. + B & c p B.,/) B + .,/ k c; c p/ p One further operator that is related to the above is the ordinary derivative opera- tor, A; a, au . In the usual Taylor expansion of a function,

Ç . %f & f cu%p)b u & ; ∫ cuu %& p b (8-206) f;- f  Mathematical Techniques 69 the operator notation may be invoked, to produce the formal relationship

Ç f %bu A & bu A cp). ;∫ cbcBc ppp ; ; (8-207) f;- f  Consequently, a relationship between the shift operator and the derivative opera- tor is established,

bu A B; b ; b R (8-208) It is convenient to represent the operator bu A by the single symbol R .

As seen in the above expressions, operators observe the same rules of combination as arithmetic objects, prior to their final application to the function they modify. Products of operators act as the concatenation of operations applied in succession to the function. Since all of these operators involve linear combinations of function values, they all commute with each other; that is, they may be applied in any order to produce the same result.

The operator dependencies on $U$ are summarized below.

$$\begin{aligned}
E &= e^{U}\\
\Delta &= E - 1 = e^{U} - 1\\
\nabla &= 1 - E^{-1} = 1 - e^{-U}\\
\mu &= \tfrac{1}{2}(E^{1/2} + E^{-1/2}) = \cosh(U/2)\\
\delta &= E^{1/2} - E^{-1/2} = 2\sinh(U/2)
\end{aligned} \tag{8-209}$$

The utility of these formal expressions is that they functionally relate difference operators to derivative operators. They may be inverted to relate derivatives to difference operations, as

$$U = 2\sinh^{-1}(\delta/2) = \ln(1 + \Delta) = -\ln(1 - \nabla) = 2\cosh^{-1}(\mu) \tag{8-210}$$

Applications of these formulas appear in the next subsections of this chapter.

8.2.3. Approximation of Derivatives

The three difference operator expressions in (8-209), when evaluated for very small δx, produce the approximations

$$\begin{aligned}
\delta^n f_p &= \left[ 2\sinh(\delta x\, D/2) \right]^n f_p \approx \delta x^n f^{(n)}(x_p) \\
\Delta^n f_p &= \left( e^{\delta x\, D} - 1 \right)^n f_p \approx \delta x^n f^{(n)}(x_p) \\
\nabla^n f_p &= \left( 1 - e^{-\delta x\, D} \right)^n f_p \approx \delta x^n f^{(n)}(x_p)
\end{aligned}$$  (8-211)

That is, each n-th order difference operator is numerically O(δx^n). Taylor expansions⁹ of U using the expressions of (8-210) yield the following approximations for the first derivative of f(x_p).

$$\begin{aligned}
\delta x\, f'_p &= \delta f_p - \frac{\delta^3 f_p}{24} + \frac{3\,\delta^5 f_p}{640} - \frac{5\,\delta^7 f_p}{7168} + \frac{35\,\delta^9 f_p}{294912} + O(\delta x^{11}) \\
&= \Delta f_p - \frac{\Delta^2 f_p}{2} + \frac{\Delta^3 f_p}{3} - \frac{\Delta^4 f_p}{4} + \frac{\Delta^5 f_p}{5} + O(\delta x^{6}) \\
&= \nabla f_p + \frac{\nabla^2 f_p}{2} + \frac{\nabla^3 f_p}{3} + \frac{\nabla^4 f_p}{4} + \frac{\nabla^5 f_p}{5} + O(\delta x^{6})
\end{aligned}$$  (8-212)

The superior convergence of central differences is well displayed in this comparison. However, one incongruity in the direct use of central differences of odd order is that the samples they require are at half-spacings of the increment, whereas central differences of even order and forward and backward differences do not. This awkwardness is removed by the use of the averaged central difference operator μδ^n when n is odd, which produces

$$\mu\delta^n f_p = \tfrac{1}{2}\left( \delta^n f_{p+1/2} + \delta^n f_{p-1/2} \right)
= \tfrac{1}{2}\left( \delta^{n-1} f_{p+1} - \delta^{n-1} f_{p-1} \right)$$  (8-213)

However, the sample spacings are now twice as far apart as those appearing in forward and backward differences. The impact of this on derivative convergence is now assessed. First, the averaging operator may be expressed in terms of the central difference operator, as

$$\mu = \cosh\left[\sinh^{-1}(\delta/2)\right] = \sqrt{1 + \frac{\delta^2}{4}}$$  (8-214)

Then the ratio U^n/μ can likewise be written as a function of the central difference operator,

$$\frac{U^n}{\mu} = \frac{\left[ 2\sinh^{-1}(\delta/2) \right]^n}{\sqrt{1 + \delta^2/4}}$$  (8-215)

Multiplication by μ of the Taylor series of the above for any given n produces expressions for the n-th derivative in terms of μδ^n operations. For example, the series approximating the first derivative is

$$\delta x\, f'_p = \mu\delta f_p - \frac{\mu\delta^3 f_p}{6} + \frac{\mu\delta^5 f_p}{30} - \frac{\mu\delta^7 f_p}{140} + \frac{\mu\delta^9 f_p}{630} + O(\delta x^{11})$$  (8-216)

Convergence of this form is not as rapid as that given in (8-212), but is nevertheless considerably faster than that exhibited by both forward and backward difference operator expressions.

⁹ The use of a tool, such as Mathematica, makes the generation of Taylor series a simple task.

This derivative formula proved very useful in MPG algorithm development, where it was necessary at times to develop state vectors from given simulated position functions for validation purposes. It was also used to validate the consistency between position and velocity vectors extracted from input ephemerides when certain anomalous responses were found in MPG tests.
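A minimal sketch of such a derivative evaluation, assuming the averaged central differences are generated from integer-spaced samples via Eq. (8-213), appears below. Python is used purely for illustration; the names are not MPG routines.

```python
from math import comb

def central_diff(func, n, xp, dx):
    # n-th central difference, Eq. (8-199)
    return sum((-1)**i * comb(n, i) * func(xp + (n / 2 - i) * dx)
               for i in range(n + 1))

def mu_delta(func, n, xp, dx):
    # averaged central difference mu*delta^n, Eq. (8-213); for odd n this
    # touches only integer-spaced samples
    return 0.5 * (central_diff(func, n - 1, xp + dx, dx)
                  - central_diff(func, n - 1, xp - dx, dx))

def first_derivative(func, xp, dx, terms=3):
    # series of Eq. (8-216), truncated after `terms` correction terms
    coeffs = (1.0, -1.0 / 6.0, 1.0 / 30.0, -1.0 / 140.0, 1.0 / 630.0)
    total = sum(c * mu_delta(func, 2 * k + 1, xp, dx)
                for k, c in enumerate(coeffs[:terms + 1]))
    return total / dx
```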

Approximations for the 2nd through 5th derivatives using central difference operations appear below.

$$\begin{aligned}
\delta x^2 f''_p &= \delta^2 f_p - \frac{\delta^4 f_p}{12} + \frac{\delta^6 f_p}{90} - \frac{\delta^8 f_p}{560} + O(\delta x^{10}) \\
\delta x^3 f'''_p &= \mu\delta^3 f_p - \frac{\mu\delta^5 f_p}{4} + \frac{7\,\mu\delta^7 f_p}{120} - \frac{41\,\mu\delta^9 f_p}{3024} + O(\delta x^{11}) \\
\delta x^4 f^{(4)}_p &= \delta^4 f_p - \frac{\delta^6 f_p}{6} + \frac{7\,\delta^8 f_p}{240} - \frac{41\,\delta^{10} f_p}{7560} + O(\delta x^{12}) \\
\delta x^5 f^{(5)}_p &= \mu\delta^5 f_p - \frac{\mu\delta^7 f_p}{3} + \frac{13\,\mu\delta^9 f_p}{144} - \frac{139\,\mu\delta^{11} f_p}{6048} + O(\delta x^{13})
\end{aligned}$$  (8-217)

Similar approximations may be made for backward and forward difference operators, but convergence is much slower.

8.2.4. The Everett Interpolation Formula

Everett's interpolation formula is used by the MPG to form prediction data types of various sorts, as is discussed in the chapter on Interpolation. However, as mentioned earlier, the MPG does not apply the formula to actual central differences, but derives expressions for these that, when inserted into Everett's formula, produce the polynomial that has the desired error characteristics.

The generalized Everett interpolation formula is discussed below. It is derived from the operational expression (i.e., the Taylor series)

$$f(x_0 + r\,\delta x) = e^{rU} f_0$$  (8-218)

in which δx = x_1 − x_0 and 0 ≤ r ≤ 1. For any given r, evaluating the above gives the desired interpolated value. It may be verified by straightforward calculation of the Taylor series that the fractional shift operation above can be written in the form

$$e^{rU} = \frac{\sinh[(1 - r)U]}{\sinh(U)} + \frac{\sinh(rU)}{\sinh(U)}\, e^{U}$$  (8-219)

and therefore, with r̄ = 1 − r, the interpolation formula becomes

$$f(x_0 + r\,\delta x) = e^{rU} f_0 = \frac{\sinh(\bar r U)}{\sinh(U)}\, f_0 + \frac{\sinh(r U)}{\sinh(U)}\, f_1
= F(\bar r U)\, f_0 + F(r U)\, f_1$$  (8-220)

It may be observed that F(rU) is an even function of U. From Eqs. (8-210) and (8-220), it follows that the Taylor expansion of the operator F(rU) in terms of δ will have only even powers. In particular, this expansion is known¹⁰ to take the form

$$F(rU) = \sum_{k=0}^{\infty} E_{2k}(r)\, \delta^{2k}$$  (8-221)

The E_{2k}(r) functions are polynomials of degree 2k + 1, called the Everett coefficients, that are equal to

$$E_{2k}(r) = \begin{cases} r & k = 0 \\[4pt] \dfrac{r}{(2k+1)!} \displaystyle\prod_{i=1}^{k} (r^2 - i^2) & k > 0 \end{cases}
\;=\; \binom{r + k}{2k + 1}$$  (8-222)

The Everett interpolation formula is thus given by

$$f(x_0 + r\,\delta x) = \sum_{k=0}^{\infty} \left[ E_{2k}(\bar r)\, \delta^{2k} f_0 + E_{2k}(r)\, \delta^{2k} f_1 \right]$$  (8-223)

The particular Everett formula used by the MPG truncates the above at k = 2, which produces the polynomial of degree 5 in r appearing in the chapter on Interpolation.

¹⁰ See [Irving1959], Appendix 4.2.
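A sketch of the truncated (k = 2) formula is given below for illustration. It is not the MPG implementation, and the argument layout assumed here (lists holding [f, δ²f, δ⁴f] at x_0 and x_1) is a convenience adopted only for this example.

```python
from math import factorial

def everett_coeff(k, r):
    # Everett coefficient E_2k(r) of Eq. (8-222), equal to binom(r + k, 2k + 1)
    if k == 0:
        return r
    prod = r
    for i in range(1, k + 1):
        prod *= r * r - i * i
    return prod / factorial(2 * k + 1)

def everett_interpolate(r, d0, d1):
    # f(x_0 + r*dx) per Eq. (8-223) truncated at k = 2 (degree 5 in r);
    # d0 and d1 hold [f, delta^2 f, delta^4 f] at x_0 and x_1 respectively
    rbar = 1.0 - r
    return sum(everett_coeff(k, rbar) * d0[k] + everett_coeff(k, r) * d1[k]
               for k in range(3))
```

For a linear tabulated function the even-order differences vanish and the formula reduces to simple linear interpolation between f_0 and f_1, as expected.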

8.3. The Derivative Chain Rule and W-Variables

The MPG at times requires computing derivatives of composite functions. When these are of first order, the regular chain rule may be applied,

$$\frac{d}{dt}\, f(u(t)) = f'(u(t))\, u'(t)$$  (8-224)

This familiar formula combines the first derivatives of each function in the composition. Finding the second derivative combines the use of the chain rule with the derivative product rule to give

$$\frac{d^2}{dt^2}\, f(u(t)) = f''(u(t))\left[ u'(t) \right]^2 + f'(u(t))\, u''(t)$$  (8-225)

which involves first and second derivatives of the two functions. By induction, the derivative of order n of a composite function can be evaluated using the set of derivatives f^{(k)}(u) and u^{(k)}(t), k = 1, …, n. The method for doing this is the generalized chain rule, which will now be discussed.

8.3.1.1. Generalized Chain Rule

Leibniz's formula for the (k − 1)-st derivative of a product is the series

$$\frac{d^{k-1}}{dt^{k-1}} (u\, v) = \sum_{r=0}^{k-1} \binom{k-1}{r}\, \frac{d^{k-1-r} u}{dt^{k-1-r}}\, \frac{d^{r} v}{dt^{r}}$$  (8-226)

This is applied to the case u ≡ u′(t) and v ≡ f^{(m+1)}(u(t)) to produce

$$\begin{aligned}
\frac{d^{k} f^{(m)}(u(t))}{dt^{k}} &= \frac{d^{k-1}}{dt^{k-1}}\, \frac{d f^{(m)}(u)}{dt}
= \frac{d^{k-1}}{dt^{k-1}} \left[ u'\, f^{(m+1)}(u) \right] \\
&= \sum_{r=0}^{k-1} \binom{k-1}{r}\, \frac{d^{k-r} u}{dt^{k-r}}\, \frac{d^{r} f^{(m+1)}(u)}{dt^{r}} \\
&= \sum_{r=0}^{k-1} \binom{k-1}{r}\, u^{(k-r)}\, \frac{d^{r} f^{(m+1)}(u)}{dt^{r}}
\end{aligned}$$  (8-227)

Now let the operator D_m^k f represent the k-th derivative with respect to t of the m-th derivative of f(u) with respect to u,

$$D_m^k f \equiv \frac{d^k}{dt^k}\, f^{(m)}(u(t)) = \frac{d^k}{dt^k} \left[ \frac{d^m f(u)}{du^m} \right]_{u \equiv u(t)}$$  (8-228)

The chain rule expressed in Eq. (8-224), for example, is D_0^1 f. The derivative in Eq. (8-227) can then be written as

$$D_m^k f = \sum_{r=0}^{k-1} \binom{k-1}{r}\, u^{(k-r)}\, D_{m+1}^{r} f$$  (8-229)

The formula for the n-th composite derivative is therefore

$$\frac{d^n f(u(t))}{dt^n} = D_0^n f = \sum_{r=0}^{n-1} \binom{n-1}{r}\, u^{(n-r)}\, D_{1}^{r} f$$  (8-230)

Eq. (8-229) is a recursion formula for D_m^k f whose terms are derivatives of u multiplied by D_j^i f values with i < k and j > m. Eventually, all recursions come to the point of evaluating terms of the form D_m^0 f = f^{(m)}(u), which are presumably known.
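The recursion of Eqs. (8-229) and (8-230) translates almost directly into code. The sketch below is illustrative only (the MPG uses the Math77/mathc90 routines for this purpose); it takes the tabulated arrays fu[m] = f^{(m)}(u(t_0)) and ut[k] = u^{(k)}(t_0) and returns the first n derivatives of the composite function at t_0.

```python
from math import comb
from functools import lru_cache

def composite_derivatives(fu, ut, n):
    # fu[m] = f^(m)(u(t0)) for m = 0..n, ut[k] = u^(k)(t0) for k = 0..n;
    # returns [d^k f(u(t))/dt^k at t0 for k = 0..n] via Eqs. (8-229)/(8-230)
    @lru_cache(maxsize=None)
    def D(m, k):
        # the operator D_m^k f of Eq. (8-228)
        if k == 0:
            return fu[m]                      # D_m^0 f = f^(m)(u(t0))
        return sum(comb(k - 1, r) * ut[k - r] * D(m + 1, r)
                   for r in range(k))         # recursion, Eq. (8-229)
    return [D(0, k) for k in range(n + 1)]

# Example: f(u) = u^2 composed with u(t) = sin t at t0 = 0, so f(u(t)) = sin^2 t.
# With fu = [0, 0, 2, 0] and ut = [0, 1, 0, -1], the call returns [0, 0, 2, 0],
# matching d^2(sin^2 t)/dt^2 = 2 at t0 = 0 and the vanishing odd derivatives.
```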

8.3.1.2. W-Variables

The composite derivative in Eq. (8-230) is therefore a combination of tabulated values of f^{(k)}(u) and u^{(k)}(t). This formula and a simple algorithm for computing it were devised in 1964 by R. E. Wengert [Wengert1964]. Subsequent authors [Lawson1971], [Griewank1991] incorporated his approach into more efficient systems, but the principles on which they operate remain essentially the same. The method is based on the use of W-variables, where the "W" refers to Wengert.

A W-variable is merely an (n + 1)-tuple containing function and derivative values up to order n with respect to the natural function variable, as

$$\bar f(u) = \begin{bmatrix} f(u) \\ f'(u) \\ \vdots \\ f^{(n)}(u) \end{bmatrix}
= \begin{bmatrix} f \\ f' \\ \vdots \\ f^{(n)} \end{bmatrix}, \qquad
\bar u(t) = \begin{bmatrix} u(t) \\ u'(t) \\ \vdots \\ u^{(n)}(t) \end{bmatrix}
= \begin{bmatrix} u \\ u' \\ \vdots \\ u^{(n)} \end{bmatrix}$$  (8-231)

As an example, the order-5 W-variable calculated for the square root function is

$$\overline{\mathrm{sqrt}}(u) = \begin{bmatrix}
u^{1/2} \\[2pt]
\tfrac{1}{2}\, u^{-1/2} \\[2pt]
-\tfrac{1}{4}\, u^{-3/2} \\[2pt]
\tfrac{3}{8}\, u^{-5/2} \\[2pt]
-\tfrac{15}{16}\, u^{-7/2} \\[2pt]
\tfrac{105}{32}\, u^{-9/2}
\end{bmatrix}$$  (8-232)

8.3.1.3. Chain Rule Computation

The recursive computation in Eq. (8-230) above provides the general method for calculating W-variables of composite functions. It is a little slower than a direct approach would be, where derivatives are computed using the fully expanded form of Eq. (8-230). For example, if one were to evaluate each of the first 5 derivatives of the composite function using the chain rule, it would be found that the results are all linear in the elements of f̄(u). Isolating the coefficients of f^{(i)} for the j-th derivative produces the ij-th element of the matrix that transforms f̄(u) into the W-variable of the derivative with respect to t. The chaining transformation is

$$\overline{f(u(t))} = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & u' & 0 & 0 & 0 & 0 \\
0 & u'' & u'^2 & 0 & 0 & 0 \\
0 & u''' & 3u'u'' & u'^3 & 0 & 0 \\
0 & u^{(4)} & 4u'u''' + 3u''^2 & 6u'^2 u'' & u'^4 & 0 \\
0 & u^{(5)} & 5u'u^{(4)} + 10u''u''' & 10u'^2 u''' + 15u'u''^2 & 10u'^3 u'' & u'^5
\end{bmatrix} \bar f(u)$$  (8-233)

This direct approach is perhaps more easily understood and computationally less demanding than the recursive method, but does require a larger program to compute. Also, the form in Eq. (8-233) is only applicable to n = 5, whereas the recursive formula works for any n.

This direct method is not used in the MPG, but it was used in the NSS MP. It is presented here for completeness and reference.

8.3.1.4. W-Variable Math Functions

The MPG uses JPL's Math 77 (in Fortran) and Math C90 (in C) suites of mathematical subroutines, which have double-precision functions (DWCHN) that perform the operations of Eq. (8-230) above automatically and efficiently, given the W-variables. They also have others that provide, for many basic mathematical operations, algorithms that not only perform the operation, but also propagate derivative values up to the desired order. These compute the W-variables of functions such as square root, exponentiation, logarithm, trigonometric and inverse trigonometric functions, and series reversion. Still others compute the W-variables of certain two-function combinations, such as addition, subtraction, multiplication, division, and arctangent.

Details may be found in JPL's Math-C90 Library (see References, below). The source code is contained in the file dwcomp.c, which contains 26 routines for assigning values, doing arithmetic operations, and performing ancillary functions.

The library function DWSET creates a simple W-variable from a value and first derivative, of the form

$$\begin{bmatrix} x(t_0) \\ x'(t_0) \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$  (8-234)

Thus the W-variable [a, 0, …, 0]^T is that of a constant, and [t_0, 1, 0, …, 0]^T is that of the variable t evaluated at t_0.

As other examples, DWSQRT generates the square-root W-variable shown in Eq. (8-232) when given the W-variable [u, 1, 0, …, 0]^T, and produces the composite square-root W-variable when given the W-variable of u(t); DWPRO computes the W-variable of the product f(t)g(t); and DWATN2 calculates the W-variable of the two-argument inverse tangent function.

The set of routines described above is also implemented in a Mathematica document (see [Tausworthe2006] in References, below) in order to provide W-variable computations in MPG algorithm development.

8.3.1.5. Reverse Chaining

Reverse chaining is the computation that reverses the computation of composite derivatives. In short, chaining the W-variable f̄ of the function f(x) with the W-variable x̄ of the function x(t) produces the W-variable ȳ of the function f(x(t)). Reverse chaining the W-variables ȳ and x̄ reproduces the W-variable f̄, within computational errors, provided x′(t_0) ≠ 0. The Math-C90 library function DWRCHN implements this transformation.

8.3.1.6. Self Chaining

The Mathematica work [Tausworthe2006] also contains a function WVSelfChain that is not found in the Math C-90 library. Self-chaining applies to functions f(t) whose first derivative is a known function of f(t), i.e., f′(t) = g(f(t)). For such functions, the first few derivatives are

$$\begin{aligned}
f(t_0) &= f_0 \\
f'(t_0) &= g(f_0) = g_0 \\
f''(t_0) &= g'(f(t_0))\, f'(t_0) = g_1 g_0 \\
f'''(t_0) &= g''(f(t_0))\, f'(t_0)^2 + g'(f(t_0))\, f''(t_0) = g_2 g_0^2 + g_1^2 g_0 \\
&\;\;\vdots
\end{aligned}$$  (8-235)

The WVSelfChain function computes these values given f_0 and the W-variable containing the g_i = g^{(i)}(f_0),

$$\bar g = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{d-1} \end{bmatrix}$$  (8-236)

This function comes in handy, for example, for generating any number of derivatives of conic trajectories, where the orbital position in Cartesian coordinates, as discussed in another chapter of this Supplement, is given by

$$\mathbf{r} = a \begin{bmatrix} \cos E \\ \sqrt{1 - e^2}\, \sin E \\ 0 \end{bmatrix}$$  (8-237)

and the derivative Ė of the eccentric anomaly E is given by

$$\dot E = \frac{n}{1 - e\cos E}$$  (8-238)

The process for generating the W-variable for E, in outline, is as follows. Let g(x) = n/(1 − e cos x). Submit the W-variable [E_0, 1, 0, …, 0]^T to DWCOS. Use DWPRO1 to multiply this result by the eccentricity e. Similarly apply DWDIF1 and DWQUO1 to complete the computation of the W-variable of g(x) evaluated at E_0. Then the W-variable for E at the point E_0 is given by WVSelfChain.

8.3.1.7. Series Reversion

Let x̄ denote the W-variable of the function x(t), containing x_0 = x(t_0) and the derivatives x^{(i)}(t_0) with respect to t up to order n − 1, evaluated at t = t_0. Now suppose that it is desired to have the W-variable t̄ that corresponds to the function t(x), which contains t_0 and the derivatives t^{(i)}(x_0) with respect to x up to order n − 1, evaluated at x_0. When the components of these arrays are interpreted as the scaled coefficients of Taylor series, the transformation from x̄ to t̄ is known as series reversion.

The reverse chaining function DWRCHN can be used to produce this transformation. When the x-variable argument is the W-variable x̄ of the given function and the f-variable argument is the W-variable of a simple variable (i.e., of the form [t_0, 1, 0, …, 0]^T), then the reverse chain function performs Taylor series reversion, which provides the W-variable of the inverse function. It is, to be concrete, equal to

$$\bar t = \begin{bmatrix}
t_0 \\[4pt]
\dfrac{1}{x'(t_0)} \\[8pt]
-\dfrac{x''(t_0)}{x'(t_0)^3} \\[8pt]
\dfrac{3\,x''(t_0)^2 - x'(t_0)\, x'''(t_0)}{x'(t_0)^5} \\[6pt]
\vdots
\end{bmatrix}$$  (8-239)

But of course, DWRCHN has done all the work to produce this result. The astute reader may recognize (after perhaps consulting a calculus text) the derivative terms as those of the inverse function written in terms of the function derivatives.

The Taylor series of the inverse function can then be reconstituted, as

$$t(x) = t_0 + \frac{x - x_0}{x'(t_0)} - \frac{x''(t_0)}{2\,x'(t_0)^3}\,(x - x_0)^2
+ \frac{3\,x''(t_0)^2 - x'(t_0)\, x'''(t_0)}{6\,x'(t_0)^5}\,(x - x_0)^3 + \cdots$$  (8-240)

8.4. Finding Roots of Nonlinear Functions

One of the most pervasive needs within MPG algorithms is that of finding the root of a given function. For example, all view period events are expressed as zero crossings of various metrics. To further the example, the time at which the elevation of a spacecraft, as viewed from a Deep Space Station, reaches its maximum value is found by locating that point at which the derivative of the elevation function is zero.

Because the MPG must find the roots of so many differently varying functions, no one root-finding method was found that works for all applications. Even in a given case, such as finding the time of maximum elevation as described above, the widely varying characteristics of targets over the set of DSN missions make the robustness of the method problematic.

Yet the MPG is required to operate in unattended, non-interactive mode for each of its many applications; it is required to work for all targets submitted to it; it is required to produce all predictions requested of it; it is required to predict an event when there will be one, and not to report one when none will occur; and it is required to produce predictions within specified accuracy.

The robustness and accuracy requirements that are placed on the MPG thus make stringent demands on the methods used. Since robustness is considered more important than an algorithm’s complexity and time consumption, brute force methods may be applied when necessary, provided there is enough computational power to support these methods. After all, predictions have to be produced in a timely manner.

8.4.1. Scope of Treatment

The remainder of this section treats the mathematical aspects of solving an equation numerically. It discusses various methods and tools that the MPG uses, has considered using, or may use in future upgrades. It stresses the use of existing methods and numerical library functions, when these exist and satisfy MPG needs.

While the general form of an equation expresses equivalence between left-hand and right-hand expressions, the traditional treatment of the subject subtracts the two sides, leaving an expression of the form f(x) = 0. Values of the independent variable x satisfying this condition are roots, or solutions. Generally speaking, there may be no solutions, a unique solution, multiple solutions, or a continuum of solutions to any given (nonlinear) equation.

The domain of the function f(x) is the set of x over which it is valid to take values. The image of the domain under the mapping f(x) is called the range of the mapping. If there are n independent variables in the function f(x), the domain of f(x) is said to be n-dimensional.

If the equation f(x) = 0 itself is multidimensional (e.g., a vector equation), then the solution must simultaneously satisfy each dimension. This compounds the problem because, while solutions may be found in each individual dimension, finding one that is a solution in each dimension may be time consuming.

The problem of solving f(x, y) = 0 is one of finding implicit solutions. That is, for each given y, the problem reduces to one of the former type, f_y(x) = 0, and the same is true when a value of x is given and a suitable y is sought. In either case, numeric methods may be applied to find the roots, when they exist. Root-finder methods can thus be used to invert a given implicit equation to find a function y_f(x) such that f(x, y_f(x)) = 0. Some implicit functions do appear in MPG applications.

Current MPG applications do not require multidimensional root finding, so the discussion here addresses the simple form f(x) = 0.

If a function is supplied about which no knowledge is allowed, as if values emanate from a black box when fed input values, then there is no practical way of finding all of its roots in a given interval, if, indeed, it has any at all. A search may or may not produce indications of regions where roots may lie, but cannot generally guarantee that any or all will be found. However, if some knowledge exists about the function, that knowledge may assist in their location.

In finding roots, it is necessary to have practical criteria for accuracy. Accuracy criteria typically include the characterization of error, which is to be driven to zero, or within some tolerable region of it. Accuracy criteria may include some measure of how far the function value is from its goal of zero (in an absolute or relative sense), or may include a measure of the motion between successive root estimates. Which does one choose?

The usual advice in textbooks given to one seeking to solve an equation is to analyze the function carefully to gain insight, and then to use this insight to choose the right methods to apply. Unfortunately, this advice is not totally practical in MPG applications. The general character of a metric function in an application may be known, but the particulars may vary widely over a considerable dynamic range.

When solving for the time of maximum target elevation, as mentioned earlier, the general function is the same from target to target, but the function dynamics when the target is a low Earth orbiter are very different from those when the target is a spacecraft in deep space.

For such reasons, it will not be possible here to discuss special MPG methods that have been developed ad hoc to cope with application dynamics. Instead, only generic methods and tools that can be applied in such applications will be treated.

8.4.2. Basic Approaches Except in linear problems, root finding invariably requires an iterative approach, and this is especially true in a multidimensional problem. There are two basic strategies to root finding, here called bounding and polishing . Efficient root finder algorithms typically combine both approaches.

Root bounding first determines an interval within which at least one root assuredly exists, and then successively narrows the interval, still maintaining at least one root within the interval, until an estimated root value satisfies convergence criteria. Bracketing and bisection, discussed later, are classic tools in root bounding.

Root polishing begins with a first estimate of where the root might lie, and iteratively seeks successive improved estimates until convergence criteria are met. The Newton-Raphson and secant methods, discussed later, are classic root polishers. The method of determining the first estimate is typically a heuristic based on some knowledge of the problem.

Both approaches are characterized by three elements: (1) determination of initial starting conditions, (2) checking the state of convergence, and (3) computing an improved next estimate. All three are influenced by the particular function being solved.

But, generally speaking, once it is known that a given interval contains a root, or set of roots, there are classical methods that can be brought to bear to locate it or them. They are characterized by varying degrees of speed, complexity, and robustness. Those that guarantee success are usually the slowest, while those that are the fastest may be prone to crash, unless precautionary measures are taken.

In any iterative process let the size of the "error," however measured, at the k-th step be denoted by ε_k. If there exist constants p and 0 < c < 1 such that

$$\lim_{k \to \infty} \frac{\varepsilon_{k+1}}{\varepsilon_k^{\,p}} = c$$  (8-241)

then p is the order of convergence. If p is unity, the process is said to converge linearly; if it is 2, convergence is quadratic; and if it lies in between, convergence is called superlinear.

But if it takes m function samples to compute one iteration, what is the equivalent order of convergence q per sample? In this case, one presumes that there is a virtual error per sample, e_j, implied by ε_k = e_{mk}, for which the order of convergence is q,

$$e_{j+1} = c\, e_j^{\,q}$$  (8-242)

Iteration of Eq. (8-242) m times produces

$$\frac{\varepsilon_{k+1}}{\varepsilon_k^{\,q^m}} = c^{\,1 + q + \cdots + q^{m-1}}, \qquad p = q^m, \qquad q = p^{1/m}$$  (8-243)

Therefore, an algorithm requiring m function samples per iteration and which yields an order of convergence of p per iteration will have an equivalent convergence rate per sample of p^{1/m}.

8.4.3. Bracketing Methods

One means of certainty in root finding is bracketing of the root. If two trial values x_1 and x_2 have produced f(x_i) values of opposite sign, and if the function is continuous between the two, then a root is certain to exist in the interval between the trials, and a number of methods can be used to find it.

A root is said to be bracketed in the interval (a, b) if the function is continuous in this interval and f(a) f(b) < 0. The intermediate value theorem then guarantees, because f(a) and f(b) have different signs, that the function value zero must be assumed at some value of the variable within the interval. There exists at least one algorithm, bisection, discussed next, that always finds a bracketed root.

But root bounding may encompass more than one root. The bounding interval may, in fact, bracket any odd number of roots. Finding one of them generally unbrackets the remaining even number of roots. Further, a root of even multiplicity cannot even be bracketed.

Generally speaking, the function should be continuous within the bracketed interval for good performance. However, bracketing methods will converge on an answer even if there are discontinuities within the interval. If a discontinuity appears to cross the zero axis, it may be mistaken for a root.

8.4.3.1. The Bisection Root Finder

The bisection root finder is a method that cannot fail when supplied the bracketing interval of a root or set of roots. It will assuredly find one and only one root in the interval. The process is simple: Evaluate the function at the interval midpoint and examine its sign. Replace the interval endpoint having the same sign. Repeat this procedure until the width of the remaining interval is less than the required accuracy.

The maximum number of steps required in this procedure can be predetermined from the interval size. If the given interval is (a, b) and the required accuracy is ε, then because the interval is divided in two at each step, it will end after n steps, for which

$$\frac{b - a}{2^n} < \varepsilon$$  (8-244)

The number of steps is thus

$$n = \left\lceil \log_2\!\left( \frac{b - a}{\varepsilon} \right) \right\rceil$$  (8-245)

where ⌈ ⌉ indicates the ceiling function. Bisection convergence is linear.
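A minimal bisection sketch follows, with the step count fixed in advance per Eq. (8-245). Python is used only for illustration; this is not MPG code.

```python
import math

def bisect(f, a, b, eps):
    # bisection of a bracketed root: f(a) and f(b) must differ in sign
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("interval does not bracket a root")
    for _ in range(math.ceil(math.log2((b - a) / eps))):   # Eq. (8-245)
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0:      # root lies in [a, m]; discard the like-signed endpoint
            b, fb = m, fm
        else:
            a, fa = m, fm
    return 0.5 * (a + b)
```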

Note that the bisection root finder does not actually use the function values it computes, only their signs. One intuitively feels that the process could be accelerated using all the information.

8.4.3.2. Regula Falsi, or False Position Method

The regula falsi root finder, known in English as the false position method, is merely inverse linear interpolation at each successive root-improvement step.

During iteration, if (x_0, f_0) and (x_1, f_1) are the latest two sample variable-value pairs, then the line through these points is

$$y = f_0 + (x - x_0)\, \frac{f_1 - f_0}{x_1 - x_0}$$  (8-246)

The estimated root x_2 is the solution to the equation y = 0, which is

$$x_2 = x_1 - f_1\, \frac{x_1 - x_0}{f_1 - f_0} = \frac{f_1 x_0 - f_0 x_1}{f_1 - f_0}$$  (8-247)

The first form requires 3 subtractions, one multiplication, and one division, whereas the second form requires 2 subtractions, 2 multiplications, and one division. Since multiplications generally take longer than subtractions, the first form is preferred.

False position begins with a bracket interval (a, b) and evaluates the function at these two points for the first set of variable-value pairs. At each iteration, compute x_2 as above and f_2 = f(x_2), and proceed with (x_2, f_2) and whichever of (x_0, f_0) and (x_1, f_1) has sign opposite to f_2, in order to maintain the bracketing of the root. Because it maintains the bracket, convergence is guaranteed.

Since it sometimes keeps an older point rather than the more recent one, its convergence may occasionally be slow. But since the newer sample will occasionally be kept, the rate may also be faster than bisection. The order of convergence is not known in general.

The convergence criteria should not be based on the width of the remaining bracket interval, for, unlike bisection, where the size of the remaining bracket continually shrinks, here it may not. For some functions, one bracket endpoint remains fixed after a certain number of iterations, while the other creeps toward the root. When this happens, the interval size approaches a constant value other than zero.

A better criterion would track the motion of successive x_2 values, which does tend to zero.
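A sketch of the method under that suggested criterion (motion of successive estimates rather than bracket width) appears below; Python for illustration, names hypothetical.

```python
def false_position(f, a, b, dx_tol, max_iter=100):
    # regula falsi with bracket maintenance, using Eq. (8-247), first form
    x0, x1 = a, b
    f0, f1 = f(x0), f(x1)
    if f0 * f1 > 0:
        raise ValueError("interval does not bracket a root")
    x_prev = x1
    for _ in range(max_iter):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        f2 = f(x2)
        if f2 == 0.0 or abs(x2 - x_prev) < dx_tol:
            return x2
        if f0 * f2 < 0:           # keep whichever old point preserves the bracket
            x1, f1 = x2, f2
        else:
            x0, f0 = x2, f2
        x_prev = x2
    return x_prev
```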

8.4.3.3. Ridders Method

Since bisection and regula falsi always determine a bracketed root, why wish for more? After all, they are simple, reliable, and robust algorithms. The answer is that they converge only linearly. The hope of a faster algorithm, without giving up the guaranteed performance, has brought about a number of hybrid algorithms that maintain the bracket while polishing the root estimate.

In 1979, C. J. F. Ridders published "A new algorithm for computing a single root of a real continuous function" in an IEEE transactions journal, where it lay obscured to mathematicians, it seems, for a decade. The method is a powerful variant on the false position algorithm that is robust, reliable, speedy, and competitive with the more highly developed, better established, and more complicated method of van Wijngaarden, Dekker, and Brent, discussed later here.

The discussion here is a somewhat simplified version of the argument given in the Ridders article. Let a root of f(x) = 0 be bracketed by [x_0, x_1]. If the root lies at either endpoint, there is no need to continue, so it may be assumed that neither endpoint is a root. Then let Q denote a positive numeric factor, to be determined, such that the midpoint of the line joining (x_0, f_0) and (x_1, Q²f_1) passes through the point (x_2, Q f_2), where x_2 = (x_0 + x_1)/2 and f_2 = f(x_2). That is,

$$\frac{f_0 + Q^2 f_1}{2} = Q f_2$$  (8-248)

This is a quadratic equation in Q whose solutions are

$$Q = \frac{f_2 \pm R}{f_1}, \qquad R = \sqrt{f_2^2 - f_0 f_1}$$  (8-249)

The reader will note that R is always a real number, because f_0 and f_1 have opposite signs, and thus R ≥ |f_2|. In order to ensure that Q > 0, the sign in the numerator must match the sign of the denominator. The solution is therefore

$$Q = \frac{f_2 + \operatorname{sgn}(f_1)\, R}{f_1} = \frac{f_2 - \operatorname{sgn}(f_0)\, R}{f_1}$$  (8-250)

Next, the intersection of the line passing through the three points above with the x-axis is located, and found to be

$$x_3 = x_2 + (x_2 - x_0)\, \frac{f_2 / f_0}{\sqrt{(f_2/f_0)^2 - f_1/f_0}}$$  (8-251)

Notice that, by construction, the intersection always lies within the bracket. It is merely necessary to complete the false position bookkeeping to maintain the bracket for the next iteration: Compute f_3 = f(x_3), and proceed with (x_3, f_3) and whichever of (x_2, f_2), (x_0, f_0), and (x_1, f_1) has function value with sign opposite to f_3, with preference given to (x_2, f_2) when it fits the criterion.
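A compact sketch of one Ridders iteration loop, following Eqs. (8-248) through (8-251) and the bookkeeping just described, is given below for illustration (Python; the sgn(f_0) form of the update used here is algebraically equivalent to Eq. (8-251)).

```python
import math

def ridders(f, x0, x1, xtol, max_iter=60):
    f0, f1 = f(x0), f(x1)
    if f0 * f1 > 0:
        raise ValueError("interval does not bracket a root")
    x3 = 0.5 * (x0 + x1)
    for _ in range(max_iter):
        x2 = 0.5 * (x0 + x1)
        f2 = f(x2)
        # Eq. (8-251): x-intercept of the line through the exponentially scaled points
        x3 = x2 + (x2 - x0) * math.copysign(1.0, f0) * f2 / math.sqrt(f2 * f2 - f0 * f1)
        f3 = f(x3)
        if f3 == 0.0 or abs(x1 - x0) < xtol:
            return x3
        # false-position bookkeeping, preferring (x2, f2) as the retained point
        if f2 * f3 < 0.0:
            x0, f0, x1, f1 = x2, f2, x3, f3
        elif f0 * f3 < 0.0:
            x1, f1 = x3, f3
        else:
            x0, f0 = x3, f3
    return x3
```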

If r denotes the converged root, then each x_i = r + e_i, where e_i is the distance of the point x_i from the root. A Taylor series of Eq. (8-251) about r gives the result that the distance from the root in the next estimate is

$$e_3 = e_0 e_1 (e_0 + e_1) \left[ \frac{f''(r)}{4 f'(r)} \right]^2 + \cdots$$  (8-252)

If the first derivative above does not vanish and h = x_1 − x_0 = e_0 + e_1 is the interval length in this iteration, then this first term is bounded by

$$e_3 \le h^3 \left[ \frac{f''(r)}{8 f'(r)} \right]^2$$  (8-253)

Since the interval size is at least twice the root distance and the derivatives remain essentially constant near convergence, the convergence relationship becomes approximately

$$e_{k+1} \approx m\, e_k^{\,3}$$  (8-254)

Convergence between iterations is thus cubic, but, since two samples per iteration are required, the convergence order per sample is the square root of this, p = 3^{1/2} = 1.732, still a respectable rate.

A coded version of the algorithm named zriddr appears in [Press1986], which uses a slightly different, and slightly slower, iterant than Ridders' original formula given in Eq. (8-251). Other implementations may be found on the internet.

8.4.3.4. Brent Method

Adriaan van Wijngaarden developed a bracketing algorithm in the 1960s, which was then refined by Theodorus Dekker in 1969. Dekker combined bisection and secant method steps, choosing those that gave most rapid convergence while maintaining the bracketed root. Richard Brent in 1973 improved Dekker's algorithm by applying inverse quadratic interpolation to supplant the secant steps. The algorithm, variously called the van Wijngaarden-Dekker-Brent root finder, or just the Brent root finder, has become perhaps the most popular and well-known method for determination of a bracketed root.

Brent's first enhancement of the Dekker method was one that improved the decision of which step type, bisection or secant method, is to be used in that iteration. This improved convergence considerably. He then introduced inverse quadratic interpolation to supplant the secant step, when this type of step appeared better, still further improving the convergence rate.

Some versions of the Brent root finder found in the literature no longer include the secant method step, such as ZBRENT appearing in [Press1986] (see References, below).

The method has the following parts: (1) bisection steps, (2) inverse quadratic interpolation steps (and perhaps secant method steps), (3) determination of which type of step to take, and (4) bookkeeping to maintain the bracket and keep the iteration using the same variables at each iteration.

Inverse quadratic interpolation operates on three variable-value pairs, (a, f_a), (b, f_b), and (c, f_c), in which function sample values are f_x = f(x). However, these are applied to the Lagrange formula (discussed in an earlier chapter) so as to make function values y = f(x) assume the variable role and x the value role; that is, the Lagrange formula is applied to (f_a, a), (f_b, b), and (f_c, c), with the result

$$x = \frac{(y - f_b)(y - f_c)\, a}{(f_a - f_b)(f_a - f_c)}
+ \frac{(y - f_a)(y - f_c)\, b}{(f_b - f_a)(f_b - f_c)}
+ \frac{(y - f_a)(y - f_b)\, c}{(f_c - f_a)(f_c - f_b)}$$  (8-255)

Setting y = 0 then provides the next root estimate. Computation of this estimate is simplified by defining

$$\begin{aligned}
R &= f_b / f_c \\
S &= f_b / f_a \\
T &= f_a / f_c \\
P &= S\left[ T(R - T)(c - b) - (1 - R)(b - a) \right] \\
Q &= (T - 1)(R - 1)(S - 1)
\end{aligned}$$  (8-256)

for then the next root estimate becomes

$$x_{k+1} = b + \frac{P}{Q}$$  (8-257)

The algorithm proceeds with the current best estimate of the root labeled as b, so the term P/Q is a correction, preferably a small one.

Dekker's algorithm maintains interval endpoints as a_k and b_k, where b_k is the current best estimate and a_k is the "contrapoint," or point such that f(a_k) and f(b_k) have opposite signs, and also chosen to make b_k a better estimate than a_k, i.e., |f(b_k)| ≤ |f(a_k)|. The previous estimate, b_{k−1}, is available to compute the secant-method estimate, with b_{−1} = a_0 in the first iteration. Dekker then computes two candidates, the secant-method estimate s and the bisection-method estimate m. It then chooses b_{k+1} = s whenever s lies between b_k and m, or chooses b_{k+1} = m otherwise.

Then the value of the new contrapoint is chosen. If f(a_k) and f(b_{k+1}) have opposite signs, then the contrapoint remains the same, a_{k+1} = a_k. Otherwise, f(b_{k+1}) and f(b_k) have opposite signs, so the new contrapoint becomes a_{k+1} = b_k. Finally, if |f(a_{k+1})| < |f(b_{k+1})|, then a_{k+1} is likely closer to the root than is b_{k+1}, so the two values are exchanged.

Brent recognized that there are functions for which the secant method is used at each step, yet converges very slowly, more slowly than bisection, in fact. He proposed that an additional criterion be satisfied before the secant step is accepted as the next iterant. Specifically, the secant-method estimate is accepted when

$$|s - b_k| < \begin{cases}
\tfrac{1}{2}\, |b_k - b_{k-1}| & \text{previous step was a bisection} \\[4pt]
\tfrac{1}{2}\, |b_{k-1} - b_{k-2}| & \text{previous step was an interpolation}
\end{cases}$$  (8-258)

This modification assures that bisection is used if the interpolation is not converging rapidly enough.

Brent further introduced inverse quadratic interpolation whenever f(b_k), f(a_k), and f(b_{k−1}) are distinct, and changed the acceptance criterion to the condition that the candidate value s, when it can be computed by inverse quadratic or linear interpolation, must lie between (3a_k + b_k)/4 and b_k. The Brent algorithm convergence is superlinear, and approaches the convergence order p = 1.839 toward the end of iteration, when few bisection steps are made.
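In practice the MPG relies on its own coded routines, but for readers experimenting outside that environment, widely available implementations of Brent's method exist; for example, SciPy (assumed to be installed, and not part of the MPG) exposes it as scipy.optimize.brentq:

```python
import math
from scipy.optimize import brentq

# find the zero crossing of a sample metric inside a bracketing interval
root = brentq(lambda t: math.cos(t) - 0.3, 0.0, 2.0, xtol=1e-12)
```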

8.4.4. Root Polishing Methods

Root polishing methods are required when bracketing is not possible or practical, and when there is reasonable certainty that a root exists in the vicinity of a starting point for the search. They all require a well-behaved function and iteration path between the starting point and the root.

8.4.4.1. Secant Method

The secant method is practically the same as regula falsi. The only difference is that the two newest variable-value pairs are always retained. The method is not a bracketing one, and so it may fail to converge on a root. However, when it does converge, it is usually of superlinear order given by the golden mean, p = (1 + √5)/2 = 1.6180339887….

8.4.4.2. Newton-Raphson Method Perhaps the most celebrated of all one-dimensional root finding schemes is the Newton method (also called the Newton-Raphson and Newton-Fourier method). It is an efficient algorithm for finding roots of differentiable functions. It does not bracket roots, and may therefore fail to converge. But when it does converge, convergence is generally quadratic.

The method requires the evaluation of both function and derivative at arbitrary trial points. It begins with a single estimate of the root location, whereupon successive refinements are made using the inverse two-term Taylor approximation

$$f(x) \approx f(x_k) + f'(x_k)(x - x_k) \approx 0$$  (8-259)

whose solution is

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$  (8-260)

Note that if the function has a zero derivative at the root, such as when there is a multiple root at this point, the computation in Eq. (8-260) may be tricky.
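A sketch of the iteration of Eq. (8-260), stopping on the size of the correction, is shown below for illustration only (the caveats that follow still apply).

```python
def newton(f, fprime, x0, xtol, max_iter=50):
    # Newton-Raphson polishing from a starting estimate x0; no bracket is
    # maintained, so convergence is not guaranteed
    x = x0
    for _ in range(max_iter):
        dfx = fprime(x)
        if dfx == 0.0:
            raise ZeroDivisionError("zero derivative encountered")
        step = f(x) / dfx        # correction of Eq. (8-260)
        x -= step
        if abs(step) < xtol:
            break
    return x
```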

Further, like the Taylor series upon which it is based, the approximation is valid only when near to the root. Far from the root, the higher order terms in the series are important, and the iteration in Eq.(8-260) can lead to grossly inaccurate, meaningless corrections.

Even when the Newton method is not used in the early stages of root finding because of its possible poor convergence properties, it may be very wise to use it toward the end of the convergence to polish the root, just because of its rapid convergence when near the root.

The Newton formula can also be applied using a numerical difference to approximate the derivative. However, because two samplings of the function are required per iterative cycle, the order of convergence drops to only √2, so the secant method prevails in performance.

Newton’s method readily generalizes to multiple dimensions, and the interested reader may consult a textbook on numerical analysis for how this is done. 92 Explanatory Supplement to Metric Prediction Generation

8.4.4.3. Acceleration of Sequences

Some methods of root polishing, including those within bracketed root finders, converge relatively slowly. Some of the simplest ways of improving a slowly convergent series apply an accelerator to the series of iterants.

One of the simplest of these was introduced by Alexander Aitken in 1926, but it appears that the method had been known and used by the Japanese mathematician Takakazu Seki in the seventeenth century, who used it to approximate π to 10 decimal places.

Aitken’s method is as follows: Let xm- ) m . )? z be a sequence that is converging linearly toward a value mÉ , such that all of the ch;m h + mÉ for h< h - have the same sign. Then develop the series

/ / %mh). + m h & %B h & mh;+ m h ;+ m h / (8-261) mh)/+/ m h ) . ) m h Bh f where Bh is the forward difference operator discussed earlier in this chapter. This transformed series converges more rapidly than the original, sometimes dramatically (sometimes not).
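A sketch of Eq. (8-261) applied to a stored sequence of iterants (Python, illustrative only):

```python
def aitken(p):
    # Aitken delta-squared acceleration of a linearly convergent sequence p
    out = []
    for k in range(len(p) - 2):
        d1 = p[k + 1] - p[k]                    # forward difference of p_k
        d2 = p[k + 2] - 2.0 * p[k + 1] + p[k]   # second forward difference
        out.append(p[k] if d2 == 0.0 else p[k] - d1 * d1 / d2)
    return out
```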

Aitken’s method may be reapplied to mh to yield mh for yet another improve- ment, and so on. In practice, however, this added improvement is usually only slight, compared to the first application.

The Steffensen method is the application of Aitken’s method to a series obtained using Newton-Raphson iteration.

For a series in which the ε_k alternate in sign, there is a method, called the Euler Transformation, that accelerates convergence. However, none of these methods seems to work when the signs of the ε_k terms are randomly oriented in the series. And since the MPG cannot suppose that the function whose root is sought will have ε_k signs in a particular orientation, acceleration methods are not generally applied.

8.4.4.4. Taylor Series Reversion Root Finder

In mathematics, the Lagrange inversion theorem, also known as the Lagrange-Bürmann formula, gives the Taylor series expansion of the inverse function of a given analytic function. If the function y = f(x) to be inverted is given in the form of a Taylor series about the point x_0 for which f(x_0) = y_0 and f′(x_0) ≠ 0, then series reversion is the computation of the coefficients of the inverse series from the coefficients of the function series.

Thus, given a W-variable f̄ containing the evaluation of f(x) and a number of its derivatives at some value x_0, and the simple W-variable x̄ = [x_0, 1, 0, …, 0]^T, the use of the function WVReverseChain found in [Tausworthe2006] or DWRCHN in the Math C90 library [Lawson1994] produces the W-variable of the reversion, which is

$$\bar f^{\,-1} = \begin{bmatrix}
x_0 \\[4pt]
1 / f'(x_0) \\[6pt]
-\, f''(x_0) / f'(x_0)^3 \\[6pt]
\left( 3 f''(x_0)^2 - f'(x_0)\, f'''(x_0) \right) / f'(x_0)^5 \\[6pt]
\vdots
\end{bmatrix}$$  (8-262)

The astute reader will recognize (perhaps after consulting a calculus text) the derivative terms as those of the inverse function written in terms of the function derivatives.

The Taylor series of the inverse can now be written as

$$\begin{aligned}
x &= x_0 + \frac{f(x) - f(x_0)}{f'(x_0)} - \frac{f''(x_0)}{2 f'(x_0)^3} \left[ f(x) - f(x_0) \right]^2 \\
&\quad + \frac{3 f''(x_0)^2 - f'(x_0)\, f'''(x_0)}{6 f'(x_0)^5} \left[ f(x) - f(x_0) \right]^3 + \cdots
\end{aligned}$$  (8-263)

The value of x desired is, of course, that for which the function value vanishes. The reversion estimate of the root is, therefore,

$$\hat x = x_0 - \frac{f(x_0)}{f'(x_0)} - \frac{f''(x_0)}{2 f'(x_0)^3}\, f(x_0)^2
- \frac{3 f''(x_0)^2 - f'(x_0)\, f'''(x_0)}{6 f'(x_0)^5}\, f(x_0)^3 + \cdots$$  (8-264)

The reader will note in the above expression that the first two terms constitute the Newton-Raphson approximation of the next step. The remaining terms are thus higher-order corrections that produce convergence of order n, the order of the W-variable. However, in order to produce the W-variable of order n, it is necessary to sample the function at n points near x_0, so the convergence order per sample is n^{1/n}, which is maximum at n = 3, producing a convergence order of p = 3^{1/3} = 1.4422.

However, if the n points used are taken from the sequence of trial points generated during the iteration process, these may be used to generate the W-variable, as described earlier in this chapter. When this is done, the convergence rate increases at each sample.

The MPG implementation does not currently, in its first operational version, use reversion root finding. Later upgrades may wish to reevaluate whether certain algorithms would find advantage in this root finder.

8.4.5. Finding Multiple Roots in an Interval

The discussion, to this point, has concentrated on finding a single root from given initial conditions. However, MPG applications quite frequently have multiple roots in an interval of interest. For example, a spacecraft in a low Moon orbit will experience several possible occultations per day, which must be located in time by the MPG View Period Generator. There is also the possibility that some events may correspond to roots of multiplicity greater than unity, in which case bracketing methods may not work. Fortunately, the multiplicity of a root is not of importance to the MPG, only its location.

The MPG is charged with finding all root occurrences within a given interval (e.g., a "pass"), with no clues from operators as to where they may be located, and over an entire spectrum of missions undertaken by the DSN. The event metric function, however, is known and can be computed for any trial location within the interval. In many cases, computational resources are directly proportional to the number of function samples evaluated, so the number of samples is an important design consideration.

This subsection therefore discusses methods for searching a given interval [a, b] of arbitrary length for an unknown number of roots (perhaps zero) of a given function f(x). Five assumptions are made that derive from MPG prediction characteristics and accuracy requirements. These are that (1) the function whose roots are sought is piecewise continuous across the interval, (2) the function derivative may be calculated at continuous points of the function, either directly or by use of W-variables, (3) the interval endpoints are not roots, (4) two distinct roots are separated by at least an interval of δx_sep, and (5) any non-bracket interval of length less than δx_q and having no indication of containing an internal extremum of sign opposite to the endpoints may be presumed to contain no root.

Implications of these assumptions are discussed further and made more precise a little later on. Assumption (3) is not really necessary, for if an endpoint were a root, then the endpoint would be declared a root and the search would continue for remaining roots on the interval shortened by δx_sep.

8.4.5.1. Handling Bracketed Intervals

If a given search interval [a, b] brackets an odd number of roots, then one of them, say at x = r, may be found using one of the bracketing root finders discussed earlier. The search would then continue on each of the two subintervals [a, r − δx_sep] and [r + δx_sep, b], provided that their lengths exceed δx_sep. The guard of width δx_sep that is erected around the extracted root, and the method used to process the newly created subintervals, should ensure that the just-found root is not found again in later actions. This guard should be made large enough that the function is distinguishable from zero at the excluded interval endpoints and small enough that no root is deemed "likely" to be within the excluded interval.

8.4.5.2. Non-bracket Intervals

The remaining case for study, then, is that for which the bounding interval contains an even number of roots (including zero). The function values at the interval endpoints in this case are nonzero and have the same algebraic sign. If roots exist, then there must be at least one internal extremum point whose function value is either zero or of sign opposite to the endpoints; that is, a zero or bracketing point exists.

Whenever a bracketing point is located, there must assuredly be at least two roots: one between it and each of the endpoints (or a double root at the extremum itself). These roots can be located and two new subintervals created and scheduled for further examination, provided they exceed the minimum allowed size and are not disqualified by criteria implementing assumption (5) above.

The root search procedure is thus (1) subdividing with the goal of locating bracketing points, and (2) adjudging when a given subinterval does not contain a root.

8.4.5.3. Brute Force Search Strategy One sure way of finding all roots follows from (2) above is a brute-force exhaus- tive search, evaluating %_+ ^ &, b u pbm function values in search of brackets or function values that are close enough to zero to qualify as roots. That strategy is a bit austere, as the cost might be extremely high. Nevertheless, it can be applied if absolutely necessary.

Another, somewhat less stringent, approach is to divide the interval in current consideration into somewhat larger subintervals and to examine each separately, as above. Those that are found to bracket a root may be subdivided at the root, as previously indicated.

8.4.5.4. Extremum Search Strategy

Literally speaking, if an interval has no internal extremum, then there is a monotone progression from the value at one endpoint to the other, and no root can exist in between. But to assure this condition truly prevails may require an impractical, exhaustive search.

The alternative is to develop criteria that indicate, "with high reliability," that no roots are contained within the interval. Such criteria, if truly successful, will likely require ad hoc knowledge about the type of function being rooted, and, for this reason, bode against developing a general-purpose root finder. Finding the roots of a particular MPG view period event metric function, for example, usually relies on known underlying physical characteristics of the particular event itself.

One subdivision criterion that could be applied involves a search for extrema within the interval. There are a number of extremum-locating algorithms that do and do not involve derivatives that could be applied, if needed. If the interval were adjudged to have none, then it could be eliminated from the list of intervals under consideration.

If the extremum value were of the same sign as the endpoints, then there is no indication of roots being within the interval, although an even number of them might still exist in the interval if the interval length were greater than δx_q and if there were other extrema within the interval. If so, the interval could again be subdivided at the peak point, just to be on the safe side.

This process could continue until all subintervals either contain no extrema or else are smaller than the allowed size. When done, all roots within the original interval will have been found, with high likelihood (depending on the choices of δx_sep and δx_q). This strategy winds up breaking the original interval into subintervals no larger than δx = max(δx_sep, δx_q), with a lot of thrashing around finding extrema and roots, when they occur.

A better strategy, then, would have been just to subdivide the original interval into subintervals of length δx at the outset, and then to examine these for brackets and non-brackets. For the non-bracket subintervals, criteria implementing assumption (5) would be required to judge the likelihood of internal extrema.

8.4.5.5. Subdivision Termination Strategy

There are useful indicators of when an extremum is and is not likely to exist in an interval of critical size δx. If function derivative values are available at the endpoints, then the cubic Hermite interpolation polynomial fitting the function and derivative values may be created. The coefficients of this polynomial, normalized to the interval (0, h), are

$$\begin{aligned}
a_0 &= f_0 \\
a_1 &= f'_0 \\
a_2 &= \frac{3(f_1 - f_0) - (2 f'_0 + f'_1)\, h}{h^2} \\
a_3 &= \frac{-2(f_1 - f_0) + (f'_0 + f'_1)\, h}{h^3}
\end{aligned}$$  (8-265)

The real zeros of this polynomial (of which there are one or three) may be determined using algebraic means, as well as the real zeros of its derivative (of which there are either none or two). Those that lie within the interval (0, h) are potential points for further investigation.
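A sketch of this termination test follows: build the Hermite cubic of Eq. (8-265) on (0, h), and report any real zeros of it and of its derivative that fall inside the interval. Python with NumPy is used here only to extract the polynomial roots; this is illustrative, not the MPG code.

```python
import numpy as np

def hermite_zero_candidates(f0, f1, df0, df1, h):
    # cubic Hermite coefficients of Eq. (8-265), normalized to (0, h)
    a2 = (3.0 * (f1 - f0) - (2.0 * df0 + df1) * h) / h**2
    a3 = (-2.0 * (f1 - f0) + (df0 + df1) * h) / h**3
    cubic_roots = np.roots([a3, a2, df0, f0])           # zeros of the cubic itself
    deriv_roots = np.roots([3.0 * a3, 2.0 * a2, df0])   # zeros of its derivative (extrema)
    keep = lambda rts: [z.real for z in rts
                        if abs(z.imag) < 1e-9 and 0.0 <= z.real <= h]
    return keep(cubic_roots), keep(deriv_roots)
```

The returned values would serve as starting points for root polishing (first list) and for extremum checks against the bracketing criterion (second list).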

If derivatives are not available, then function values of the two surrounding intervals may be used to create the cubic Lagrange interpolation polynomial. The coefficients in this case are not as simple to write out, but can be found using the methods given in the interpolation chapter.

In either case, the roots of the polynomials that lie within the interval of interest may be used as starting points for root polishing. Any roots of the function found in this way may be added to the list of roots and used to subdivide the interval into subintervals. Any roots of the derivative function (when available) correspond to extrema, which can be checked for root bracketing purposes.

When the derivative function is not available, the polynomial derivative roots can be used as the starting estimate of an extremum location program, such as DFMIN in the MathC90 library [Lawson1994]. Since the actual extremum location is not needed with high accuracy, a coarser tolerance can be used in the process to reduce the number of function evaluations.

If neither the interpolation polynomial nor its derivative has zeros within the interval boundaries, or if their root polishing was unsuccessful, and if the interval size is less than or equal to δx_q, then no further consideration of the interval is necessary.

The choices of δx_q and δx_sep are then the only application-dependent quantities remaining. These values will generally have to be assigned on a case-by-case basis.

References

[Abramowitz1964] Abramowitz, M., and Stegun, I., Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, National Bureau of Standards Applied Math Series 55, U. S. Dept. of Commerce, U. S. Govt. Printing Office, Washington, DC, 20402, 1964, Tenth Printing 1972.

[Acton2005-Rotate] Acton, C. H., et al., “Rotations”, SPICE Required Reading, Navigation Ancillary Information Facility, Jet Propulsion Laboratory, Pasadena, CA, 2005 (rev).

[Acton2005-Useful] Acton, C. H., et al., "Most Useful SPICELIB Subroutines", SPICE Required Reading, Navigation Ancillary Information Facility, Jet Propulsion Laboratory, Pasadena, CA, 2005 (rev).

[Griewank1991] Griewank, Andreas, and Corliss, George F., Eds., "Automatic Differentiation of Algorithms," Proceedings of SIAM Workshop on Automatic Differentiation of Algorithms, January 1991, Society for Industrial and Applied Mathematics, Philadelphia (1991), 353 pp. This work contains 31 papers reporting the 1991 state of the art in algorithms and software for automatic differentiation. Includes the paper by Lawson specifically relating to the method for series reversion.

[Irving1959] Irving, J., and Mullineux, N., Mathematics in Physics and Engineering, Chapter XI, Academic Press, New York, 1959.

[Lass1950] Lass, Harry, Vector and Tensor Analysis , McGraw-Hill Book Co., Inc., NY, 1950.

[Lawson1971] Lawson, C. L., "Computing Derivatives using W-arithmetic and U-arithmetic," JPL Internal Computing Memorandum 289, Jet Propulsion Laboratory, Pasadena, CA, (Sept. 1971).

[Lawson1994] Lawson, C. L., Krogh, F. T., and Snyder, W. V., "MATH77 and mathc90, Release 5.0," JPL publication JPL D-1341, Rev. D, Jet Propulsion Laboratory, Pasadena, CA, March 1994.

[Press1986] Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T., Numerical Recipes , Cambridge University Press, 1986.

[Tausworthe2006] Tausworthe, Robert C., “W-Variable Functions,” MPG Mathematica Series, Jet Propulsion Laboratory, Pasadena, CA, 2006.

[Wengert1964] Wengert, R. E., “A Simple Automatic Derivative Evaluation Program,” Comm. ACM 7 (Aug. 1964), pp. 463--464.