Semialgebraic Geometry School of Mathematics and Statistics, University of Sydney A thesis submitted in fulfillment of the requirements for the degree of Master of Philosophy
Mark S. Perrin 2020
Abstract
In this thesis we introduce the preliminary results required to appreciate some key results of classical semialgebraic geometry. Namely, we give a detailed account of Sturm’s root counting method and Hermite’s root counting method, both of which are used to count the number of real solutions of a finite system of polynomial equations in a single variable. After building a foothold in both the theoretical and algorithmic setting of the root counting methods, we extend to systems of polynomial equations and inequations, wherein lies the bridge to the definition of semialgebraic sets as a natural extension to algebraic sets. The main results covered are the cylindrical algebraic decomposition and triangulation of semialgebraic sets, as well as Hardt’s semialgebraic triviality of semialgebraic sets, where we show numerous consequences of each of these. We give the proofs of important concepts with a focus on intuitive exemplification and illustration. In the final section we discuss some improvements to the implementation of the standard conditions of Thom’s lemma (3.4.1), which may have implications for the efficiency of the projection operation used in cylindrical algebraic decomposition.
Acknowledgments
I want to express my gratitude toward Laurentiu for his persistence and patience as my supervisor throughout my degree and writing of this thesis. I appreciate his diligence as a mentor - leading me with suggestions to seemingly independent personal discoveries and first-hand revelations in new topics in mathematics, and teaching me to always ask questions - which has undoubtedly improved my ability to learn. I also want to thank my family for the support relating to my studies or otherwise. They have always encouraged my pursuits in every way they can, with full faith that I will try my best to succeed, or at least make the most of the cards that might be dealt. Finally, I want to thank Gwen for being so supportive during our time together including my studies, for being understanding in personal times, and for giving me motivation when it is most needed.
Contents
1 Counting real roots of polynomials
1.1 Sturm’s method
1.2 Tarski-Seidenberg - Elimination of a variable
1.3 Hermite’s method
2 Semialgebraic sets
2.1 Semialgebraic sets - Definitions and examples
2.2 Tarski-Seidenberg - Second form
2.3 Tarski-Seidenberg - Third form
2.4 Semialgebraic functions
3 Decomposing semialgebraic sets
3.1 Cylindrical algebraic decomposition
3.2 Constructing a c.a.d. adapted to a finite family of polynomials
3.3 Algorithmic construction of an adapted c.a.d.
3.4 Improved cylindrical algebraic decomposition
3.5 Dimension of semialgebraic sets
3.6 Triangulation of semialgebraic sets
4 Hardt’s trivialization and consequences
4.1 Semialgebraic triviality of semialgebraic sets
4.2 Semialgebraic Sard’s theorem and upper bounds on connected components
4.3 Algebraic computation trees and lower bounds on connected components
5 Implementing the (?) condition
5.1 Checking the (?) condition
5.2 Completing a family to satisfy the (?) condition
A Hermite’s Method
Introduction
The goal of this thesis is to give a detailed discussion of several key results in semialgebraic geometry, emerging in the 1970s and onward. Of course, much of the underlying theory is older, relying on knowledge of algebraic geometry and analysis. We aim to give the new reader an introduction to the topic, providing rigorous and detailed proofs, complemented by an incremental building of the relevant theory, and illustrated with many worked examples. Following each of the main results (Theorems 3.4.7, 3.4.9, 3.6.1, and 4.1.1), we outline various consequences concerning the dimension and the number of connected components of semialgebraic sets. Many of the results (and their proofs) we cover are taken from a preprint of Michel Coste’s ‘Introduction to Semialgebraic Geometry’ (2002), [1]. The structure of the thesis reflects the logical dependence of the results, and hence the order in which the material should be learned. In Section 1 the root counting methods of Sturm and Hermite are introduced, providing a way to count the number of real solutions of a system of finitely many polynomial equations in a single variable. Sturm’s method is developed further to count the number of solutions of a system consisting of a polynomial equation and several polynomial inequalities, and Hermite’s method is developed to relate the principal subresultant coefficients of two polynomials with the multiplicities of their common factors. This naturally gives a precursor for the definition of semialgebraic sets, as well as providing some powerful tools for their detailed examination. However, the reader may start with semialgebraic sets in Section 2, and return to the referenced material from Section 1 when necessary. The basic properties of semialgebraic sets are introduced in Section 2, particularly stability properties, as well as the tools commonly used to study semialgebraic sets.
Semialgebraic sets are, by definition, sets of points in $\mathbb{R}^n$ satisfying finite boolean combinations of sign conditions on a family of polynomials, and being able to count the solutions of such a system is crucial in the method of cylindrical algebraic decomposition of semialgebraic sets into simpler, well-arranged subsets we call strata, or slightly more generally, cells. The reader will become familiar with taking projections of semialgebraic sets, which is used extensively throughout Section 3 as one of the primary methods used to study semialgebraic sets, and is responsible for some of the main features, including the cylindrical arrangement of cells, in a constructed cylindrical algebraic decomposition. Decomposing semialgebraic sets in this way provides information on the topology and dimension of the sets, and allows for the proof that every semialgebraic set can be triangulated (Theorem 3.6.1). In Section 4 we introduce the notion of semialgebraic triviality in order to present Hardt’s theorem as the next main result. We use the triangulation theorem to prove Hardt’s theorem (Theorem 4.1.1) on the triviality of semialgebraic sets, which states that the image of a semialgebraic set $S$ by a continuous semialgebraic mapping $h$ can be represented as a finite union of subsets, over which the mapping is semialgebraically trivial, and that this triviality can be refined to be compatible with a finite collection of semialgebraic subsets of $S$. Hardt’s theorem has consequences for the structure, dimension, and number of topological types of semialgebraic sets, and is used to prove the semialgebraic version of Sard’s theorem. Throughout, we aim to give illustrations and geometric exemplification of the key concepts in understanding these results. Some of the propositions and corollaries stated and proven throughout this thesis are posed as exercises in [1] (Coste).
1 Counting real roots of polynomials
In this section we explore two methods for counting the real roots of polynomial equations in one variable. Namely, we consider Sturm’s method, and the method of Hermite. Sturm’s method invokes a variation of the Euclidean division algorithm to produce a sequence of polynomials, from which we are able to infer the number of distinct real roots of a polynomial on an open (not necessarily bounded) interval of the real line. We show how to extend this method to count the number of distinct real roots of a system of polynomial equations and inequalities in a single variable, and how to do so algorithmically, which will be used in Section 3 as a foundation for decomposing semialgebraic sets in an effective way. Hermite’s method uses Newton sums, and principal minors of a matrix related to the Newton sums, to count the distinct complex roots of a polynomial, and even the distinct real roots of a polynomial. We show how to modify the setup in order to compute the number of solutions of a system consisting of a polynomial equation and a polynomial inequality in a single variable. This method invokes Galois theory, principal subresultant coefficients, and a theorem of Jacobi on symmetric bilinear forms. We also show how the principal subresultant coefficients of pairs of polynomials relate to common roots of these polynomials, which is an important feature we use in a construction in Section 3. While they are related (see [2]), the methods of Sturm and Hermite are quite different in their application. Sturm’s method requires branching of computation trees based on the signs of certain polynomials in the coefficients we start with, while Hermite’s method avoids branching altogether.
1.1 Sturm’s method
Consider two nonzero polynomials $P, Q \in \mathbb{R}[X]$. We construct a sequence of polynomials $P_0, P_1, \dots, P_k$ by taking $P_0 := P$, $P_1 := Q$, and for $i \geq 1$ we define $P_{i+1}$ as the negative of the remainder of the Euclidean division of $P_{i-1}$ by $P_i$. That is, $P_{i-1} = P_i A_i + R_i$ for some $A_i, R_i \in \mathbb{R}[X]$, and we define $P_{i+1} := -R_i$. From this definition we can write $P_{i-1} = P_i A_i - P_{i+1}$, or equivalently $P_{i+1} = P_i A_i - P_{i-1}$. We now define the Sturm sequence of $P$ and $Q$ as $(P_0, P_1, \dots, P_k)$, where $P_k$ is the last nonzero polynomial in the sequence. Note that $P_k = \pm \gcd(P, Q)$, since the process for computing $P_k$ is the Euclidean algorithm up to a sign change.
Example 1. Let $P = X^3 + 2X^2 + 4$, $Q = X^2 + 4X - 2$. Then we have
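\begin{align*}
P &= Q \cdot (X - 2) + 10X &&\implies P_2 = -10X, \\
Q &= P_2 \cdot \left(-\tfrac{1}{10} X\right) + (4X - 2) &&\implies P_3 = -4X + 2, \\
P_2 &= P_3 \cdot \tfrac{10}{4} - 5 &&\implies P_4 = 5.
\end{align*}
Then the Sturm sequence of $P$ and $Q$ is
\[ (P_0, \dots, P_4) = (X^3 + 2X^2 + 4,\ X^2 + 4X - 2,\ -10X,\ -4X + 2,\ 5). \]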
Figure 1: Graph of $P_0, \dots, P_3$.
The above choice of $P$ and $Q$ was such that they had no common roots (and neither had multiple roots). As such, the last term in the sequence, $P_4 = 5$, is constant. We consider an example where this is not the case.
Example 2. Let $P = (X - 1)^2 (X - 4) = X^3 - 6X^2 + 9X - 4$, and $Q = (X - 1)(X - 2) = X^2 - 3X + 2$. Then
\begin{align*}
P &= Q \cdot (X - 3) - (2X - 2) &&\implies P_2 = 2X - 2, \\
Q &= P_2 \cdot \left(\tfrac{1}{2} X - 1\right) &&\implies P_3 = 0.
\end{align*}
Thus the Sturm sequence of $P$ and $Q$ is
\[ (P_0, P_1, P_2) = (X^3 - 6X^2 + 9X - 4,\ X^2 - 3X + 2,\ 2X - 2). \]
One notices that P has a double root at X = 1, and a simple root at X = 4, while Q has only simple roots at X = 1 and X = 2. In particular, we notice that P and Q have a common root at X = 1. As a result of this, the last term of the sequence, P2 = 2(X − 1), is non-constant, and is indeed equal to gcd(P,Q) (up to some nonzero scalar multiple).
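The construction of a Sturm sequence by repeated Euclidean division can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the thesis: the function names `trim`, `poly_rem` and `sturm_sequence` are hypothetical, and polynomials are represented as coefficient lists with the highest degree first, using exact `Fraction` arithmetic.

```python
from fractions import Fraction


def trim(p):
    """Drop leading zero coefficients (coefficients are highest degree first)."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]


def poly_rem(a, b):
    """Remainder of the Euclidean division of a by the nonzero polynomial b."""
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    while len(a) >= len(b):
        factor = a[0] / b[0]
        # Subtract factor * X^(deg a - deg b) * b, which cancels the leading term.
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a = trim(a)
    return a  # the empty list represents the zero polynomial


def sturm_sequence(p, q):
    """Sequence P0 = p, P1 = q, P_{i+1} = -(remainder of P_{i-1} by P_i)."""
    seq = [trim([Fraction(c) for c in p]), trim([Fraction(c) for c in q])]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:  # remainder is zero: the last entry is +/- gcd(p, q)
            return seq
        seq.append([-c for c in r])


# Example 2: P = X^3 - 6X^2 + 9X - 4, Q = X^2 - 3X + 2
example_2 = sturm_sequence([1, -6, 9, -4], [1, -3, 2])
```

For the polynomials of Example 2 the last entry of `example_2` is the coefficient list of $2X - 2$, a scalar multiple of $\gcd(P, Q) = X - 1$.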
We now look at the sign changes in the sequence $(P_0(X), \dots, P_k(X))$ as $X$ varies over $\mathbb{R}$. Denote by $\nu_{P,Q}(x)$ the number of sign changes in the sequence $(P_0(x), \dots, P_k(x))$, the Sturm sequence of $P$ and $Q$ evaluated at $x$. For instance, with $P$ and $Q$ as in Example 1 we have $(P_0(1), \dots, P_4(1)) = (7, 3, -10, -2, 5)$, so we find that $\nu_{P,Q}(1) = 2$. In the case that some $P_i(x)$ is zero, we simply ignore that entry, so we have $(P_0(0), \dots, P_4(0)) = (4, -2, 0, 2, 5)$, with $\nu_{P,Q}(0) = 2$. We want to examine how $\nu_{P,Q}$ changes as $X$ varies. Observe that the value of $\nu_{P,Q}(X)$ can only change if some of the $P_i$ in the Sturm sequence change sign. That is, $\nu_{P,Q}(X)$ can only change as $X$ moves over roots of the polynomials $P_i$ in the Sturm sequence. Hence $\nu_{P,Q}$ is constant on the open intervals between the roots of all $P_i$ in the sequence. This means that we only need to check the value of $\nu_{P,Q}(X)$ at one point in each of these open intervals to understand $\nu_{P,Q}$, at least schematically. Thus we are prompted to investigate what happens as $X$ moves over such roots.
Remark: A particular case of the function $\nu_{P,Q}$ is used in [1], where $Q$ is taken to be $P'$, and $\nu_P$ is used to denote what we would call $\nu_{P,P'}$, omitting the second argument when it is the derivative of $P$. As we approach Sturm sequences in greater generality, we use the notations $\nu_{P,P'}$ and $\nu_{P,Q}$ to avoid ambiguity at all stages. We aim to understand the function $\nu_{P,Q}$ itself to gain a deeper understanding of what it can tell us, and why results such as Sturm’s Theorem 1.1.2 occur. We explore the behavior of $\nu_{P,Q}$ in the following example before we begin examining its properties rigorously.
Example 3. Let $P = X^4 - 6X^2$, $Q = X^2 + X - 2$. Then $P_2 = -X + 6$ and $P_3 = -40$, and the Sturm sequence of $P$ and $Q$ is $(X^4 - 6X^2,\ X^2 + X - 2,\ -X + 6,\ -40)$. $P$ has roots at $X = -\sqrt{6}, 0, \sqrt{6}$, $Q$ has roots at $X = -2, 1$, and $P_2$ has a root at $X = 6$.
Figure 2: Graph of $P_0$, $P_1$, $P_2$.
We evaluate νP,Q on the open intervals between distinct roots of each Pi:
\begin{align*}
\nu_{P,Q}(-3) &= \text{number of sign changes in } (27, 4, 9, -40) = 1, \\
\nu_{P,Q}(-\sqrt{5}) &= \text{number of sign changes in } (-5, 0.76, 8.2, -40) = 2, \\
\nu_{P,Q}(-1) &= \text{number of sign changes in } (-5, -2, 7, -40) = 2, \\
\nu_{P,Q}(\tfrac{1}{2}) &= \text{number of sign changes in } (-1.4, -1.25, 5.5, -40) = 2, \\
\nu_{P,Q}(2) &= \text{number of sign changes in } (-8, 4, 4, -40) = 2, \\
\nu_{P,Q}(3) &= \text{number of sign changes in } (27, 10, 3, -40) = 1, \\
\nu_{P,Q}(7) &= \text{number of sign changes in } (2107, 54, -1, -40) = 1.
\end{align*}
From now on we will omit the intermediate calculations when computing the value of $\nu_{P,Q}$. Observe that $\nu_{P,Q}(X)$ indeed only changes as $X$ passes over roots of $P$. However, $\nu_{P,Q}(X)$ does not change at every root of $P$. Notice that $\nu_{P,Q}(X)$ changes as $X$ moves over $-\sqrt{6}$ and $\sqrt{6}$ (both of which are simple roots of $P$), but does not change as $X$ moves over $0$, which is a double root of $P$. To see why, we highlight the fact that $P$ is negative on both intervals $(-\varepsilon, 0)$ and $(0, \varepsilon)$ for suitably small $\varepsilon > 0$. Of course, all other terms of the sequence also have constant sign about $X = 0$. Thus there is no difference in the number of sign changes in the Sturm sequence evaluated at $-\varepsilon$ compared to $+\varepsilon$. That is, $\nu_{P,Q}(-\varepsilon) = \nu_{P,Q}(+\varepsilon)$ for suitably small $\varepsilon > 0$. We investigate this further.
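The evaluations of Example 3 can be reproduced mechanically. The following Python sketch is an illustration only: the names `evaluate`, `sign_variations`, `nu` and `EXAMPLE_3` are hypothetical, the Sturm sequence of Example 3 is entered by hand as coefficient lists (highest degree first), and zero entries are ignored as described in the text.

```python
from fractions import Fraction

# Sturm sequence of Example 3: (X^4 - 6X^2, X^2 + X - 2, -X + 6, -40)
EXAMPLE_3 = [[1, 0, -6, 0, 0], [1, 1, -2], [-1, 6], [-40]]


def evaluate(p, x):
    """Evaluate a polynomial (coefficients highest degree first) by Horner's rule."""
    value = Fraction(0)
    for c in p:
        value = value * x + c
    return value


def sign_variations(values):
    """Number of sign changes in a list of numbers, ignoring zero entries."""
    signs = [v > 0 for v in values if v != 0]
    return sum(s != t for s, t in zip(signs, signs[1:]))


def nu(seq, x):
    """Number of sign changes in the sequence evaluated at the point x."""
    x = Fraction(x)
    return sign_variations([evaluate(p, x) for p in seq])


# One sample point in each open interval between the roots -sqrt(6), -2, 0, 1, sqrt(6), 6
samples = [-3, Fraction(-11, 5), -1, Fraction(1, 2), 2, 3, 7]
values = [nu(EXAMPLE_3, x) for x in samples]
```

The sample point $-\tfrac{11}{5}$ stands in for $-\sqrt{5}$ so that all arithmetic stays exact; both lie in the interval $(-\sqrt{6}, -2)$.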
Consider polynomials $P, Q \in \mathbb{R}[X]$, and the Sturm sequence $(P_0, \dots, P_k)$ of $P$ and $Q$. For convenience, we will first assume that $P$ and $Q$ have no common roots, that is, that they are relatively prime. We know that $P_k$ will therefore be a nonzero constant. By the construction of a Sturm sequence, we have
\[ P_{i+1}(c) = P_i(c) A_i(c) - P_{i-1}(c). \]
If $P_i(c) = 0$ for some $1 \leq i < k$, then
\[ P_{i+1}(c) = -P_{i-1}(c). \]
Hence if $P_{i+1}(c) = 0$ as well, then $P_{i+1}(c) = -P_{i-1}(c) = 0$, and it is clear that all of $P_0(c), \dots, P_k(c)$ are in fact forced to be zero. However, since $P$ and $Q$ are assumed to be relatively prime we know that $P_k$ is a nonzero constant. This means that we cannot have consecutive terms of the Sturm sequence simultaneously equal to zero. Therefore, if $P_i(c) = 0$ for some $1 \leq i < k$, we have $P_{i+1}(c) = -P_{i-1}(c) \neq 0$. Hence, whether $P_i(X)$ changes sign or not as $X$ passes over the root $c$, the sub-sequence $(\dots, P_{i-1}, P_i, P_{i+1}, \dots)$ has the same number of sign changes on the interval $(c - \varepsilon, c)$ as it does on the interval $(c, c + \varepsilon)$, for suitably small $\varepsilon > 0$. That is, the roots of $P_1, \dots, P_k$ do not contribute any change to the number of sign changes in the Sturm sequence. Therefore, we need only consider the roots of $P_0 = P$. If $P(c) = 0$ for some $c \in \mathbb{R}$, then $Q(c) \neq 0$, so for suitably small $\varepsilon > 0$ either $Q > 0$ on $(c - \varepsilon, c + \varepsilon)$, or $Q < 0$ on $(c - \varepsilon, c + \varepsilon)$. Assume that $Q > 0$ on $(c - \varepsilon, c + \varepsilon)$. Then either
1. $P(c - \varepsilon) < 0$ and $P(c + \varepsilon) > 0$, giving $\nu_{P,Q}(c - \varepsilon) - \nu_{P,Q}(c + \varepsilon) = 1$, or
2. $P(c - \varepsilon) > 0$ and $P(c + \varepsilon) < 0$, giving $\nu_{P,Q}(c - \varepsilon) - \nu_{P,Q}(c + \varepsilon) = -1$, or
3. $P(c - \varepsilon) P(c + \varepsilon) > 0$, in which case $\nu_{P,Q}(c - \varepsilon) - \nu_{P,Q}(c + \varepsilon) = 0$.
We have a similar result when $Q < 0$ on $(c - \varepsilon, c + \varepsilon)$. In summary, if the sign of $P$ changes from $-\mathrm{sign}(Q)$ to $\mathrm{sign}(Q)$ then the number of sign changes in the Sturm sequence goes down by one, hence $\nu_{P,Q}(c - \varepsilon) - \nu_{P,Q}(c + \varepsilon) = 1$, and if the sign of $P$ changes from $\mathrm{sign}(Q)$ to $-\mathrm{sign}(Q)$ then the number of sign changes in the Sturm sequence goes up by one, hence $\nu_{P,Q}(c - \varepsilon) - \nu_{P,Q}(c + \varepsilon) = -1$. Therefore, for $a, b \in \mathbb{R}$ not roots of $P$, with $a < b$, the value of $\nu_{P,Q}(a) - \nu_{P,Q}(b)$ is equal to the number of distinct roots $c$ of $P$ in $(a, b)$ at which the sign of $P$ changes to the sign of $Q$, minus the number of those at which the sign of $P$ changes away from the sign of $Q$. We consider two examples now to illustrate this result.
Example 4. Let $P = (X - 1)(X + 1) = X^2 - 1$, $Q = X - 2$. Then $P_2 = -3$, and the Sturm sequence of $P$ and $Q$ is $(X^2 - 1,\ X - 2,\ -3)$.
Figure 3: Graph of $P_0$, $P_1$, $P_2$.
We evaluate νP,Q on each open interval between distinct roots of each Pi:
νP,Q(−2) = 1,
νP,Q(0) = 0,
νP,Q(3/2) = 1,
νP,Q(3) = 1.
Observe that $\nu_{P,Q}(X)$ changes as $X$ passes roots of $P$. More specifically, $\nu_{P,Q}$ decreases by 1 as $X$ passes over $-1$, since $Q < 0$ on $(-1 - \varepsilon, -1 + \varepsilon)$, $P > 0$ on $(-1 - \varepsilon, -1)$, and $P < 0$ on $(-1, -1 + \varepsilon)$. Similarly, $\nu_{P,Q}$ increases by 1 as $X$ passes over $+1$. Of course, we also see that $\nu_{P,Q}$ does not change as $X$ passes roots of $Q$.
Example 5. Let $P = (X - 1)(X + 1) = X^2 - 1$, $Q = X + 2$. Then $P_2 = -3$, and the Sturm sequence of $P$ and $Q$ is $(X^2 - 1,\ X + 2,\ -3)$. Evaluating $\nu_{P,Q}$ between each of the distinct roots of the terms $P_i$ we find:
\begin{align*}
\nu_{P,Q}(-3) &= 1, \\
\nu_{P,Q}(-\tfrac{3}{2}) &= 1, \\
\nu_{P,Q}(0) &= 2, \\
\nu_{P,Q}(2) &= 1.
\end{align*}
Figure 4: $P_0$, $P_1$, $P_2$.
As in the previous example, we observe that νP,Q(X) changes as X passes roots of P , decreasing by 1 when the sign of P changes from the opposite sign to Q to the same sign as Q, and increasing by 1 when the sign of P changes from the same sign as Q to the opposite sign.
We now consider arbitrary polynomials $P, Q \in \mathbb{R}[X]$, so that $P$ and $Q$ may no longer be relatively prime. Constructing the Sturm sequence of $P$ and $Q$ in the same way as before with $P_0 = P$ and $P_1 = Q$, the last nonzero term $P_k = \pm \gcd(P, Q)$ is not necessarily a constant, and therefore may have real roots. Construct the modified Sturm sequence
\[ \left( \frac{P}{P_k}, \frac{Q}{P_k}, \dots, \frac{P_{k-1}}{P_k}, 1 \right), \]
dividing each term of the usual Sturm sequence of $P$ and $Q$ by $P_k$. Writing $P = T_0 P_k$ and $Q = T_1 P_k$, we have
\[ P_2 = (T_1 P_k) A_1 - T_0 P_k = P_k (T_1 A_1 - T_0). \]
Applying induction we have
\[ P_{i+1} = P_i A_i - P_{i-1} = (T_i P_k) A_i - T_{i-1} P_k = P_k (T_i A_i - T_{i-1}), \]
showing that each term in the modified Sturm sequence $\left( \frac{P}{P_k}, \frac{Q}{P_k}, \dots, \frac{P_{k-1}}{P_k}, 1 \right)$ is indeed a polynomial, since each term $P_0, \dots, P_{k-1}$ is divisible by $P_k$. The above induction also shows that the Sturm sequence generated by $\frac{P}{P_k}$ and $\frac{Q}{P_k}$ is precisely the same as the modified Sturm sequence $\left( \frac{P}{P_k}, \frac{Q}{P_k}, \dots, \frac{P_{k-1}}{P_k}, 1 \right)$. Since $\frac{P}{P_k}$ and $\frac{Q}{P_k}$ are relatively prime we know that they cannot simultaneously be zero. Hence there can never be two consecutive terms in this modified Sturm sequence which vanish simultaneously, and so we retrieve the desired property that if some $\frac{P_i}{P_k}(c) = 0$ for $1 \leq i < k$, then $\frac{P_{i-1}}{P_k}(c) = -\frac{P_{i+1}}{P_k}(c) \neq 0$. Hence the sub-sequence $\left( \dots, \frac{P_{i-1}}{P_k}, \frac{P_i}{P_k}, \frac{P_{i+1}}{P_k}, \dots \right)$ contributes no change to the number of sign changes in the modified Sturm sequence as $X$ moves over roots of $\frac{P_i}{P_k}$ for $1 \leq i < k$.
We consider the real roots of $P$ with multiplicities. We write $P$ and $Q$ in terms of the roots of $P$ so that $P(X) = (X - a_1)^{m_1} \cdots (X - a_r)^{m_r}$ and $Q(X) = (X - a_1)^{n_1} \cdots (X - a_r)^{n_r} A(X)$, where $A(X)$ and $P(X)$ are relatively prime, and noting that the multiplicities $n_i$ of roots $a_i$ in $Q$ may be zero. Then we have $P_k(X) = (X - a_1)^{p_1} \cdots (X - a_r)^{p_r}$ up to a nonzero scalar multiple, where $p_i := \min(m_i, n_i)$. For a root $a_i$ of $P$, if $m_i \leq n_i$ then $a_i$ is not a root of $\frac{P}{P_k}$, since $p_i = m_i$. Therefore the number of sign changes in the modified Sturm sequence does not change as $X$ passes over such roots of $P$. Hence we only need to consider roots of $P$ for which $m_i > n_i$.
Let $a_i$ be a real root of $P$, and assume that $m_i > n_i$. Then $a_i$ is a root of $\frac{P}{P_k}$ of multiplicity $m_i - p_i = m_i - n_i$, and $a_i$ is not a root of $\frac{Q}{P_k}$. One notices that $m_i + n_i$ and $m_i - n_i$ have the same parity. If $m_i + n_i$ is even, then $a_i$ is a root of $\frac{P}{P_k}$ of even multiplicity. This means that $\frac{P}{P_k}$ has the same sign on $(a_i - \varepsilon, a_i)$ as it does on $(a_i, a_i + \varepsilon)$, and therefore contributes no change to the number of sign changes in the Sturm sequence of $\frac{P}{P_k}$ and $\frac{Q}{P_k}$. If $m_i + n_i$ is odd, then $a_i$ is a root of $\frac{P}{P_k}$ of odd multiplicity. Hence $\frac{P}{P_k}$ changes sign as $X$ passes over $a_i$. If $\frac{P}{P_k}$ has the opposite sign to $\frac{Q}{P_k}$ on $(a_i - \varepsilon, a_i)$ and the same sign as $\frac{Q}{P_k}$ on $(a_i, a_i + \varepsilon)$, then the number of sign changes in the modified Sturm sequence decreases by 1 as $X$ passes over $a_i$. That is,
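\[ \nu_{\frac{P}{P_k}, \frac{Q}{P_k}}(a_i - \varepsilon) - \nu_{\frac{P}{P_k}, \frac{Q}{P_k}}(a_i + \varepsilon) = 1. \]
Conversely, if $\frac{P}{P_k}$ changes from the same sign as $\frac{Q}{P_k}$ to the opposite sign, then the number of sign changes in the modified Sturm sequence increases by 1 as $X$ passes over $a_i$, and so
\[ \nu_{\frac{P}{P_k}, \frac{Q}{P_k}}(a_i - \varepsilon) - \nu_{\frac{P}{P_k}, \frac{Q}{P_k}}(a_i + \varepsilon) = -1. \]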
We now wish to show that the number of sign changes in $(P, Q, \dots, P_k)$ is precisely the same as the number of sign changes in the modified Sturm sequence $\left( \frac{P}{P_k}, \frac{Q}{P_k}, \dots, \frac{P_{k-1}}{P_k}, 1 \right)$ when evaluated at points which are not roots of $P$. We note that the relative sign changes of $\frac{P}{P_k}$ and $\frac{Q}{P_k}$ about the root $a_i$ are the same as the relative sign changes of $P$ and $Q$ about $a_i$, as in each of the situations above. Furthermore, observe that whether $P_k(x)$ is positive or negative, dividing each term in the Sturm sequence by $P_k(x)$ clearly does not alter the number of sign changes in the sequence, and so does not affect the value of $\nu$, keeping in mind that $P_k(x) \neq 0$ when $P(x) \neq 0$. That is, for points $x$ which are not roots of $P$,
\[ \nu_{P,Q}(x) = \nu_{\frac{P}{P_k}, \frac{Q}{P_k}}(x). \]
We now have the following theorem.
Theorem 1.1.1. Let $P, Q \in \mathbb{R}[X]$, and let $a, b \in \mathbb{R}$ not be roots of $P$, with $a < b$. Denote by $\nu_{P,Q}(x)$ the number of sign changes in the Sturm sequence of $P$ and $Q$ evaluated at $x$. Then the value of $\nu_{P,Q}(a) - \nu_{P,Q}(b)$ is equal to the number of distinct roots $c$ of $P$ in the interval $(a, b)$ such that the multiplicity of $c$ as a root of $P$ is strictly greater than the multiplicity of $c$ as a root of $Q$ with $P(c - \varepsilon) Q(c - \varepsilon) < 0$ and $P(c + \varepsilon) Q(c + \varepsilon) > 0$, minus the number of those such that $P(c - \varepsilon) Q(c - \varepsilon) > 0$ and $P(c + \varepsilon) Q(c + \varepsilon) < 0$, for all sufficiently small $\varepsilon > 0$.
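As a quick check of Theorem 1.1.1 and of the modified Sturm sequence, one can revisit the polynomials of Example 2, which share the root $X = 1$. The sample points $x = 0$ and $x = 5$ below are chosen here purely for illustration.

```latex
\begin{align*}
P &= (X-1)^2 (X-4), \qquad Q = (X-1)(X-2), \qquad P_2 = 2(X-1), \\
\left( \tfrac{P}{P_2}, \tfrac{Q}{P_2}, 1 \right)
  &= \left( \tfrac{1}{2}(X-1)(X-4),\ \tfrac{1}{2}(X-2),\ 1 \right), \\
x = 0: \quad (P, Q, P_2) &= (-4, 2, -2), \qquad
  \left( \tfrac{P}{P_2}, \tfrac{Q}{P_2}, 1 \right) = (2, -1, 1), \qquad \nu_{P,Q}(0) = 2, \\
x = 5: \quad (P, Q, P_2) &= (16, 12, 8), \qquad
  \left( \tfrac{P}{P_2}, \tfrac{Q}{P_2}, 1 \right) = (2, \tfrac{3}{2}, 1), \qquad \nu_{P,Q}(5) = 0.
\end{align*}
```

Both sequences give the same number of sign changes at each point, and $\nu_{P,Q}(0) - \nu_{P,Q}(5) = 2$. This agrees with the theorem: since $PQ = (X-1)^3 (X-2)(X-4)$ changes from negative to positive at both $c = 1$ (multiplicity 2 in $P$, 1 in $Q$) and $c = 4$ (multiplicity 1 in $P$, 0 in $Q$), each of these roots is counted positively, while the root $2$ of $Q$ is not a root of $P$ and contributes nothing.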
To summarize the above theorem intuitively, $\nu_{P,Q}(a) - \nu_{P,Q}(b)$ counts the number of distinct real roots of $P$ in an interval such that the sign of $P$ changes to the sign of $Q$ over the root, minus the number of those such that $P$ changes away from the sign of $Q$ over the root, but does not count any of the roots in the interval such that their multiplicity as a root of $Q$ is equal to or greater than their multiplicity in $P$. We include an example to illustrate the above theorem.
Example 6. Let $P = \tfrac{1}{10}(X + 3) X^2 (X - 2)(X - 4) = \tfrac{1}{10}(X^5 - 3X^4 - 10X^3 + 24X^2)$, $Q = X - 2$. Since $Q$ divides $P$, the Sturm sequence of $P$ and $Q$ is $\left( \tfrac{1}{10}(X^5 - 3X^4 - 10X^3 + 24X^2),\ X - 2 \right)$.
Figure 5: Graph of $P$ and $P_1$.
Computing the value of $\nu_{P,Q}$ between each of the distinct roots of $P$ and $Q$ we find:
\begin{align*}
\nu_{P,Q}(-4) &= 0, \\
\nu_{P,Q}(-1) &= 1, \\
\nu_{P,Q}(1) &= 1, \\
\nu_{P,Q}(3) &= 1, \\
\nu_{P,Q}(5) &= 0.
\end{align*}
The distinct roots of $P$ each behave differently in terms of sign changes relative to $Q$. Particularly, as $X$ moves over the root $c_1 = -3$ of $P$, the sign of $P$ changes from the same sign as $Q$ to the opposite sign. We also see that $\nu_{P,Q}(-4) - \nu_{P,Q}(-1) = -1$ in correspondence with these relative sign changes (making sure that $c_1 = -3$ is the only root of $P$ in the interval $(-4, -1)$, of course). As $X$ moves over both $c_2 = 0$ and $c_3 = 2$, both roots of $P$, there are no relative sign changes between $P$ and $Q$ ($c_2$ is a double root of $P$, which is not a root of $Q$, while $c_3$ is a simple root of both $P$ and $Q$). Correspondingly, $\nu_{P,Q}(-1) - \nu_{P,Q}(1) = 0$ and $\nu_{P,Q}(1) - \nu_{P,Q}(3) = 0$. Finally, as $X$ moves over the root $c_4 = 4$ of $P$, the sign of $P$ changes from the opposite sign to the same sign as $Q$, and indeed we observe that $\nu_{P,Q}(3) - \nu_{P,Q}(5) = 1$.
We apply Theorem 1.1.1 to the special case of $Q = P'$. (As stated at the introduction of the function $\nu_{P,Q}$, the special case $\nu_{P,P'}$ is used in [1], but is simply denoted by $\nu_P$.) We know that a root $a_i$ of $P$ with multiplicity $m_i$ is also a root of $P'$ with multiplicity $m_i - 1$ (where a multiplicity of 0 means it is not a root). The sum of these multiplicities $m_i + (m_i - 1)$ is clearly always odd. Hence, according to Theorem 1.1.1, the function $\nu_{P,P'}$ is monotone-decreasing. More precisely, $\nu_{P,P'}$ is constant on the open intervals between the roots of $P$, with a discontinuity at each root of $P$, where the value decreases by 1. Therefore, the value of $\nu_{P,P'}(-\infty) - \nu_{P,P'}(x)$ is monotone-increasing with $x$. If $c$ is a root of $P$, then $\nu_{P,P'}(c) = \nu_{P,P'}(c - \varepsilon) - 1$ for $\varepsilon > 0$ sufficiently small, meaning that the value of $\nu_{P,P'}(x)$ is constant on an interval of the form $(c - \varepsilon, c)$, and on an interval of the form $[c, c + \varepsilon)$.
Figure 6: The function $\nu_{P,P'}$ is monotone decreasing, with a discontinuity at roots of $P$.
We need only consider the open intervals since we can simply evaluate $\nu_{P,P'}$ at points that are not roots of $P$. Hence we obtain the following theorem as a corollary to Theorem 1.1.1.
Theorem 1.1.2 (Sturm’s Theorem). Let $P \in \mathbb{R}[X]$, and let $a, b \in \mathbb{R}$ not be roots of $P$, with $a < b$. Denote by $\nu_{P,P'}(x)$ the number of sign changes in the Sturm sequence of $P$ and $P'$ evaluated at $x$. Then the value of $\nu_{P,P'}(a) - \nu_{P,P'}(b)$ is equal to the number of distinct roots of $P$ in the interval $(a, b)$.
Proof. We first note that all roots $c$ of $P$ are roots of $P'$ with lower multiplicity. Thus in the context of Theorem 1.1.1, all real roots in the interval $(a, b)$ will be counted ($\nu_{P,P'}$ will either increase or decrease as $x$ passes over them). We also note that the sign of $P$ is the opposite of the sign of $P'$ on some interval $(c - \varepsilon, c)$, while they have the same sign on some interval $(c, c + \varepsilon)$, for all real roots $c$ of $P$. To see this, observe the behavior of $P$ and $P'$ about roots of $P$ of even multiplicities and odd multiplicities separately. Particularly, if $P$ is decreasing (increasing) on $(c - \varepsilon, c)$, then $P$ must be positive (negative) on $(c - \varepsilon, c)$, and $P'$ must be negative (positive) on this interval. Furthermore, if $P$ is decreasing (increasing) on $(c, c + \varepsilon)$, then $P$ must be negative (positive) on $(c, c + \varepsilon)$, and $P'$ must be negative (positive) on this interval. Applying Theorem 1.1.1, we have that $\nu_{P,P'}(a) - \nu_{P,P'}(b)$ is precisely equal to the number of distinct roots of $P$ in the interval $(a, b)$.
Example 7. Let $P = X^2 (X - 2) = X^3 - 2X^2$, so that $P' = 3X^2 - 4X$. Then $P_2 = \tfrac{8}{9} X$. Computing the value of $\nu_{P,P'}$ between each of the distinct roots of $P$ we find:
\begin{align*}
\nu_{P,P'}(-1) &= 2, \\
\nu_{P,P'}(1) &= 1, \\
\nu_{P,P'}(3) &= 0.
\end{align*}
Figure 7: Graph of $P$, $P'$, $P_2$.
Observe that $\nu_{P,P'}$ decreases by 1 over each distinct root of $P$, regardless of the multiplicity of the root.
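Sturm’s Theorem translates directly into a root-counting procedure. The Python sketch below is a minimal illustration, assuming a nonconstant polynomial given as a coefficient list with the highest degree first; all function names (`trim`, `remainder`, `derivative`, `sturm_sequence`, `nu`, `count_real_roots`) are hypothetical and not taken from the thesis or from [1].

```python
from fractions import Fraction


def trim(p):
    """Strip leading zeros from a coefficient list (highest degree first)."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]


def remainder(a, b):
    """Remainder of the Euclidean division of a by the nonzero polynomial b."""
    a = trim(list(a))
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]  # cancels the leading term of a
        a = trim(a)
    return a


def derivative(p):
    """Formal derivative of a polynomial with coefficients highest degree first."""
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]


def sturm_sequence(p, q):
    """P0 = p, P1 = q, P_{i+1} = -(remainder of P_{i-1} by P_i), until zero."""
    seq = [trim([Fraction(c) for c in p]), trim([Fraction(c) for c in q])]
    while True:
        r = remainder(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])


def nu(seq, x):
    """Number of sign changes of the sequence at x, ignoring zero entries."""
    values = []
    for p in seq:
        v = Fraction(0)
        for c in p:
            v = v * x + c  # Horner evaluation
        values.append(v)
    signs = [v > 0 for v in values if v != 0]
    return sum(s != t for s, t in zip(signs, signs[1:]))


def count_real_roots(p, a, b):
    """Distinct real roots of p in (a, b); a and b must not be roots of p."""
    seq = sturm_sequence(p, derivative(p))
    return nu(seq, Fraction(a)) - nu(seq, Fraction(b))


# Example 7: P = X^3 - 2X^2 has the distinct real roots 0 (double) and 2
roots_example_7 = count_real_roots([1, -2, 0, 0], -1, 3)
```

Exact rational arithmetic is used so that the signs are never affected by rounding; the double root at 0 in Example 7 is counted once, as the theorem predicts.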
According to the above theorem, $\nu_{P,P'}$ counts the real roots of a polynomial $P \in \mathbb{R}[X]$ in an interval. This is owing to the fact that the relative sign changes of a polynomial $P$ and $P'$ about the real roots of $P$ are consistent, regardless of the polynomial in question. A subtle fact contributing to this is that for a root $c$ of a polynomial $P$ of multiplicity $m$, we can consider $c$ as a root of $P'$ of multiplicity $n = m - 1$ (possibly zero), meaning that $m + n = 2m - 1$, which is always odd. We are able to make use of these basic features in order to count the real roots of a polynomial $P$, either positively or negatively, according to the sign of a second polynomial, say, $Q \in \mathbb{R}[X]$, at these roots of $P$, thus giving us an opening into solving polynomial systems of the form “$P = 0$ and $Q > 0$”.
Theorem 1.1.3. Let $P, Q \in \mathbb{R}[X]$, and let $a, b \in \mathbb{R}$ not be roots of $P$, with $a < b$. Denote by $\nu_{P,P'Q}(x)$ the number of sign changes in the Sturm sequence of $P$ and $P'Q$ evaluated at $x$. Then the value of $\nu_{P,P'Q}(a) - \nu_{P,P'Q}(b)$ is equal to the number of distinct roots $c$ of $P$ in the interval $(a, b)$ such that $Q(c) > 0$ minus the number of distinct roots $c$ of $P$ in the interval $(a, b)$ such that $Q(c) < 0$.
Proof. We already know that for a general Sturm sequence, $\nu$ is unaffected by roots of all trailing terms $P_1, \dots, P_k$ of the sequence, and is only affected by the leading term, $P_0$. Consider a real root $c$ of $P$ with multiplicity, say, $m$. We also consider $c$ as a root of both $P'$ and $Q$ of multiplicities $m - 1$ and $n$ respectively (both possibly zero), so that $c$ is a root of $P'Q$ of multiplicity $m - 1 + n$. Clearly, if $n \geq 1$ then $m - 1 + n \geq m$. That is, if $c$ is indeed a root of $Q$, then $c$ is a root of $P'Q$ of multiplicity at least as great as the multiplicity of $c$ as a root of $P$. Hence, by Theorem 1.1.1, the value of $\nu_{P,P'Q}$ is unaffected by the roots of $P$ which are also roots of $Q$. We now consider roots $c$ of $P$ which are not roots of $Q$, so that $c$ is a root of $P'Q$ of multiplicity $m - 1 + n = m - 1 < m$. Recall that the relative signs of $P$ and $P'$ about $c$ are such that $PP' < 0$ on some $(c - \varepsilon, c)$, and $PP' > 0$ on some $(c, c + \varepsilon)$, and that $Q$ has constant (nonzero) sign about $c$. Therefore, on some interval $(c - \varepsilon, c)$,
\[ \mathrm{sign}(PP'Q) = \mathrm{sign}(PP')\,\mathrm{sign}(Q) = -\mathrm{sign}(Q), \]
and on some interval $(c, c + \varepsilon)$,
\[ \mathrm{sign}(PP'Q) = \mathrm{sign}(PP')\,\mathrm{sign}(Q) = \mathrm{sign}(Q). \]
That is, if $Q(c) > 0$, then $P$ changes from the opposite sign of $P'Q$ on $(c - \varepsilon, c)$, to the same sign as $P'Q$ on $(c, c + \varepsilon)$. Conversely, if $Q(c) < 0$, then $P$ changes from the same sign as $P'Q$ on $(c - \varepsilon, c)$, to the opposite sign of $P'Q$ on $(c, c + \varepsilon)$. Applying Theorem 1.1.1, it is clear that $\nu_{P,P'Q}(a) - \nu_{P,P'Q}(b)$ is equal to the number of distinct roots $c$ of $P$ in the interval $(a, b)$ such that $Q(c) > 0$ minus the number of those such that $Q(c) < 0$.
Remark: We bring attention to the fact that $\nu_{P,P'Q}$ does not count roots $c$ of $P$ for which $Q(c) = 0$. This is illustrated in the following example, along with the statement of Theorem 1.1.3.
Example 8. Let $P = X(X - 2)(X + 2) = X^3 - 4X$, so that $P' = 3X^2 - 4$, and let $Q = X$. Then we have $P_1 = P'Q = 3X^3 - 4X$, and so $P_2 = \tfrac{8}{3} X$. Computing the value of $\nu_{P,P'Q}$ between each of the distinct roots of $P$ we find:
\begin{align*}
\nu_{P,P'Q}(-3) &= 0, \\
\nu_{P,P'Q}(-1) &= 1, \\
\nu_{P,P'Q}(1) &= 1, \\
\nu_{P,P'Q}(3) &= 0.
\end{align*}
Note that the graph of $Q$ itself has not been included in the following figure for simplicity.
Figure 8: Graph of $P$, $P'Q$, $P_2$.
Observe that $Q(-2) < 0$ and $Q(2) > 0$. Indeed we find that $\nu_{P,P'Q}(-3) - \nu_{P,P'Q}(-1) = -1$ and $\nu_{P,P'Q}(1) - \nu_{P,P'Q}(3) = 1$ in correspondence with the sign of $Q$ at the roots $-2$ and $2$. We also see that $Q(0) = 0$, and $\nu_{P,P'Q}(-1) - \nu_{P,P'Q}(1) = 0$. That is, $\nu_{P,P'Q}$ did not count the root $c = 0$ of $P$ which was also a root of $Q$.
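The signed count of Theorem 1.1.3 can be computed in the same way as the plain root count, replacing $P'$ by $P'Q$. The following Python sketch is an illustration under the same assumptions as before (coefficient lists with the highest degree first, exact `Fraction` arithmetic, $P$ nonconstant and $P'Q$ nonzero); the names, in particular `signed_root_count`, are hypothetical.

```python
from fractions import Fraction


def trim(p):
    """Strip leading zeros from a coefficient list (highest degree first)."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]


def remainder(a, b):
    """Remainder of the Euclidean division of a by the nonzero polynomial b."""
    a = trim(list(a))
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a = trim(a)
    return a


def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]


def multiply(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def sturm_sequence(p, q):
    seq = [trim([Fraction(c) for c in p]), trim([Fraction(c) for c in q])]
    while True:
        r = remainder(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])


def nu(seq, x):
    """Number of sign changes of the sequence at x, ignoring zero entries."""
    values = []
    for p in seq:
        v = Fraction(0)
        for c in p:
            v = v * x + c
        values.append(v)
    signs = [v > 0 for v in values if v != 0]
    return sum(s != t for s, t in zip(signs, signs[1:]))


def signed_root_count(p, q, a, b):
    """#{roots c of p in (a, b) with q(c) > 0} - #{those with q(c) < 0}."""
    seq = sturm_sequence(p, multiply(derivative(p), q))
    return nu(seq, Fraction(a)) - nu(seq, Fraction(b))


# Example 8: P = X^3 - 4X, Q = X
P, Q = [1, 0, -4, 0], [1, 0]
counts = [signed_root_count(P, Q, a, b) for a, b in [(-3, -1), (-1, 1), (1, 3)]]
```

On the three intervals the sketch returns the differences found in Example 8: the root $-2$ is counted negatively, the common root $0$ is not counted, and the root $2$ is counted positively.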
Following the above results, given polynomials $P, Q \in \mathbb{R}[X]$, we are able to count the number of distinct real roots $c$ of $P$ in an interval such that $Q(c) > 0$ as follows. For roots $c$ of $P$ such that $Q(c) \neq 0$, we have $Q(c)^2 > 0$. If we then apply Theorem 1.1.3 with $Q^2$ replacing $Q$, we have that, for real numbers $a < b$ which are not roots of $P$, $\nu_{P,P'Q^2}(a) - \nu_{P,P'Q^2}(b)$ is equal to the number of distinct roots $c$ of $P$ in the interval $(a, b)$ such that $Q(c)^2 > 0$, which is equal to the number of distinct roots $c$ of $P$ in the interval $(a, b)$ such that $Q(c) > 0$ plus the number of those such that $Q(c) < 0$. We define the following expression involving the Sturm sequence of $P$ and $P'Q$, as well as the Sturm sequence of $P$ and $P'Q^2$:
\[ N_{P,Q}(a, b) := \left[ \nu_{P,P'Q}(a) - \nu_{P,P'Q}(b) \right] + \left[ \nu_{P,P'Q^2}(a) - \nu_{P,P'Q^2}(b) \right], \]
where $\nu_{P,P'Q}(a) - \nu_{P,P'Q}(b)$ is equal to the number of distinct roots $c$ of $P$ in the interval $(a, b)$ such that $Q(c) > 0$ minus the number of those such that $Q(c) < 0$, and $\nu_{P,P'Q^2}(a) - \nu_{P,P'Q^2}(b)$ is equal to the number of distinct roots $c$ of $P$ in the interval $(a, b)$ such that $Q(c) > 0$ plus the number of those such that $Q(c) < 0$. That is, $N_{P,Q}(a, b)$ is equal to twice the number of distinct roots $c$ of $P$ in the interval $(a, b)$ such that $Q(c) > 0$. We give a brief example to illustrate this.
Example 9. Let $P = X(X - 1)(X + 1) = X^3 - X$ and let $Q = X$. By observation, the number of real roots $c$ of $P$ such that $Q(c) > 0$ is 1.
Figure 9: Graph of $P$ and $Q$.
The Sturm sequence of $P$ and $P'Q$ is $(X^3 - X,\ 3X^3 - X,\ \tfrac{2}{3} X)$, and the Sturm sequence of $P$ and $P'Q^2$ is $(X^3 - X,\ 3X^4 - X^2,\ -X^3 + X,\ -2X^2,\ -X)$. Evaluating $\nu_{P,P'Q}$ and $\nu_{P,P'Q^2}$ between the roots of $P$, we find
\begin{align*}
\nu_{P,P'Q}(-2) &= 0, & \nu_{P,P'Q^2}(-2) &= 3, \\
\nu_{P,P'Q}(-\tfrac{1}{2}) &= 1, & \nu_{P,P'Q^2}(-\tfrac{1}{2}) &= 2, \\
\nu_{P,P'Q}(\tfrac{1}{2}) &= 1, & \nu_{P,P'Q^2}(\tfrac{1}{2}) &= 2, \\
\nu_{P,P'Q}(2) &= 0, & \nu_{P,P'Q^2}(2) &= 1.
\end{align*}
Indeed, we find that $\tfrac{1}{2} N_{P,Q}(-2, 2) = \tfrac{1}{2} \left( (0 - 0) + (3 - 1) \right) = 1$ as we desired.
We are now able to compute the number of distinct real roots of a polynomial $P$ in an interval, as well as the number of those such that another polynomial, say $Q$, is nonzero, or greater than zero. We apply this to a simple construction involving a polynomial $P$ and its derivative $P'$. If we set $Q = (P')^2$ in Theorem 1.1.3, then $\nu_{P,P'Q}$ counts the number of distinct real roots $c$ of $P$ such that $Q(c) = (P'(c))^2 > 0$ minus the number of those such that $Q(c) = (P'(c))^2 < 0$ (of which there are none). For real roots $c$ of $P$, $Q(c) > 0$ if and only if $c$ is a simple root of $P$ (otherwise $Q(c) = (P'(c))^2 = 0$). Hence by Theorem 1.1.3, for real numbers $a < b$ not roots of $P$, $\nu_{P,P'Q}(a) - \nu_{P,P'Q}(b)$ is equal to the number of simple roots of $P$ in the interval $(a, b)$ (remembering that $\nu_{P,P'Q}$ does not count roots of $P$ which are also roots of $Q$). Now, for a polynomial $P \in \mathbb{R}[X]$ we define $G_0(P) = P$, and for $k \geq 1$ we define recursively
\[ G_k(P) := \gcd\left( G_{k-1}(P),\ G_{k-1}(P)' \right). \]
Claim 1.1.4. The factors $F$ of $G_k(Q)$ of multiplicity $d$ are precisely the factors of $Q$ of multiplicity $d + k$, where $k \leq d$.

Proof. If $F$ is a factor of $G_{k-1}(Q)$ of multiplicity $d \geq 1$, then $F$ is a factor of $G_k(Q) = \gcd(G_{k-1}(Q), G_{k-1}(Q)')$ of multiplicity $d - 1$ (the minimum of the multiplicity of $F$ in $G_{k-1}(Q)$ and $G_{k-1}(Q)'$). If $F$ is a factor of $G_k(Q)$ of multiplicity $d - 1 \geq 1$, then $F$ is a factor of both $G_{k-1}(Q)$ and $G_{k-1}(Q)'$ of multiplicity at least $d - 1$. However, the multiplicity of $F$ in $G_{k-1}(Q)'$ is exactly 1 less than the multiplicity of $F$ in $G_{k-1}(Q)$. Hence the multiplicity of $F$ in $G_{k-1}(Q)$ is $d$. If a factor $F$ in $Q$ has multiplicity $d \geq 1$, then the multiplicity of $F$ in $G_1(Q) = \gcd(Q, Q')$ is $d - 1$. If a factor $F$ in $G_1(Q)$ has multiplicity $d - 1 \geq 1$, then $F$ must be a factor of both $Q$ and $Q'$ of multiplicity at least $d - 1$. Therefore $F$ is a factor of $Q$ of multiplicity $d$. This proves the claim.
With the above construction, set $Q = (G_k(P))^2$. Then a real root $c$ of $P$ is also a root of $Q$ if and only if $c$ is a root of $P$ of multiplicity $k + 1$ or greater. Therefore, for real numbers $a < b$ not roots of $P$, $\nu_{P,P'Q}(a) - \nu_{P,P'Q}(b)$ is equal to the number of distinct roots of $P$ in the interval $(a, b)$ with multiplicity $k$ or lower (that is, not roots of $G_k(P)$). Finally, replacing $P$ with $G_k(P)$ in the construction of $\nu_{P,P'Q}$ and setting $Q = (G_k(P)')^2$, we arrive at the following result.
Theorem 1.1.5. Let $P \in \mathbb{R}[X]$, let $k \in \mathbb{N}$, and let $a < b$ be real numbers, not roots of $P$. Denote $G_k := G_k(P)$. Then the number of real roots of $P$ in the interval $(a, b)$ of multiplicity $k + 1$ is equal to

\[ \nu_{G_k,\,G_k'(G_k')^2}(a) - \nu_{G_k,\,G_k'(G_k')^2}(b). \]

Proof. The proof is an immediate consequence of the definition of $G_k(P)$ and the application of Claim 1.1.4 and Theorem 1.1.3. To clarify this, Claim 1.1.4 asserts that the simple roots of $G_k(P)$ are precisely the roots of $P$ of multiplicity $k + 1$. Then, by replacing $P$, $P'$, and $Q$ with $G_k(P)$, $G_k(P)'$, and $(G_k(P)')^2$ respectively in the construction of the Sturm sequence (and subsequently in the construction of $\nu_{P,P'Q}$), we can count the number of simple roots of $G_k(P)$ in an interval of the real line (since, for a root $c$ of $G_k(P)$, $G_k(P)'(c) \neq 0$ if and only if $c$ is a simple root of $G_k(P)$). That is, we can count the number of distinct roots of $P$ of multiplicity $k + 1$ in that interval.
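The construction of Theorem 1.1.5 can be sketched in code. This is a minimal illustration under stated assumptions: polynomials are coefficient lists from the highest degree down, $G_k$ is obtained by repeated gcd with the derivative, and the helper names (`gcd`, `count_roots_of_multiplicity`, and so on) are hypothetical. The test polynomial is $S = X^3(X-1)^2$, which has one triple root and one double root.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients (highest degree first)."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return list(p[i:])

def rem(a, b):
    """Remainder of a divided by a nonzero b, over the rationals."""
    a, b = [Fraction(x) for x in trim(a)], trim(b)
    while len(a) >= len(b):
        f = a[0] / b[0]
        padded = b + [0] * (len(a) - len(b))
        a = trim([x - f * y for x, y in zip(a, padded)])
    return a

def mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def deriv(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def gcd(a, b):
    """Monic gcd by the Euclidean algorithm (a must be nonzero)."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, rem(a, b)
    return [Fraction(c) / a[0] for c in a]

def sturm(p0, p1):
    """P0, P1, P_{i+1} = -rem(P_{i-1}, P_i), stopped before the zero polynomial."""
    seq = [trim(p0), trim(p1)]
    while True:
        r = rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])

def nu(seq, x):
    """Sign changes of the sequence evaluated at x, skipping zeros."""
    vals = []
    for p in seq:
        v = 0
        for c in p:
            v = v * x + c
        if v != 0:
            vals.append(v)
    return sum(1 for u, v in zip(vals, vals[1:]) if u * v < 0)

def count_roots_of_multiplicity(p, m, a, b):
    """Roots of p in (a, b) of multiplicity exactly m (a, b not roots of p)."""
    g = p
    for _ in range(m - 1):          # g = G_{m-1}(p)
        g = gcd(g, deriv(g))
    d = deriv(g)
    if not d:                       # G_{m-1}(p) is constant: nothing to count
        return 0
    seq = sturm(g, mul(mul(d, d), d))   # Sturm sequence of G and G'(G')^2
    return nu(seq, a) - nu(seq, b)

S = [1, -2, 1, 0, 0, 0]             # X^5 - 2X^4 + X^3 = X^3 (X - 1)^2
counts = [count_roots_of_multiplicity(S, m, -2, 2) for m in (1, 2, 3)]
print(counts)
```

The loop index is the multiplicity $m = k + 1$ rather than $k$, purely for readability.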
We illustrate this result in the following example.

Example 10. Consider the polynomial $S = X^3(X - 1)^2 = X^5 - 2X^4 + X^3$, which has a triple root $c_1 = 0$ and a double root $c_2 = 1$. Then we have $S' = 5X^4 - 8X^3 + 3X^2$. We now construct Sturm sequences of $G_k(S)$ and $(G_k(S)')^3$ for each $k = 0, 1, 2$.

Setting $P = G_0(S) = S$ and $Q = (G_0(S)')^2 = (S')^2$, the Sturm sequence of $P$ and $P'Q$ is $(S, (S')^3, -S)$. Then we have

\[ \nu_{P,P'Q}(-2) - \nu_{P,P'Q}(2) = 0. \]

As expected, 0 simple roots of $S$ were counted.

We compute $G_1(S) = \gcd(S, S') = X^3 - X^2 = X^2(X - 1)$, and set $P = G_1(S) = X^3 - X^2$ and $Q = (G_1(S)')^2 = (3X^2 - 2X)^2 = 9X^4 - 12X^3 + 4X^2$. Then the Sturm sequence of $P$ and $P'Q$ is $(X^3 - X^2,\ X^3(3X - 2)^3,\ -(X^3 - X^2),\ -X^2)$, and we have

\[ \nu_{P,P'Q}(-2) - \nu_{P,P'Q}(2) = 1. \]

That is, 1 double root of $S$ was counted.

We compute $G_2(S) = \gcd(G_1(S), G_1(S)') = \gcd(X^3 - X^2, 3X^2 - 2X) = X$, and $G_2(S)' = 1$. Setting $P = G_2(S) = X$ and $Q = (G_2(S)')^2$, we easily find that the Sturm sequence of $P$ and $P'Q$ is $(X, 1)$. Then we have

\[ \nu_{P,P'Q}(-2) - \nu_{P,P'Q}(2) = 1. \]

That is, 1 triple root of $S$ was counted.

So far the polynomials we have looked at in examples have been made to be simple and easy to visualize, so as to allow the reader to check computations graphically. This is, of course, not likely to be the case in general. In the above example we know that none of the polynomials has roots outside of the interval $(-2, 2)$, but such bounds may not be obvious for more complex examples. That is, one does not immediately know how large an interval we must consider in order to ensure that the roots of an arbitrary polynomial lie within this interval. We now introduce a bound on the absolute value of possible roots of an arbitrary polynomial, which we can apply to our results relating to computing with Sturm sequences.
Lemma 1.1.6. Consider a polynomial $P(X) = a_0X^d + \dots + a_{d-1}X + a_d$. Then

\[ |c| \leq \max_{i=1,\dots,d} \left( d \left| \frac{a_i}{a_0} \right| \right)^{1/i} \]

for all roots $c$ of $P$.

Proof. Consider a point $z \in \mathbb{C}$ such that $|z| > \max_{i=1,\dots,d} \left( d \left| \frac{a_i}{a_0} \right| \right)^{1/i}$. Then we have $|a_0||z|^i/d > |a_i|$ for all $i = 1, \dots, d$. Multiplying both sides by $|z|^{d-i}$ we find that $|a_0||z|^d/d > |a_i z^{d-i}|$ for all $i = 1, \dots, d$. Adding all $d$ terms together, we have $|a_1||z|^{d-1} + \dots + |a_d| < d\left(|a_0||z|^d/d\right) = |a_0||z|^d = |a_0 z^d|$, and hence, by the triangle inequality, $|a_1 z^{d-1} + \dots + a_d z^0| < |a_0 z^d|$. This shows that the magnitude of the leading term $a_0 z^d$ is strictly larger than the magnitude of the remaining terms put together, meaning that $a_0 z^d + a_1 z^{d-1} + \dots + a_d$ is necessarily nonzero. That is, $z$ is not a root of $P$.
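The bound of Lemma 1.1.6 is straightforward to evaluate. Below is a minimal sketch, assuming the coefficients are given as a list `[a_0, ..., a_d]` with $a_0 \neq 0$ and $d \geq 1$; the function name `root_bound` is illustrative only.

```python
def root_bound(coeffs):
    """Return max over i = 1..d of (d * |a_i / a_0|) ** (1 / i).

    coeffs = [a_0, a_1, ..., a_d] with a_0 != 0 and d >= 1 (assumed, not checked).
    Every real or complex root c of the polynomial satisfies |c| <= root_bound(coeffs).
    """
    d = len(coeffs) - 1
    a0 = coeffs[0]
    return max((d * abs(ai / a0)) ** (1.0 / i)
               for i, ai in enumerate(coeffs[1:], start=1))

# P = X^3 - X has roots -1, 0, 1; the only nonzero lower coefficient is a_2 = -1.
bound = root_bound([1, 0, -1, 0])
print(bound)
```

For counting purposes one can then take the interval $(-B - 1, B + 1)$, where $B$ is the returned bound, so that the endpoints are certainly not roots.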
The above result shows that for any polynomials $P, Q \in \mathbb{R}[X]$, the functions $\nu_{P,Q}$, $\nu_{P,P'}$, $\nu_{P,P'Q}$, and $\nu_{P,P'Q^2}$ are constant on some unbounded intervals $(-\infty, A)$ and $(B, \infty)$, $A, B \in \mathbb{R}$. Therefore, if we wish to count all possible roots of $P$ in accordance with any of the previous results in this section, we know that there always exists an interval (which we can compute in terms of the coefficients of $P$) large enough to contain all real roots of $P$. Equivalently, since the $\nu$ functions are constant outside of such an interval, it is enough to consider $\nu(-\infty) - \nu(\infty)$. That is to say, in order to explicitly compute the total number of roots of a polynomial in accordance with any of the previous results in this section, it is enough to consider the degree and sign of the leading term for each entry in the Sturm sequences, since the highest-order terms determine the sign of a polynomial at $\pm\infty$. Explicitly, if the Sturm sequence in question is $(P_0, P_1, \dots, P_k)$, then the value of the associated $\nu$ function at $-\infty$ and $\infty$ is equal to the number of sign changes in $\left(\mathrm{lc}(P_0)(-1)^{d_0}, \dots, \mathrm{lc}(P_k)(-1)^{d_k}\right)$ and $\left(\mathrm{lc}(P_0), \dots, \mathrm{lc}(P_k)\right)$ respectively, where $\mathrm{lc}(P)$ denotes the leading coefficient of the polynomial $P$, and $d_j$ denotes the degree of the polynomial $P_j$, for $j = 0, \dots, k$.

We have considered systems of the form "$P = 0$ and $Q > 0$" for polynomials $P, Q \in \mathbb{R}[X]$, and we are able to count the total number of real roots of such a system (as well as some other related quantities concerning the roots and multiplicities of roots of polynomials). If, instead of a single polynomial equation, we have the system $P_1 = \dots = P_s = 0$, where $P_1, \dots, P_s \in \mathbb{R}[X]$, then we can replace $P$ in the usual construction with $P_1^2 + \dots + P_s^2$, noting that $P_1^2 + \dots + P_s^2 = 0$ if and only if $P_i = 0$ for all $i = 1, \dots, s$. Hence we can reduce a system of the form "$P_1 = 0$ and ... and $P_s = 0$ and $Q > 0$" to the system "$P := P_1^2 + \dots + P_s^2 = 0$ and $Q > 0$", once again leaving a single polynomial equation and inequality. We now generalize to a system of the form
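\[ P = 0 \text{ and } Q_1 > 0 \text{ and } \dots \text{ and } Q_r > 0, \]

where $P, Q_1, \dots, Q_r \in \mathbb{R}[X]$. For convenience, we begin with the assumption that $P$ is relatively prime to each of $Q_1, \dots, Q_r$, so that none of the $Q_i$ can have common roots with $P$. Denote $\varepsilon := (\varepsilon_1, \dots, \varepsilon_r) \in \{0, 1\}^r$, and define the product $Q^\varepsilon := Q_1^{\varepsilon_1} \cdots Q_r^{\varepsilon_r}$. Denote by $s_\varepsilon$ the number of distinct real roots $c$ of $P$ such that $Q^\varepsilon(c) > 0$ minus the number of those such that $Q^\varepsilon(c) < 0$. Since $Q^\varepsilon \in \mathbb{R}[X]$ for any $\varepsilon \in \{0, 1\}^r$, we can compute $s_\varepsilon$ by simply applying Theorem 1.1.3. Explicitly, by setting $Q = Q^\varepsilon$, we have

\[ s_\varepsilon = \nu_{P,P'Q^\varepsilon}(-\infty) - \nu_{P,P'Q^\varepsilon}(\infty). \]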
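The values of $\nu$ at $\pm\infty$ depend only on the leading coefficients and degrees of the entries of the Sturm sequence, as described earlier. The following sketch illustrates this on the two Sturm sequences of Example 9 ($P = X^3 - X$, $Q = X$), which are entered by hand as coefficient lists from the highest degree down; the function names are illustrative and not part of the thesis.

```python
def variations(values):
    """Number of sign changes in a list of numbers, skipping zeros."""
    signs = [v for v in values if v != 0]
    return sum(1 for u, v in zip(signs, signs[1:]) if u * v < 0)

def nu_at_infinities(seq):
    """Return (nu(-oo), nu(+oo)) for a sequence of nonzero polynomials."""
    at_plus = [p[0] for p in seq]                           # lc(P_j)
    at_minus = [p[0] * (-1) ** (len(p) - 1) for p in seq]   # lc(P_j) * (-1)^(d_j)
    return variations(at_minus), variations(at_plus)

# Sturm sequence of P and P'Q from Example 9.
seq_Q = [[1, 0, -1, 0], [3, 0, -1, 0], [1, 0]]
# Sturm sequence of P and P'Q^2 from Example 9.
seq_Q2 = [[1, 0, -1, 0], [3, 0, -1, 0, 0], [-1, 0, 1, 0], [-2, 0, 0], [-1, 0]]

minus, plus = nu_at_infinities(seq_Q)
s_Q = minus - plus        # (#roots with Q > 0) - (#roots with Q < 0)
minus2, plus2 = nu_at_infinities(seq_Q2)
s_Q2 = minus2 - plus2     # (#roots with Q > 0) + (#roots with Q < 0)
print(s_Q, s_Q2, (s_Q + s_Q2) // 2)
```

No polynomial evaluation is needed here, which is what makes the same computation possible later when the coefficients depend on parameters.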
Furthermore, recalling that

\[ N_{P,Q^\varepsilon} := \left[\nu_{P,P'Q^\varepsilon}(-\infty) - \nu_{P,P'Q^\varepsilon}(\infty)\right] + \left[\nu_{P,P'(Q^\varepsilon)^2}(-\infty) - \nu_{P,P'(Q^\varepsilon)^2}(\infty)\right] \]

is equal to twice the number of distinct real roots $c$ of $P$ such that $Q^\varepsilon(c) > 0$, we can easily compute the number of distinct real roots $c$ of $P$ such that $Q^\varepsilon(c) > 0$.

Denote $\psi := (\psi_1, \dots, \psi_r) \in \{0, 1\}^r$, and denote by $c_\psi$ the number of distinct real roots $c$ of $P$ such that $\mathrm{sign}(Q_i(c)) = (-1)^{\psi_i}$ for all $i \in \{1, \dots, r\}$. That is, $c_\psi$ denotes the number of distinct real roots of $P$ at which each $Q_i$ has a sign designated by $\psi_i$. For example, if $\psi = (0, \dots, 0)$, the $r$-tuple whose entries are all 0, then $c_\psi$ is the number of distinct real roots of $P$ at which all $Q_i$ are positive. Note that while we can compute each $s_\varepsilon$ relatively easily, we cannot yet compute the $c_\psi$, which is what we desire. Denote by $\tilde{s}$ and $\tilde{c}$ the vectors of length $2^r$ whose entries are the $s_\varepsilon$ and $c_\psi$ respectively. We can compute the vector $\tilde{c}$ via $\tilde{s}$ and an invertible $2^r \times 2^r$ matrix which is independent of the polynomials $P$ and $Q_1, \dots, Q_r$, and depends only on $r$.
Claim 1.1.7. There exists a $2^r \times 2^r$ invertible matrix, $A_r$, depending only on $r$, such that $\tilde{s} = A_r \cdot \tilde{c}$.

Proof. We prove using induction on $r$, the number of polynomial inequalities. The case when $r = 0$ holds trivially, since $s_\varepsilon = s_\emptyset = c_\emptyset = c_\psi$, where $s_\emptyset$ and $c_\emptyset$ are both equal to the number of distinct real roots of $P$. Consider the case when $r = 1$. We have

\[ s_0 = \nu_{P,P'Q^0}(-\infty) - \nu_{P,P'Q^0}(\infty), \]

which is equal to the number of distinct real roots $c$ of $P$ (those at which $Q(c) > 0$ plus those at which $Q(c) < 0$, since $Q^0(c)$ is always greater than 0). That is, $s_0 = c_0 + c_1$. We also have $s_1 = \nu_{P,P'Q^1}(-\infty) - \nu_{P,P'Q^1}(\infty)$, which is equal to the number of distinct roots $c$ of $P$ such that $Q(c) > 0$ minus the number of those such that $Q(c) < 0$. That is,

\[ s_1 = c_0 - c_1. \]

Therefore when $r = 1$ we have the $2 \times 2$ matrix:

\[ \tilde{s} = \begin{pmatrix} s_0 \\ s_1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \cdot \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = A_1 \cdot \tilde{c}. \]
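Clearly $A_1$ is invertible. We have proven the claim for $r = 1$.

We now assume that the claim is true for the case of $r \geq 1$. That is, we assume that there is an invertible $2^r \times 2^r$ matrix $A_r$ such that $\tilde{s} = A_r \cdot \tilde{c}$. If we have polynomials $Q_1, \dots, Q_r, Q_{r+1}$, write $Q^\varepsilon = Q_1^{\varepsilon_1} \cdots Q_r^{\varepsilon_r}$ for $\varepsilon = (\varepsilon_1, \dots, \varepsilon_r) \in \{0, 1\}^r$, and denote by $Q^{\varepsilon,\varepsilon_{r+1}}$ the product $Q^\varepsilon Q_{r+1}^{\varepsilon_{r+1}}$ for $\varepsilon_{r+1} \in \{0, 1\}$. We define $s_{\varepsilon,\varepsilon_{r+1}}$ to be the number of distinct real roots $c$ of $P$ such that $Q^{\varepsilon,\varepsilon_{r+1}}(c) > 0$ minus the number of those such that $Q^{\varepsilon,\varepsilon_{r+1}}(c) < 0$, and we define $c_{\psi,\psi_{r+1}}$ to be the number of distinct real roots $c$ of $P$ such that $\mathrm{sign}(Q_i(c)) = (-1)^{\psi_i}$ for all $i = 1, \dots, r + 1$, where $\psi = (\psi_1, \dots, \psi_r) \in \{0, 1\}^r$, and $\psi_{r+1} \in \{0, 1\}$.

Now, if we let $\varepsilon_{r+1} = 0$, then for each $\varepsilon$,

$s_{(\varepsilon,0)}$ = the number of distinct real roots $c$ of $P$ such that $Q^\varepsilon Q_{r+1}^0(c) > 0$ minus the number of those such that $Q^\varepsilon Q_{r+1}^0(c) < 0$

= the number of distinct real roots $c$ of $P$ such that $Q^\varepsilon(c) > 0$ and $Q_{r+1}(c) > 0$
plus the number of those such that $Q^\varepsilon(c) > 0$ and $Q_{r+1}(c) < 0$
minus the number of those such that $Q^\varepsilon(c) < 0$ and $Q_{r+1}(c) > 0$
minus the number of those such that $Q^\varepsilon(c) < 0$ and $Q_{r+1}(c) < 0$.

Collecting these entries over all $\varepsilon \in \{0,1\}^r$ into a vector, we obtain

\[ \begin{pmatrix} \vdots \\ s_{(\varepsilon,0)} \\ \vdots \end{pmatrix} = A_r \cdot c_{\psi,0} + A_r \cdot c_{\psi,1}, \]

where $c_{\psi,0}$ and $c_{\psi,1}$ denote the vectors with entries $c_{(\psi,0)}$ and $c_{(\psi,1)}$, $\psi \in \{0,1\}^r$, and where the first and third terms constitute $A_r \cdot c_{\psi,0}$, and the second and fourth terms constitute $A_r \cdot c_{\psi,1}$ in the final equality. Similarly, if we let $\varepsilon_{r+1} = 1$, then for each $\varepsilon$,

$s_{(\varepsilon,1)}$ = the number of distinct real roots $c$ of $P$ such that $Q^\varepsilon Q_{r+1}^1(c) > 0$ minus the number of those such that $Q^\varepsilon Q_{r+1}^1(c) < 0$

= the number of distinct real roots $c$ of $P$ such that $Q^\varepsilon(c) > 0$ and $Q_{r+1}(c) > 0$
plus the number of those such that $Q^\varepsilon(c) < 0$ and $Q_{r+1}(c) < 0$
minus the number of those such that $Q^\varepsilon(c) > 0$ and $Q_{r+1}(c) < 0$
minus the number of those such that $Q^\varepsilon(c) < 0$ and $Q_{r+1}(c) > 0$,

so that

\[ \begin{pmatrix} \vdots \\ s_{(\varepsilon,1)} \\ \vdots \end{pmatrix} = A_r \cdot c_{\psi,0} - A_r \cdot c_{\psi,1}, \]

where the first and fourth terms constitute $A_r \cdot c_{\psi,0}$, and the second and third terms constitute $-A_r \cdot c_{\psi,1}$. Combining these results, we have

\[ \tilde{s} = \begin{pmatrix} \vdots \\ s_{(\varepsilon,0)} \\ \vdots \\ s_{(\varepsilon,1)} \\ \vdots \end{pmatrix} = \begin{pmatrix} A_r & A_r \\ A_r & -A_r \end{pmatrix} \cdot \begin{pmatrix} \vdots \\ c_{(\psi,0)} \\ \vdots \\ c_{(\psi,1)} \\ \vdots \end{pmatrix} = A_{r+1} \cdot \tilde{c}. \]

Finally, we need to check that $A_{r+1}$ is invertible. By the inductive assumption $A_r$ is invertible, and so we can easily verify that

\[ \frac{1}{2} \begin{pmatrix} A_r^{-1} & A_r^{-1} \\ A_r^{-1} & -A_r^{-1} \end{pmatrix} \begin{pmatrix} A_r & A_r \\ A_r & -A_r \end{pmatrix} = I, \]

where $I$ denotes the identity matrix.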
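The recursive block structure in the proof translates directly into code. The sketch below is an illustration under stated assumptions: matrices are plain nested lists, and `build_A` and `matmul` are hypothetical helper names. Starting from $A_0 = (1)$ and unrolling the inverse formula above gives $A_r^{-1} = 2^{-r} A_r$, which the code checks for $r = 3$ by verifying $A_3 \cdot A_3 = 8I$.

```python
def build_A(r):
    """Build A_r via A_{k+1} = [[A_k, A_k], [A_k, -A_k]], starting from A_0 = [[1]]."""
    A = [[1]]
    for _ in range(r):
        top = [row + row for row in A]
        bottom = [row + [-x for x in row] for row in A]
        A = top + bottom
    return A

def matmul(A, B):
    """Product of two square integer matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A1 = build_A(1)
A3 = build_A(3)
identity_times_8 = [[8 if i == j else 0 for j in range(8)] for i in range(8)]
print(A1)
print(matmul(A3, A3) == identity_times_8)
```

In practice this means $\tilde{c}$ can be recovered from $\tilde{s}$ by one matrix-vector product and a division by $2^r$, without any explicit matrix inversion.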
We are now able to find the values $c_\psi$. In particular, we can compute the total number of real roots $c$ of $P$ such that $Q_1(c) > 0$ and ... and $Q_r(c) > 0$, for polynomials $P, Q_1, \dots, Q_r \in \mathbb{R}[X]$. We illustrate this with an example involving 3 inequalities.

Example 11. Let $P = (X + 2)(X - 1)(X - 4)$, $Q_1 = (X + 3)(X - 2)(X - 3)$, $Q_2 = X - 3$, and $Q_3 = X$, and consider the system

\[ P = 0 \text{ and } Q_1 > 0 \text{ and } Q_2 > 0 \text{ and } Q_3 > 0. \]

The computation of the Sturm sequences is straightforward and will be omitted; the evaluation of each $s_\varepsilon$ and $c_\psi$ will be done by observation.
Figure 10: Graph of $P$, $Q_1$, $Q_2$, $Q_3$.
Noting that the sum of the entries of $\tilde{c}$ must be exactly 3, we easily compute the following from the above graph:

\[
\begin{aligned}
s_{(0,0,0)} &= 3, & s_{(0,0,1)} &= 1, & s_{(0,1,0)} &= -1, & s_{(0,1,1)} &= 1,\\
s_{(1,0,0)} &= 3, & s_{(1,0,1)} &= 1, & s_{(1,1,0)} &= -1, & s_{(1,1,1)} &= 1,\\
c_{(0,0,0)} &= 1, & c_{(0,1,0)} &= 1, & c_{(0,1,1)} &= 1.
\end{aligned}
\]

That is,

\[ \tilde{s} = (3, 1, -1, 1, 3, 1, -1, 1) \quad \text{and} \quad \tilde{c} = (1, 0, 1, 1, 0, 0, 0, 0). \]
To clarify what is being calculated, $s_{(0,0,0)}$, for example, is the number of distinct real roots of $P$ at which $Q^{(0,0,0)} = Q_1^0 Q_2^0 Q_3^0$ is greater than zero, minus the number of those at which $Q^{(0,0,0)}$ is less than zero. Of course, $Q^{(0,0,0)} = 1$ everywhere, so we find that $s_{(0,0,0)}$ is simply equal to the total number of distinct real roots of $P$ (remembering that we are only considering $Q_i$ relatively prime to $P$ so far). That is, $s_{(0,0,0)} = 3$. For another example, consider $s_{(0,1,0)}$, which is equal to the number of distinct real roots of $P$ at which $Q^{(0,1,0)} = Q_1^0 Q_2^1 Q_3^0$ is positive, minus the number of those at which $Q^{(0,1,0)}$ is negative. The sign of $Q^{(0,1,0)}$ is essentially just the sign of $Q_2$, and we observe that $Q_2$ is negative at roots $-2$ and $1$ of $P$, but is positive at the root $4$ of $P$. That is, $s_{(0,1,0)} = 1 - 2 = -1$. We can now easily verify the statement of Claim 1.1.7 for this example with

\[
A_3 \cdot \tilde{c} =
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1\\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1\\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1\\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1\\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1\\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1\\
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1
\end{pmatrix}
\cdot
\begin{pmatrix} 1\\ 0\\ 1\\ 1\\ 0\\ 0\\ 0\\ 0 \end{pmatrix}
=
\begin{pmatrix} 3\\ 1\\ -1\\ 1\\ 3\\ 1\\ -1\\ 1 \end{pmatrix}
= \tilde{s}.
\]
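The numbers in this example can be reproduced by brute force, evaluating the signs of the $Q_i$ directly at the three known roots of $P$ instead of using Sturm sequences. This is only a sanity check under that shortcut. It also uses the observation, which follows from the definitions of $s_\varepsilon$ and $c_\psi$, that the entry of the matrix in row $\varepsilon$ and column $\psi$ is $(-1)^{\varepsilon \cdot \psi}$; all variable names are illustrative.

```python
from itertools import product

roots_of_P = [-2, 1, 4]                          # roots of P = (X+2)(X-1)(X-4)
Qs = [lambda x: (x + 3) * (x - 2) * (x - 3),     # Q1
      lambda x: x - 3,                           # Q2
      lambda x: x]                               # Q3

def sign(v):
    return (v > 0) - (v < 0)

# All index tuples in the order (0,0,0), (0,0,1), (0,1,0), ..., (1,1,1).
index = list(product([0, 1], repeat=len(Qs)))

def s_entry(eps):
    """(#roots with Q^eps > 0) - (#roots with Q^eps < 0); no Q_i vanishes at a root of P."""
    total = 0
    for c in roots_of_P:
        value = 1
        for q, e in zip(Qs, eps):
            value *= sign(q(c)) ** e
        total += value
    return total

def c_entry(psi):
    """Number of roots c of P with sign(Q_i(c)) = (-1)^psi_i for every i."""
    return sum(all(sign(q(c)) == (-1) ** p for q, p in zip(Qs, psi))
               for c in roots_of_P)

s = [s_entry(eps) for eps in index]
c = [c_entry(psi) for psi in index]
A = [[(-1) ** sum(e * p for e, p in zip(eps, psi)) for psi in index]
     for eps in index]
A_times_c = [sum(a * x for a, x in zip(row, c)) for row in A]
# Since A * A = 8 * I for r = 3, c is recovered from s as (A * s) / 8.
recovered_c = [sum(a * x for a, x in zip(row, s)) // len(index) for row in A]
print(s, c, recovered_c[0])
```

The first entry of `recovered_c` is the number of real solutions of the system in the example.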
We must finally consider the case when $P, Q_1, \dots, Q_r \in \mathbb{R}[X]$ are arbitrary, so that $P$ and the $Q_i$ may not be relatively prime. To take care of common roots between $P$ and some $Q_i$, we redefine $Q^\varepsilon$ in the following way. Take

\[ Q^\varepsilon := \frac{\prod_{i=1}^{r} Q_i^2}{Q_1^{\varepsilon_1} \cdots Q_r^{\varepsilon_r}} = Q_1^{2-\varepsilon_1} \cdots Q_r^{2-\varepsilon_r}, \]

where $\varepsilon = (\varepsilon_1, \dots, \varepsilon_r) \in \{0, 1\}^r$ as usual. This means that, for roots $c$ of $P$ such that some $Q_i(c) = 0$, we are guaranteed to still have $Q^\varepsilon(c) = 0$ even if $\varepsilon_i = 0$. (In the previous construction, if $\varepsilon_i = 0$ then $Q_i^{\varepsilon_i}$ is taken to be equal to 1 everywhere, so that $Q_i^{\varepsilon_i}(c) = 1$, and $c$ may be counted as a root of $P$ at which $Q^\varepsilon \neq 0$.) The point of this construction is to ensure that roots of $P$ which are also roots of some $Q_i$ are not counted by $\nu_{P,P'Q^\varepsilon}$. To see this, we simply invoke Theorem 1.1.3 again, with $Q^\varepsilon$ in place of $Q$. Then $\nu_{P,P'Q^\varepsilon}(-\infty) - \nu_{P,P'Q^\varepsilon}(\infty)$ is equal to the number of distinct real roots $c$ of $P$ such that $Q^\varepsilon(c) > 0$ minus the number of those such that $Q^\varepsilon(c) < 0$, ignoring those such that some $Q_i(c) = 0$. Thus with Claim 1.1.7 and Theorem 1.1.3 (and replacing multiple equations $P_1 = \dots = P_s = 0$ with $P_1^2 + \dots + P_s^2 = P = 0$ if necessary), we are able to compute the total number of real solutions to a system of the form

\[ P = 0 \text{ and } Q_1 > 0 \text{ and } \dots \text{ and } Q_r > 0 \]

for $P \in \mathbb{R}[X]$ non-constant, and arbitrary $Q_1, \dots, Q_r \in \mathbb{R}[X]$.

So far we have only considered equations and strict inequalities. We are able to replace the system "$Q \geq 0$" with the disjunction "$Q > 0$ or $Q = 0$", giving two systems which can be solved separately using the above method.
If we have a system consisting only of inequalities, say "$Q_1 > 0, \dots, Q_r > 0$", we can break it down into two types of solutions. This system is satisfied on an unbounded interval of the form $(a, \infty)$ (or $(-\infty, a)$ respectively) if and only if the leading coefficients of $Q_1(X), \dots, Q_r(X)$ (or $Q_1(-X), \dots, Q_r(-X)$ respectively) are all positive. That is, for polynomials $Q_i$ of even degree, the leading coefficient is positive, and for polynomials $Q_j$ of odd degree, the leading coefficient is positive in the case of an interval of the form $(a, \infty)$, and negative in the case of an interval of the form $(-\infty, a)$.

The system "$Q_1 > 0, \dots, Q_r > 0$" is satisfied on some bounded interval, say $(a, b)$, where $a, b$ are real roots of the product $Q := \prod_{i=1}^{r} Q_i$, if and only if the system "$Q' = 0, Q_1 > 0, \dots, Q_r > 0$" has a real solution. To see why, assume that all $Q_i$ are positive on some bounded interval $(a, b)$, where $a$ and $b$ are real roots of $Q$. Since $a$ and $b$ are roots of the product $Q := \prod_{i=1}^{r} Q_i$, Rolle's theorem asserts that $Q'$ must have a root in the interval $(a, b)$. Conversely, if the system "$Q_1 > 0, \dots, Q_r > 0$" has no solution in the interval $(a, b)$, then "$Q' = 0, Q_1 > 0, \dots, Q_r > 0$" is also not satisfied on the interval. (We need only consider consecutive roots of $Q$, since if there are real roots of $Q$ inside of the interval $(a, b)$, then at least one of the $Q_i$ necessarily changes sign, even if only at the root itself. In addition, with $a$ and $b$ consecutive roots of $Q$, if the system is satisfied at any point in $(a, b)$, then it is clearly satisfied on the entire interval $(a, b)$.)
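The test for unbounded intervals described above needs only leading coefficients and degrees. A minimal sketch follows, assuming each polynomial is a coefficient list from the highest degree down with nonzero first entry; the function name `holds_near_infinity` is hypothetical.

```python
def holds_near_infinity(polys, direction=1):
    """Check whether all polynomials are positive on some unbounded interval.

    direction = 1 tests an interval (a, +oo); direction = -1 tests (-oo, a),
    which amounts to looking at the leading coefficient of Q(-X).
    Each polynomial is a list [lc, ..., constant] with lc != 0 (assumed).
    """
    for p in polys:
        degree = len(p) - 1
        if p[0] * direction ** degree <= 0:
            return False
    return True

system_1 = [[1, 0], [1, 0, -1]]     # X > 0 and X^2 - 1 > 0
system_2 = [[-1, 0], [1, 0, -1]]    # -X > 0 and X^2 - 1 > 0
print(holds_near_infinity(system_1, 1), holds_near_infinity(system_1, -1))
print(holds_near_infinity(system_2, 1), holds_near_infinity(system_2, -1))
```

The bounded-interval case is not covered by this check; as the text explains, it reduces to the system with the extra equation $Q' = 0$.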
1.2 Tarski-Seidenberg - Elimination of a variable

In this subsection we discuss an algorithm for eliminating a variable when determining whether or not a system of polynomial equations and inequalities has real solutions. We consider a system of polynomial equations and inequalities in $n + 1$ variables, $(Y_1, \dots, Y_n) = Y$, and $X$, with real coefficients:

\[
S(Y_1, \dots, Y_n, X) :=
\begin{cases}
P_1(Y_1, \dots, Y_n, X) \triangleright_1 0\\
\quad\vdots\\
P_r(Y_1, \dots, Y_n, X) \triangleright_r 0.
\end{cases}
\]

Denote by $\mathrm{lc}(P)$ the leading coefficient of a polynomial $P$. A system with "fixed degrees" is a system $(S(Y, X), D(Y))$ where $D(Y)$ forces (either implicitly or explicitly) the leading coefficients $\mathrm{lc}(P_i)$ to be nonzero for all $P_i$ considered in $S(Y, X)$. To see why a system with fixed degrees is desirable, consider the polynomial $P(A, B, X) = AX + B$. The system $AX + B = 0$ has a solution if and only if either statement "$A \neq 0$" or "$A = 0$ and $B = 0$" is satisfied. Indeed, if $A$ is nonzero then $P$ has a root at $-B/A$, and the first statement is satisfied. Conversely, if $A = 0$ and $B \neq 0$ then $P$ is a nonzero constant which has no roots, and we observe that neither of the two statements is satisfied. We will see shortly that each of these cases has a corresponding Sturm sequence, and that for systems with unfixed degrees, these associated Sturm sequences depart from one another precisely when a leading coefficient vanishes in one and not in the other. It is obvious that any system of polynomial equations and inequalities (with or without fixed degrees) is equivalent to a finite disjunction of systems with fixed degrees. It is therefore sufficient to consider only systems with fixed degrees.

Theorem 1.2.1 (Tarski-Seidenberg - First Form). There exists an algorithm which, given such a system $S(Y, X)$, produces a finite list of systems of polynomial equations and inequalities in $Y = (Y_1, \dots, Y_n)$ with real coefficients, say $C_1(Y), \dots, C_q(Y)$, such that, for every point $y = (y_1, \dots, y_n) \in \mathbb{R}^n$, the system $S(y, X)$ has a real solution if and only if one of the systems $C_i(y)$ is satisfied.

More succinctly, the statement "there exists a real solution, $X$, such that $S(Y, X)$" is equivalent to the statement "$C_1(Y)$ or ... or $C_q(Y)$", which depends only on $Y$. That is, there is an algorithm for eliminating the real variable, $X$.

Before proving the statement of Theorem 1.2.1 we explore some preliminary results followed by some examples to elucidate the details of each step of the algorithm, as well as to clarify the overall structure. We introduce the sign function, $\mathrm{sign} : \mathbb{R} \to \{-1, 0, 1\}$, defined by

\[
\mathrm{sign}(x) :=
\begin{cases}
+1 & \text{if } x > 0\\
0 & \text{if } x = 0\\
-1 & \text{if } x < 0.
\end{cases}
\]

Recall that we only need to consider systems with equalities "$=$" and inequalities "$>$", since other relations can be expressed in terms of these, and recall that
multiple equalities $P_1 = \dots = P_s = 0$ can be expressed as a single equality, $P = P_1^2 + \dots + P_s^2 = 0$. Combining this with the fact that any system can be expressed as a finite disjunction of systems with fixed degrees, we only need to consider systems with fixed degrees containing a single polynomial equality $P = 0$ and polynomial inequalities $Q_1 > 0, \dots, Q_r > 0$. For polynomials $P, Q_1, \dots, Q_r$ in $n + 1$ variables $(Y_1, \dots, Y_n, X)$, and for a point $y = (y_1, \dots, y_n) \in \mathbb{R}^n$, let $D(y)$ denote the statement "$\mathrm{lc}(P(y)) \neq 0$ and $\mathrm{lc}(Q_1(y)) \neq 0$ and ... and $\mathrm{lc}(Q_r(y)) \neq 0$".

Lemma 1.2.2. There exists an algorithm which, given a family of real polynomials $(P, Q_1, \dots, Q_r)$ in variables $(Y_1, \dots, Y_n, X)$ of positive degree with respect to $X$, produces a finite list of polynomials $(R_1, \dots, R_l)$ in $(Y_1, \dots, Y_n)$, and a function $c : \{-1, 0, 1\}^l \to \mathbb{N}$ such that, for every $l$-tuple of signs $\varepsilon = (\varepsilon_1, \dots, \varepsilon_l) \in \{-1, 0, 1\}^l$ and every point $y \in \mathbb{R}^n$ satisfying the statement "$D(y)$ and $\mathrm{sign}(R_1(y)) = \varepsilon_1$ and ... and $\mathrm{sign}(R_l(y)) = \varepsilon_l$", the system

\[ \text{"}P(y, X) = 0 \text{ and } Q_1(y, X) > 0 \text{ and } \dots \text{ and } Q_r(y, X) > 0\text{"} \]

has exactly $c(\varepsilon)$ solutions.

Proof. First, we compute the Sturm sequences as in Claim 1.1.7. That is, for each $\delta = (\delta_1, \dots, \delta_r) \in \{0, 1\}^r$, we compute the Sturm sequence of $P$ and $P'Q^\delta$, where $Q^\delta = Q_1^{\delta_1} \cdots Q_r^{\delta_r}$, as we aim to compute $\tilde{c}$, the vector whose entries are $c_\psi$ (the number of real roots of $P$ such that $\mathrm{sign}(Q_i) = (-1)^{\psi_i}$ for all $i = 1, \dots, r$). For every polynomial in a Sturm sequence, we test whether its leading coefficient is zero. In the case that the leading coefficient of a polynomial is zero, we replace the polynomial with its truncation (deleting the leading term from the polynomial). We assume that the leading coefficients of $P, Q_1, \dots, Q_r$ are all nonzero, ensuring that all Sturm sequences begin with non-constant polynomials (as required by a Sturm sequence). This yields a tree of Sturm sequence computations, where the branching tests are polynomial equations/inequations ("$= 0$" or "$\neq 0$") in the parameters $Y = (Y_1, \dots, Y_n)$. Thus every branch corresponds to a system of equations and inequations in $Y$, and we obtain the Sturm sequence corresponding to all parameters $y \in \mathbb{R}^n$ satisfying the system. From Lemma 1.1.6 we know that the signs of the leading coefficients of the polynomials in each Sturm sequence determine the value of $\nu(-\infty) - \nu(+\infty)$. In particular, these leading coefficients are rational fractions (as a result of performing Euclidean division in computing the Sturm sequences), say $A(Y)/B(Y)$, where $B(Y)$ is assumed to be nonzero within the branch in which it occurs. Notice that $A(Y)B(Y)$ has the same sign as $A(Y)/B(Y)$. We define the $R_1, \dots, R_l$ to be these polynomials $A(Y)B(Y)$ for all leading coefficients $A(Y)/B(Y)$ of polynomials from all branches of all trees of Sturm sequence computations.
Finally, assuming that $D(y)$ is satisfied and fixing the sign of each of the $R_1(y), \dots, R_l(y)$, applying the results of Lemma 1.1.6 and Claim 1.1.7 to the $R_1, \dots, R_l$, we are able to compute the number of real solutions of the system "$P(y, X) = 0$ and $Q_1(y, X) > 0$ and ... and $Q_r(y, X) > 0$". That is, in terms of Claim 1.1.7, we use the signs of the leading coefficients to compute the vector $\tilde{s}$ whose entries are $s_\delta = \nu_{P,P'Q^\delta}(-\infty) - \nu_{P,P'Q^\delta}(+\infty)$, which we can use to compute $\tilde{c}$. The first entry of $\tilde{c}$ is $c_{(\psi_1, \dots, \psi_r)} = c_{(0, \dots, 0)}$, which precisely corresponds to the number of solutions to the system, as required.

We give a detailed examination of the algorithm in the following example.

Example 12. Let $P = X^2 + aX + b$ and $Q = X + c$, and consider the system "$P = 0$ and $Q > 0$". We begin by computing Sturm sequences. Since there is only a single polynomial inequality, our $r$-tuple $\delta = (\delta_1, \dots, \delta_r)$ is just $\delta = \delta_1$. We are considering $P$ and $Q$ in full generality, and so we cannot assume that they are relatively prime - they may have common roots depending on $(a, b, c)$. Particularly, we will use Claim 1.1.7, allowing for the possibility that $P$ and $Q$ have common roots. We define