
Semialgebraic Geometry

School of Mathematics and Statistics, University of Sydney

A thesis submitted in fulfillment of the requirements for the degree of Master of Philosophy

Mark S. Perrin 2020

Abstract

In this thesis we introduce the preliminary results required to appreciate some key results of classical semialgebraic geometry. Namely, we give a detailed account of Sturm's root counting method and Hermite's root counting method - both are used to count the number of real solutions of a finite system of polynomial equations in a single variable. After building a foothold in both the theoretical and algorithmic setting of the root counting methods, we extend to systems of polynomial equations and inequations, wherein lies the bridge to the definition of semialgebraic sets as a natural extension of algebraic sets. The main results covered are the cylindrical algebraic decomposition and triangulation of semialgebraic sets, as well as Hardt's semialgebraic triviality of semialgebraic sets, and we show numerous consequences of each of these. We give the proofs of important concepts with a focus on intuitive exemplification and illustration. In the final section we discuss some improvements to the implementation of the standard conditions of Thom's lemma, 3.4.1, which may have implications for the efficiency of the projection operation used in cylindrical algebraic decomposition.

Acknowledgments

I want to express my gratitude toward Laurentiu for his persistence and patience as my supervisor throughout my degree and the writing of this thesis. I appreciate his diligence as a mentor - leading me with suggestions to seemingly independent personal discoveries and first-hand revelations in new topics in mathematics, and teaching me to always ask questions - which has undoubtedly improved my ability to learn. I also want to thank my family for their support, relating to my studies or otherwise. They have always encouraged my pursuits in every way they can, with full faith that I will try my best to succeed, or at least make the most of the cards that might be dealt. Finally, I want to thank Gwen for being so supportive during our time together, including my studies, for being understanding in personal times, and for giving me motivation when it is most needed.

Contents

1 Counting real roots of polynomials
1.1 Sturm's method
1.2 Tarski-Seidenberg - Elimination of a variable
1.3 Hermite's method

2 Semialgebraic sets
2.1 Semialgebraic sets - Definitions and examples
2.2 Tarski-Seidenberg - Second form
2.3 Tarski-Seidenberg - Third form
2.4 Semialgebraic functions

3 Decomposing semialgebraic sets
3.1 Cylindrical algebraic decomposition
3.2 Constructing a c.a.d. adapted to a finite family of polynomials
3.3 Algorithmic construction of an adapted c.a.d.
3.4 Improved cylindrical algebraic decomposition
3.5 Dimension of semialgebraic sets
3.6 Triangulation of semialgebraic sets

4 Hardt's trivialization and consequences
4.1 Semialgebraic triviality of semialgebraic sets
4.2 Semialgebraic Sard's theorem and upper bounds on connected components
4.3 Algebraic computation trees and lower bounds on connected components

5 Implementing the (?) condition
5.1 Checking the (?) condition
5.2 Completing a family to satisfy the (?) condition

A Hermite's Method

Introduction

The goal of this thesis is to give a detailed discussion of several key results in semialgebraic geometry, emerging in the 1970s and onward. Of course, much of the underlying theory is older, relying on knowledge of algebra and analysis. We aim to give the new reader an introduction to the topic, providing rigorous and detailed proofs, complemented by an incremental building of the relevant theory, and illustrated with many worked examples. Following each of the main results (Theorems 3.4.7, 3.4.9, 3.6.1, and 4.1.1), we outline various consequences concerning the dimension and the number of connected components of semialgebraic sets. Many of the results (and their proofs) we cover are taken from a preprint of Michel Coste's 'Introduction to Semialgebraic Geometry' (2002), [1]. The structure of the paper reflects the logical dependence of the results, and hence the order in which the material should be learned. In Section 1 the root counting methods of Sturm and Hermite are introduced, providing a way to count the number of real solutions of a system of finitely many polynomial equations in a single variable. Sturm's method is developed further to count the number of solutions of a system consisting of a polynomial equation and several polynomial inequalities, and Hermite's method is developed to relate the principal subresultant coefficients of two polynomials to the multiplicities of their common factors. This naturally gives a precursor for the definition of semialgebraic sets, as well as providing some powerful tools for their detailed examination. However, the reader may start with semialgebraic sets in Section 2, and return to the referenced material from Section 1 when necessary. The basic properties of semialgebraic sets are introduced in Section 2, particularly stability properties, as well as the tools commonly used to study semialgebraic sets. Semialgebraic sets are, by definition, sets of points in R^n satisfying finite boolean combinations of sign conditions on a family of polynomials, and being able to count the solutions of such a system is crucial in the method of cylindrical algebraic decomposition of semialgebraic sets into simpler, well-arranged pieces we call strata, or, slightly more generally, cells. The reader will become familiar with taking projections of semialgebraic sets, which is used extensively throughout Section 3 as one of the primary methods used to study semialgebraic sets, and is responsible for some of the main features, including the cylindrical arrangement of cells, in a constructed cylindrical algebraic decomposition. Decomposing semialgebraic sets in this way provides information on the topology and dimension of the sets, and allows for the proof that every semialgebraic set can be triangulated (Theorem 3.6.1). In Section 4 we introduce the notion of semialgebraic triviality in order to present Hardt's theorem as the next main result. We use the triangulation theorem to prove Hardt's theorem, 4.1.1, on the triviality of semialgebraic sets, which states that the image of a semialgebraic set S by a continuous semialgebraic mapping h can be represented as a finite union of subsets, over which the mapping is semialgebraically trivial, and that this triviality can be refined to be compatible with a finite collection of semialgebraic subsets of S. Hardt's theorem has consequences on the structure, dimension, and number of topological types of semialgebraic sets,

and is used to prove the semialgebraic version of Sard's theorem. Throughout, we aim to give illustrations and geometric exemplification of the key concepts in understanding these results. Some of the propositions/corollaries stated and proven throughout this thesis are posed as exercises in [1] (Coste).

1 Counting real roots of polynomials

In this section we explore two methods for counting the real roots of polynomial equations in one variable. Namely, we consider Sturm's method, and the method of Hermite. Sturm's method invokes a variation of the Euclidean division algorithm to produce a sequence of polynomials, from which we are able to infer the number of distinct real roots of a polynomial on an open (not necessarily bounded) interval of the real line. We show how to extend this method to count the number of distinct real roots of a system of polynomial equations and inequalities in a single variable, and how to do so algorithmically, which will be used in Section 3 as a foundation for decomposing semialgebraic sets in an effective way. Hermite's method uses Newton sums, and principal minors of a matrix related to the Newton sums, to count the distinct complex roots of a polynomial, and even the distinct real roots of a polynomial. We show how to modify the setup in order to compute the number of solutions of a system consisting of a polynomial equation and a polynomial inequality in a single variable. This method invokes Galois theory, principal subresultant coefficients, and a theorem of Jacobi on symmetric bilinear forms. We also show how the principal subresultant coefficients of pairs of polynomials relate to common roots of these polynomials, which is an important feature we use in a construction in Section 3. While they are related (see [2]), the methods of Sturm and Hermite are quite different in their application. Sturm's method requires branching of computation trees based on the signs of certain polynomials in the coefficients we start with, while Hermite's method avoids branching altogether.

1.1 Sturm’s method

Consider two nonzero polynomials P, Q ∈ R[X]. We construct a sequence of polynomials P0, P1, ..., Pk by taking P0 := P, P1 := Q, and for i > 0 we define Pi+1 as the negative of the remainder of the Euclidean division of Pi−1 by Pi. That is, Pi−1 = PiAi + Ri for some Ai, Ri ∈ R[X], and we define Pi+1 := −Ri. From this definition we can write Pi−1 = PiAi − Pi+1, or equivalently Pi+1 = PiAi − Pi−1. We now define the Sturm sequence of P and Q as (P0, P1, ..., Pk), where Pk is the last nonzero polynomial in the sequence. Note that Pk = ±gcd(P, Q), since the process for computing Pk is the Euclidean algorithm up to sign changes.

Example 1. Let P = X^3 + 2X^2 + 4, Q = X^2 + 4X − 2. Then we have

P = Q · (X − 2) + 10X ⟹ P2 = −10X,
Q = P2 · (−(1/10)X) + (4X − 2) ⟹ P3 = −4X + 2,
P2 = P3 · (10/4) − 5 ⟹ P4 = 5.

Then the Sturm sequence of P and Q is

(P0, ..., P4) = (X^3 + 2X^2 + 4, X^2 + 4X − 2, −10X, −4X + 2, 5).

Figure 1: Graph of P0, ..., P3.

The above choice of P and Q was such that they had no common roots (and neither had multiple roots). As such, the last term in the sequence, P4 = 5, is constant. We consider an example where this is not the case.

Example 2. Let P = (X − 1)^2(X − 4) = X^3 − 6X^2 + 9X − 4, and Q = (X − 1)(X − 2) = X^2 − 3X + 2. Then

P = Q · (X − 3) − (2X − 2) ⟹ P2 = 2X − 2,
Q = P2 · ((1/2)X − 1) ⟹ P3 = 0.

Thus the Sturm sequence of P and Q is

(P0, P1, P2) = (X^3 − 6X^2 + 9X − 4, X^2 − 3X + 2, 2X − 2).

One notices that P has a double root at X = 1, and a simple root at X = 4, while Q has only simple roots at X = 1 and X = 2. In particular, we notice that P and Q have a common root at X = 1. As a result of this, the last term of the sequence, P2 = 2(X − 1), is non-constant, and is indeed equal to gcd(P,Q) (up to some nonzero scalar multiple).

We now look at the sign changes in the sequence (P0(X), ..., Pk(X)) as X varies over R. Denote by νP,Q(x) the number of sign changes in the sequence (P0(x), ..., Pk(x)), the Sturm sequence of P and Q evaluated at x. For instance, with P and Q as in Example 1 we have (P0(1), ..., P4(1)) = (7, 3, −10, −2, 5), so we find that νP,Q(1) = 2. In the case that some Pi(x) is zero, we simply ignore that entry, so we have (P0(0), ..., P4(0)) = (4, −2, 0, 2, 5), with νP,Q(0) = 2. We want to examine how νP,Q changes as X varies. Observe that the value of νP,Q(X) can only possibly change if some of the Pi in the Sturm sequence change sign. That is, νP,Q(X) can only possibly change as X moves over roots of the polynomials Pi in the Sturm sequence. Hence νP,Q is constant on the open intervals between the roots of all Pi in the sequence. This means that we only need to check the value of νP,Q(X) at one point in each of these open intervals to understand νP,Q, at least schematically. Thus we are prompted to investigate what happens as X moves over such roots.

Remark: A particular case of νP,Q is used in [1], where Q is taken to be P′, and νP is used to denote what we would call νP,P′, omitting the second argument when it is the derivative of P. As we approach Sturm sequences in greater generality, we use the notations νP,P′ and νP,Q to avoid ambiguity at all stages. We aim to understand the function νP,Q itself to gain a deeper understanding of what it can tell us, and why results such as Sturm's Theorem 1.1.2 occur. We explore the behavior of νP,Q in the following example before we begin examining its properties rigorously.

Example 3. Let P = X^4 − 6X^2, Q = X^2 + X − 2. Then P2 = −X + 6 and P3 = −40, and the Sturm sequence of P and Q is (X^4 − 6X^2, X^2 + X − 2, −X + 6, −40). P has roots at X = −√6, 0, √6; Q has roots at X = −2, 1; and P2 has a root at X = 6.

Figure 2: Graph of P0, P1, P2.

We evaluate νP,Q on the open intervals between distinct roots of each Pi:

νP,Q(−3) = number of sign changes in (135, 4, 9, −40) = 1,
νP,Q(−√5) = number of sign changes in (−5, 0.76, 8.2, −40) = 2,
νP,Q(−1) = number of sign changes in (−5, −2, 7, −40) = 2,
νP,Q(1/2) = number of sign changes in (−1.4, −1.25, 5.5, −40) = 2,
νP,Q(2) = number of sign changes in (−8, 4, 4, −40) = 2,
νP,Q(3) = number of sign changes in (27, 10, 3, −40) = 1,
νP,Q(7) = number of sign changes in (2107, 54, −1, −40) = 1.

From now on we will omit the intermediate calculations when computing the value of νP,Q. Observe that νP,Q(X) indeed only changes as X passes over roots of P. However, νP,Q(X) does not change as X passes over all roots of P. Notice that νP,Q(X) changes as X moves over −√6 and √6 (both of which are simple roots of P), but does not change as X moves over 0, which is a double root of P. To see why, we highlight the fact that P is negative on both intervals (−ε, 0) and (0, +ε) for suitably small ε > 0. Of course, all other terms of the sequence also have constant sign about X = 0. Thus there is no difference in the number of sign changes in the Sturm sequence evaluated at −ε compared to +ε. That is, νP,Q(−ε) = νP,Q(+ε) for suitably small ε > 0. We investigate this further.

Consider polynomials P, Q ∈ R[X], and the Sturm sequence (P0, ..., Pk) of P and Q. For convenience, we will first assume that P and Q have no common roots. We know that Pk will therefore be a nonzero constant. By the construction of a Sturm sequence, we have

Pi+1(c) = Pi(c)Ai(c) − Pi−1(c).

If Pi(c) = 0 for some 1 ≤ i < k, then

Pi+1(c) = −Pi−1(c).

Hence if Pi+1(c) = 0 as well, then Pi+1(c) = −Pi−1(c) = 0, and it is clear that all of P0(c), ..., Pk(c) are in fact forced to be zero. However, since P and Q are assumed to be relatively prime, we know that Pk is a nonzero constant. This means that we cannot have consecutive terms of the Sturm sequence simultaneously equal to zero. Therefore, if Pi(c) = 0 for some 1 ≤ i < k, we have Pi+1(c) = −Pi−1(c) ≠ 0. Hence, whether Pi(X) changes sign or not as X passes over the root c, the sub-sequence (..., Pi−1, Pi, Pi+1, ...) has the same number of sign changes on the interval (c − ε, c) as it does on the interval (c, c + ε), for suitably small ε > 0. That is, the roots of P1, ..., Pk do not contribute any change to the number of sign changes in the Sturm sequence. Therefore, we need only consider the roots of P0 = P. If P(c) = 0 for some c ∈ R, then either Q > 0 on (c − ε, c + ε), or Q < 0 on (c − ε, c + ε). Assume that Q > 0 on (c − ε, c + ε). Then either

1. P(c − ε) < 0 and P(c + ε) > 0, giving νP,Q(c − ε) − νP,Q(c + ε) = 1, or

2. P(c − ε) > 0 and P(c + ε) < 0, giving νP,Q(c − ε) − νP,Q(c + ε) = −1, or

3. P(c − ε)P(c + ε) > 0, in which case νP,Q(c − ε) − νP,Q(c + ε) = 0.

We have a similar result when Q < 0 on (c − ε, c + ε). In summary, if the sign of P changes from −sign(Q) to sign(Q) then the number of sign changes in the Sturm sequence goes down by one, hence νP,Q(c − ε) − νP,Q(c + ε) = 1, and if the sign of P changes from sign(Q) to −sign(Q) then the number of sign changes in the Sturm sequence goes up by one, hence νP,Q(c − ε) − νP,Q(c + ε) = −1. Therefore, for a, b ∈ R not roots of P, with a < b, the value of νP,Q(a) − νP,Q(b) is equal to the number of distinct roots c of P in (a, b) such that Q(c) > 0 minus the number of those such that Q(c) < 0. We consider two examples now to illustrate this result.

Example 4. Let P = (X − 1)(X + 1) = X^2 − 1, Q = X − 2. Then P2 = −3, and the Sturm sequence of P and Q is (X^2 − 1, X − 2, −3).

Figure 3: Graph of P0, P1, P2.

We evaluate νP,Q on each open interval between distinct roots of each Pi:

νP,Q(−2) = 1,

νP,Q(0) = 0,

νP,Q(3/2) = 1,

νP,Q(3) = 1.

Observe that νP,Q(X) changes as X passes roots of P. More specifically, νP,Q decreases by 1 as X passes over −1, since Q < 0 on (−1 − ε, −1 + ε), P > 0 on (−1 − ε, −1), and P < 0 on (−1, −1 + ε). Similarly, νP,Q increases by 1 as X passes over +1. Of course, we also see that νP,Q does not change as X passes roots of Q.

Example 5. Let P = (X − 1)(X + 1) = X^2 − 1, Q = X + 2. Then P2 = −3, and the Sturm sequence of P and Q is (X^2 − 1, X + 2, −3). Evaluating νP,Q between each of the distinct roots of the terms Pi we find:

νP,Q(−3) = 1,
νP,Q(−3/2) = 1,

νP,Q(0) = 2,

νP,Q(2) = 1.

Figure 4: P0, P1, P2.

As in the previous example, we observe that νP,Q(X) changes as X passes roots of P, decreasing by 1 when the sign of P changes from the opposite sign of Q to the same sign as Q, and increasing by 1 when the sign of P changes from the same sign as Q to the opposite sign.

We now consider arbitrary polynomials P, Q ∈ R[X], so that P and Q may no longer be relatively prime. Constructing the Sturm sequence of P and Q in the same way as before, with P0 = P and P1 = Q, the last nonzero term Pk = ±gcd(P, Q) is not necessarily constant, and therefore may have real roots. Construct the modified Sturm sequence

(P/Pk, Q/Pk, ..., Pk−1/Pk, 1),

dividing each term of the usual Sturm sequence of P and Q by Pk. Writing P = T0Pk and Q = T1Pk, we have

P2 = (T1Pk)A1 − T0Pk = Pk(T1A1 − T0).

Applying induction we have

Pi+1 = PiAi − Pi−1 = (TiPk)Ai − Ti−1Pk = Pk(TiAi − Ti−1),

showing that each term in the modified Sturm sequence (P/Pk, Q/Pk, ..., Pk−1/Pk, 1) is indeed a polynomial, since each term P0, ..., Pk−1 is divisible by Pk. The above induction also shows that the Sturm sequence generated by P/Pk and Q/Pk is

precisely the same as the modified Sturm sequence (P/Pk, Q/Pk, ..., Pk−1/Pk, 1). Since P/Pk and Q/Pk are relatively prime, we know that they cannot simultaneously be zero. Hence there can never be two consecutive terms in this modified Sturm sequence which vanish simultaneously, and so we retrieve the desired property that if some (Pi/Pk)(c) = 0 for 1 ≤ i < k, then (Pi−1/Pk)(c) = −(Pi+1/Pk)(c) ≠ 0. Hence the sub-sequence (..., Pi−1/Pk, Pi/Pk, Pi+1/Pk, ...) contributes no change to the number of sign changes in the modified Sturm sequence as X moves over roots of Pi/Pk for 1 ≤ i < k.

We consider the real roots of P with multiplicities. We write P and Q in terms of the roots of P, so that P(X) = (X − a1)^{m1} ... (X − ar)^{mr} and Q(X) = (X − a1)^{n1} ... (X − ar)^{nr} A(X), where A(X) and P(X) are relatively prime, noting that the multiplicities ni of the roots ai in Q may be zero. Then we have Pk(X) = (X − a1)^{p1} ... (X − ar)^{pr}, where pi := min(mi, ni). For a root ai of P, if mi ≤ ni then ai is not a root of P/Pk, since then pi = mi. Therefore the number of sign changes in the modified Sturm sequence does not change as X passes over such roots of P. Hence we only need to consider roots of P for which mi > ni.

Let ai be a real root of P, and assume that mi > ni. Then ai is a root of P/Pk of multiplicity mi − pi = mi − ni, and ai is not a root of Q/Pk. One notices that mi + ni and mi − ni have the same parity. If mi + ni is even, then ai is a root of P/Pk of even multiplicity. This means that P/Pk has the same sign on (ai − ε, ai) as it does on (ai, ai + ε), and therefore contributes no change to the number of sign changes in the Sturm sequence of P/Pk and Q/Pk. If mi + ni is odd, then ai is a root of P/Pk of odd multiplicity. Hence P/Pk changes sign as X passes over ai. If P/Pk has the opposite sign to Q/Pk on (ai − ε, ai) and the same sign as Q/Pk on (ai, ai + ε), then the number of sign changes in the modified Sturm sequence decreases by 1 as X passes over ai. That is,

νP/Pk,Q/Pk(ai − ε) − νP/Pk,Q/Pk(ai + ε) = 1.

Conversely, if P/Pk changes from the same sign as Q/Pk to the opposite sign, then the number of sign changes in the modified Sturm sequence increases by 1 as X passes over ai, and so

νP/Pk,Q/Pk(ai − ε) − νP/Pk,Q/Pk(ai + ε) = −1.

We now wish to show that the number of sign changes in (P, Q, ..., Pk) is precisely the same as the number of sign changes in the modified Sturm sequence (P/Pk, Q/Pk, ..., Pk−1/Pk, 1) when evaluated at points which are not roots of P. We note that the relative sign changes of P/Pk and Q/Pk about the root ai are the same as the relative sign changes of P and Q about ai, as in each of the situations above. Furthermore, observe that whether Pk(x) is positive or negative, dividing each term in the Sturm sequence by Pk(x) clearly does not alter the number of sign changes in the sequence, and so does not affect the value of ν, keeping in mind that Pk(x) ≠ 0 when P(x) ≠ 0. That is, for points x which are not roots of P,

νP,Q(x) ≡ νP/Pk,Q/Pk(x).

We now have the following theorem.

Theorem 1.1.1. Let P, Q ∈ R[X], and let a, b ∈ R, not roots of P, with a < b. Denote by νP,Q(x) the number of sign changes in the Sturm sequence of P and Q evaluated at x. Then the value of νP,Q(a) − νP,Q(b) is equal to the number of distinct roots c of P in the interval (a, b) such that the multiplicity of c as a root of P is strictly greater than the multiplicity of c as a root of Q and P(c − ε)Q(c − ε) < 0 and P(c + ε)Q(c + ε) > 0 (for ε > 0 sufficiently small), minus the number of those such that P(c − ε)Q(c − ε) > 0 and P(c + ε)Q(c + ε) < 0.

To summarize the above theorem intuitively, νP,Q(a) − νP,Q(b) counts the number of distinct real roots of P in an interval such that the sign of P changes to the sign of Q over the root, minus the number of those such that P changes away from the sign of Q over the root, but does not count any of the roots in the interval whose multiplicity as a root of Q is equal to or greater than their multiplicity as a root of P. We include an example to illustrate the above theorem.

Example 6. Let P = (1/10)(X + 3)X^2(X − 2)(X − 4) = (1/10)(X^5 − 3X^4 − 10X^3 + 24X^2), Q = X − 2. Since Q divides P, the Sturm sequence of P and Q is ((1/10)(X^5 − 3X^4 − 10X^3 + 24X^2), X − 2).

Figure 5: Graph of P and P1.

Computing the value of νP,Q between each of the distinct roots of P and Q we find:

νP,Q(−4) = 0,

νP,Q(−1) = 1,

νP,Q(1) = 1,

νP,Q(3) = 1,

νP,Q(5) = 0.

The nature of each of the distinct roots of P (in terms of relative sign changes with respect to Q) is different in each case. In particular, as X moves over the root c1 = −3 of P, the sign of P changes from the same sign as Q to the opposite sign. We also see that νP,Q(−4) − νP,Q(−1) = −1 in correspondence with these relative sign changes (making sure that c1 = −3 is the only root of P in the interval (−4, −1), of course). As X moves over both c2 = 0 and c3 = 2, both roots of P, there are no relative sign changes between P and Q (c2 is a double root of P which is not a root of Q, while c3 is a simple root of both P and Q). Correspondingly, νP,Q(−1) − νP,Q(1) = 0 and νP,Q(1) − νP,Q(3) = 0. Finally, as X moves over the root c4 = 4 of P, the sign of P changes from the opposite sign to the same sign as Q, and indeed we observe that νP,Q(3) − νP,Q(5) = 1.

We apply Theorem 1.1.1 to the special case of Q = P′. (As stated at the introduction of the function νP,Q, the special case νP,P′ is used in [1], but is simply denoted by νP.) We know that a root ai of P with multiplicity mi is also a root of P′ with multiplicity mi − 1 (where a multiplicity of 0 means it is not a root). The sum of these multiplicities, mi + (mi − 1), is clearly always odd. Hence, according to Theorem 1.1.1, the function νP,P′ is monotone decreasing. More precisely, νP,P′ is constant on the open intervals between the roots of P, with a discontinuity at each root of P, where the value decreases by 1. Therefore, the value of νP,P′(−∞) − νP,P′(x) is monotone increasing with x. If c is a root of P, then νP,P′(c) = νP,P′(c − ε) − 1 for ε > 0 sufficiently small, meaning that the value of νP,P′(x) is constant on an interval of the form (c − ε, c), and on an interval of the form [c, c + ε).

Figure 6: The function νP,P′ is monotone decreasing, with a discontinuity at roots of P.

We need only consider the open intervals, since we can simply evaluate νP,P′ at points that are not roots of P. Hence we obtain the following theorem as a corollary to Theorem 1.1.1.

Theorem 1.1.2 (Sturm's Theorem). Let P ∈ R[X], and let a, b ∈ R, not roots of P, with a < b. Denote by νP,P′(x) the number of sign changes in the Sturm sequence of P and P′ evaluated at x. Then the value of νP,P′(a) − νP,P′(b) is equal to the number of distinct roots of P in the interval (a, b).

Proof. We first note that every root c of P is a root of P′ with strictly lower multiplicity. Thus, in the context of Theorem 1.1.1, all real roots in the interval (a, b) will be counted (νP,Q will either increase or decrease as x passes over them). We also note that the sign of P is the opposite of the sign of P′ on some interval (c − ε, c), while they have the same sign on some interval (c, c + ε), for every real root c of P. To see this, observe the behavior of P and P′ about roots of P of even and odd multiplicities separately. In particular, if P is decreasing (increasing) on (c − ε, c), then P must be positive (negative) on (c − ε, c), and P′ must be negative (positive) on this interval. Furthermore, if P is decreasing (increasing) on (c, c + ε), then P must be negative (positive) on (c, c + ε), and P′ must be negative (positive) on this interval. Applying Theorem 1.1.1, we have that νP,P′(a) − νP,P′(b) is precisely equal to the number of distinct roots of P in the interval (a, b).

Example 7. Let P = X^2(X − 2) = X^3 − 2X^2, so that P′ = 3X^2 − 4X. Then P2 = (8/9)X. Computing the value of νP,P′ between each of the distinct roots of P we find:

νP,P′(−1) = 2,
νP,P′(1) = 1,
νP,P′(3) = 0.

Figure 7: Graph of P, P′, P2.

Observe that νP,P′ decreases by 1 over each distinct root of P, regardless of the multiplicity of the root.
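Sturm's theorem is easy to test numerically. A small self-contained sketch, assuming sympy (whose built-in sympy.sturm computes the Sturm sequence of P and P′ directly):

```python
# Checking Sturm's theorem on Example 7: nu(a) - nu(b) equals the number
# of distinct real roots of P in (a, b), counted without multiplicity.
import sympy as sp
X = sp.symbols('X')

def nu_at(seq, x):
    vals = [s.subs(X, x) for s in seq]
    signs = [1 if v > 0 else -1 for v in vals if v != 0]
    return sum(1 for u, w in zip(signs, signs[1:]) if u != w)

seq = sp.sturm(X**3 - 2*X**2)          # Sturm sequence of P and P'
print(nu_at(seq, -1) - nu_at(seq, 3))  # 2: the distinct roots 0 and 2
```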

According to the above theorem, νP,P′ counts the real roots of a polynomial P ∈ R[X] in an interval. This is owing to the fact that the relative sign changes of P and P′ about the real roots of P are consistent, regardless of the polynomial in question. A subtle fact contributing to this is that for a root c of a polynomial P of multiplicity m, we can consider c as a root of P′ of multiplicity n = m − 1 (possibly zero), meaning that m + n = 2m − 1, which is always odd. We are able to make use of these basic features in order to count the real roots of a polynomial P, either positively or negatively, according to the sign of a second polynomial, say Q ∈ R[X], at these roots of P, thus giving us an opening into solving polynomial systems of the form "P = 0 and Q > 0".

Theorem 1.1.3. Let P, Q ∈ R[X], and let a, b ∈ R, not roots of P, with a < b. Denote by νP,P′Q(x) the number of sign changes in the Sturm sequence of P and P′Q evaluated at x. Then the value of νP,P′Q(a) − νP,P′Q(b) is equal to the number of distinct roots c of P in the interval (a, b) such that Q(c) > 0 minus the number of distinct roots c of P in the interval (a, b) such that Q(c) < 0.

Proof. We already know that for a general Sturm sequence, ν is unaffected by roots of all trailing terms P1, ..., Pk of the sequence, and is only affected by the leading term P0. Consider a real root c of P with multiplicity, say, m. We also consider c as a root of both P′ and Q of multiplicities m − 1 and n respectively (both possibly zero), so that c is a root of P′Q of multiplicity m − 1 + n. Clearly,

if n ≥ 1 then m − 1 + n ≥ m. That is, if c is indeed a root of Q, then c is a root of P′Q of multiplicity at least as great as the multiplicity of c as a root of P. Hence, by Theorem 1.1.1, the value of νP,P′Q is unaffected by the roots of P which are also roots of Q. We now consider roots c of P which are not roots of Q, so that c is a root of P′Q of multiplicity m − 1 + n = m − 1 < m. Recall that the relative signs of P and P′ about c are such that PP′ < 0 on some (c − ε, c), and PP′ > 0 on some (c, c + ε), and that Q has constant (nonzero) sign about c. Therefore, on some interval (c − ε, c),

sign(PP′Q) = sign(PP′)·sign(Q) = −sign(Q),

and on some interval (c, c + ε),

sign(PP′Q) = sign(PP′)·sign(Q) = sign(Q).

That is, if Q(c) > 0, then P changes from the opposite sign of P′Q on (c − ε, c) to the same sign as P′Q on (c, c + ε). Conversely, if Q(c) < 0, then P changes from the same sign as P′Q on (c − ε, c) to the opposite sign of P′Q on (c, c + ε). Applying Theorem 1.1.1, it is clear that νP,P′Q(a) − νP,P′Q(b) is equal to the number of distinct roots c of P in the interval (a, b) such that Q(c) > 0 minus the number of those such that Q(c) < 0.

Remark: We bring attention to the fact that νP,P′Q does not count roots c of P for which Q(c) = 0. This is illustrated in the following example, along with the statement of Theorem 1.1.3.

Example 8. Let P = X(X − 2)(X + 2) = X^3 − 4X, so that P′ = 3X^2 − 4, and let Q = X. Then we have P1 = P′Q = 3X^3 − 4X, and so P2 = (8/3)X. Computing the value of νP,P′Q between each of the distinct roots of P we find:

νP,P′Q(−3) = 0,
νP,P′Q(−1) = 1,
νP,P′Q(1) = 1,
νP,P′Q(3) = 0.

Note that the graph of Q itself has not been included in the following figure for simplicity.

Figure 8: Graph of P, P′Q, P2.

Observe that Q(−2) < 0 and Q(2) > 0. Indeed we find that νP,P′Q(−3) − νP,P′Q(−1) = −1 and νP,P′Q(1) − νP,P′Q(3) = 1, in correspondence with the sign of Q over the roots at −2 and 2. We also see that Q(0) = 0, and νP,P′Q(−1) − νP,P′Q(1) = 0. That is, νP,P′Q did not count the root c = 0 of P, which was also a root of Q.

Following the above results, given polynomials P, Q ∈ R[X], we are able to count the number of distinct real roots c of P in an interval such that Q(c) > 0, as follows. For roots c of P such that Q(c) ≠ 0, we have Q(c)^2 > 0. If we then apply Theorem 1.1.3 with Q^2 replacing Q, we have that, for a < b ∈ R not roots of P, νP,P′Q²(a) − νP,P′Q²(b) is equal to the number of distinct roots c of P in the interval (a, b) such that Q(c)^2 > 0, which is equal to the number of distinct roots c of P in the interval (a, b) such that Q(c) > 0 plus the number of those such that Q(c) < 0. We define the following expression involving the Sturm sequence of P and P′Q, as well as the Sturm sequence of P and P′Q²:

NP,Q(a, b) := [νP,P′Q(a) − νP,P′Q(b)] + [νP,P′Q²(a) − νP,P′Q²(b)],

where νP,P′Q(a) − νP,P′Q(b) is equal to the number of distinct roots c of P in the interval (a, b) such that Q(c) > 0 minus the number of those such that Q(c) < 0, and νP,P′Q²(a) − νP,P′Q²(b) is equal to the number of distinct roots c of P in the interval (a, b) such that Q(c) > 0 plus the number of those such that Q(c) < 0. That is, NP,Q(a, b) is equal to twice the number of distinct roots c of P in the interval (a, b) such that Q(c) > 0. We give a brief example to

illustrate this.

Example 9. Let P = X(X − 1)(X + 1) = X^3 − X and let Q = X. By observation, the number of real roots c of P such that Q(c) > 0 is 1.

Figure 9: Graph of P and Q.
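Before reading off the Sturm sequences below, here is a sketch checking this count mechanically, reusing the sturm_sequence and nu helpers from the earlier sketches (sympy assumed):

```python
# N_{P,Q}(a, b) = [nu_{P,P'Q}(a) - nu_{P,P'Q}(b)]
#              + [nu_{P,P'Q^2}(a) - nu_{P,P'Q^2}(b)]
# is twice the number of distinct roots c of P in (a, b) with Q(c) > 0.
P, Q = X**3 - X, X
dP = sp.diff(P, X)
s1 = sturm_sequence(P, sp.expand(dP*Q))
s2 = sturm_sequence(P, sp.expand(dP*Q**2))
N = (nu(s1, -2) - nu(s1, 2)) + (nu(s2, -2) - nu(s2, 2))
print(N // 2)  # 1: the single root c = 1 of P with Q(c) > 0
```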

The Sturm sequence of P and P′Q is (X^3 − X, 3X^3 − X, X), and the Sturm sequence of P and P′Q² is (X^3 − X, 3X^4 − X^2, −X^3 + X, −2X^2, −X). Evaluating νP,P′Q and νP,P′Q² between the roots of P, we find

νP,P′Q(−2) = 0,    νP,P′Q²(−2) = 3,
νP,P′Q(−1/2) = 1,  νP,P′Q²(−1/2) = 2,
νP,P′Q(1/2) = 1,   νP,P′Q²(1/2) = 2,

νP,P′Q(2) = 0,     νP,P′Q²(2) = 1.

Indeed, we find that (1/2)NP,Q(−2, 2) = (1/2)((0 − 0) + (3 − 1)) = 1, as desired. We are now able to compute the number of distinct real roots of a polynomial P in an interval, as well as the number of those at which another polynomial, say Q, is nonzero, or greater than zero. We apply this to a simple construction involving a polynomial P and its derivative P′. If we set Q = (P′)^2 in Theorem 1.1.3, then νP,P′Q counts the number of distinct real roots c of P such that Q(c) = (P′(c))^2 > 0 minus the number of those such that Q(c) = (P′(c))^2 < 0. For real roots c of P, Q(c) > 0 if and only

if c is a simple root of P (otherwise Q(c) = (P′(c))^2 = 0). Hence by Theorem 1.1.3, for real numbers a < b not roots of P, νP,P′Q(a) − νP,P′Q(b) is equal to the number of simple roots of P in the interval (a, b) (remembering that νP,P′Q does not count roots of P which are also roots of Q). Now, for a polynomial P ∈ R[X] we define G0(P) = P, and for k ≥ 1 we define recursively

Gk(P) := gcd(Gk−1(P), Gk−1(P)′).

Claim 1.1.4. The factors F of Gk(Q) of multiplicity d are precisely the factors of Q of multiplicity d + k, where k ≤ d.

Proof. If F is a factor of Gk−1(Q) of multiplicity d ≥ 1, then F is a factor of Gk(Q) = gcd(Gk−1(Q), Gk−1(Q)′) of multiplicity d − 1 (the minimum of the multiplicities of F in Gk−1(Q) and Gk−1(Q)′). If F is a factor of Gk(Q) of multiplicity d − 1 ≥ 1, then F is a factor of both Gk−1(Q) and Gk−1(Q)′ of multiplicity at least d − 1. However, the multiplicity of F in Gk−1(Q)′ is exactly 1 less than the multiplicity of F in Gk−1(Q). Hence the multiplicity of F in Gk−1(Q) is d. If a factor F of Q has multiplicity d ≥ 1, then the multiplicity of F in G1(Q) = gcd(Q, Q′) is d − 1. If a factor F of G1(Q) has multiplicity d − 1 ≥ 1, then F must be a factor of both Q and Q′ of multiplicity at least d − 1. Therefore F is a factor of Q of multiplicity d. This proves the claim.
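The Gk construction is a short loop of gcd computations. A sketch assuming sympy (sympy.gcd returns the gcd up to a nonzero constant factor, which is all that matters here); by Claim 1.1.4 the simple roots of Gk(P) are exactly the roots of P of multiplicity k + 1:

```python
# G_0(P) = P and G_k(P) = gcd(G_{k-1}(P), G_{k-1}(P)').
import sympy as sp
X = sp.symbols('X')

def G(P, k):
    for _ in range(k):
        P = sp.gcd(P, sp.diff(P, X))
    return P

S = X**3 * (X - 1)**2      # used again in Example 10 below
print(sp.expand(G(S, 1)))  # X**3 - X**2 = X^2 (X - 1)
print(sp.expand(G(S, 2)))  # X
```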

With the above construction, set Q = (Gk(P))^2. Then a real root c of P is also a root of Q if and only if c is a root of P of multiplicity k + 1 or greater. Therefore, for real numbers a < b not roots of P, νP,P′Q(a) − νP,P′Q(b) is equal to the number of distinct roots of P in the interval (a, b) with multiplicity k or lower (that is, those which are not roots of Gk(P)). Finally, replacing P with Gk(P) in the construction of νP,P′Q and setting Q = (Gk(P)′)^2, we arrive at the following result.

Theorem 1.1.5. Let P ∈ R[X], let k ∈ N, and let a < b be real numbers, not roots of P. Denote Gk := Gk(P). Then the number of real roots of P in the interval (a, b) of multiplicity k + 1 is equal to

νGk,Gk′(Gk′)²(a) − νGk,Gk′(Gk′)²(b).

Proof. The proof is an immediate consequence of the definition of Gk(P) and the application of Claim 1.1.4 and Theorem 1.1.3. To clarify this, Claim 1.1.4 asserts that the simple roots of Gk(P) are precisely the roots of P of multiplicity k + 1. Then, by replacing P, P′, and Q with Gk(P), Gk(P)′, and (Gk(P)′)^2 respectively in the construction of the Sturm sequence (and subsequently in the construction of νP,P′Q), we can count the number of simple roots of Gk(P) in an interval of the real line (since, for a root c of Gk(P), Gk(P)′(c) ≠ 0 if and only if c is a simple root of Gk(P)). That is, we can count the number of distinct roots of P of multiplicity k + 1 in that interval.

We illustrate this result in the following example.

Example 10. Consider the polynomial S = X^3(X − 1)^2 = X^5 − 2X^4 + X^3, which has a triple root c1 = 0 and a double root c2 = 1. Then we have S′ = 5X^4 − 8X^3 + 3X^2. We now construct the Sturm sequence of Gk(S) and (Gk(S)′)^3 for each k = 0, 1, 2.

Setting P = G0(S) = S and Q = (G0(S)′)^2 = (S′)^2, the Sturm sequence of P and P′Q is (S, (S′)^3, −S). Then we have

νP,P′Q(−2) − νP,P′Q(2) = 0.

As expected, 0 simple roots of S were counted.

We compute G1(S) = gcd(S, S′) = X^3 − X^2 = X^2(X − 1), and set P = G1(S) = X^3 − X^2 and Q = (G1(S)′)^2 = (3X^2 − 2X)^2 = 9X^4 − 12X^3 + 4X^2. Then the Sturm sequence of P and P′Q is (X^3 − X^2, X^3(3X − 2)^3, −(X^3 − X^2), −X^2), and we have

νP,P′Q(−2) − νP,P′Q(2) = 1.

That is, 1 double root of S was counted.

We compute G2(S) = gcd(G1(S), G1(S)′) = gcd(X^3 − X^2, 3X^2 − 2X) = X, and G2(S)′ = 1. Setting P = G2(S) = X and Q = (G2(S)′)^2, we easily find that the Sturm sequence of P and P′Q is (X, 1). Then we have

νP,P′Q(−2) − νP,P′Q(2) = 1.

That is, 1 triple root of S was counted.
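The whole of Example 10 can be verified mechanically. A self-contained sketch in the same vein as the earlier ones, assuming sympy; count_multiplicity is our own name:

```python
# Count roots of P of multiplicity k+1 in (a, b) via the Sturm sequence
# of G_k and G_k' * (G_k')^2 = (G_k')^3, as in Theorem 1.1.5.
import sympy as sp
X = sp.symbols('X')

def sturm_seq(P, Q):
    seq = [sp.Poly(P, X), sp.Poly(Q, X)]
    while not seq[-1].is_zero:
        seq.append(-sp.rem(seq[-2], seq[-1]))
    return seq[:-1]

def nu(seq, x):
    vals = [p.as_expr().subs(X, x) for p in seq]
    signs = [1 if v > 0 else -1 for v in vals if v != 0]
    return sum(1 for u, w in zip(signs, signs[1:]) if u != w)

def count_multiplicity(P, k, a, b):
    Gk = P
    for _ in range(k):
        Gk = sp.gcd(Gk, sp.diff(Gk, X))
    seq = sturm_seq(Gk, sp.expand(sp.diff(Gk, X)**3))
    return nu(seq, a) - nu(seq, b)

S = X**3 * (X - 1)**2
print([count_multiplicity(S, k, -2, 2) for k in (0, 1, 2)])  # [0, 1, 1]
```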

So far the polynomials we have looked at in examples have been made to be simple and easy to visualize, so as to allow the reader to check computations graphically. This is, of course, not likely to be the case in general. In the above example we know that none of the polynomials may have roots outside of the interval (−2, 2), but such bounds may not be obvious for more complex examples. That is, one does not immediately know how large an interval must be considered in order to ensure that all the roots of an arbitrary polynomial lie within it. We now introduce a bound on the absolute value of the roots of an arbitrary polynomial, which we can apply to our results relating to computing with Sturm sequences.

Lemma 1.1.6. Consider a polynomial P(X) = a0X^d + a1X^{d−1} + ... + ad−1X + ad. Then

|c| ≤ max_{i=1,...,d} (d |ai/a0|)^{1/i}

for all roots c of P.

Proof. Consider a point z ∈ C such that |z| > max_{i=1,...,d} (d |ai/a0|)^{1/i}. Then we have |a0||z|^i / d > |ai| for all i = 1, ..., d. Multiplying both sides by |z|^{d−i} we find that |a0||z|^d / d > |ai z^{d−i}| for all i = 1, ..., d. Adding all d terms together, we have |a1||z|^{d−1} + ... + |ad| < d(|a0||z|^d / d) = |a0||z|^d = |a0 z^d|, and hence, by the triangle inequality, |a1 z^{d−1} + ... + ad z^0| < |a0 z^d|. This shows that the magnitude of the leading term a0 z^d is strictly larger than the magnitude of the remaining terms put together, meaning that a0 z^d + a1 z^{d−1} + ... + ad is necessarily nonzero. That is, z is not a root of P.

The above result shows that for any polynomials P, Q ∈ R[X], the functions νP,Q, νP,P′, νP,P′Q, and νP,P′Q² are constant on some unbounded intervals (−∞, A) and (B, ∞), A, B ∈ R. Therefore, if we wish to count all possible roots of P in accordance with any of the previous results in this section, we know that there always exists an interval (which we can compute in terms of the coefficients of P) large enough to contain all real roots of P. Equivalently, since the ν functions are constant outside of such an interval, it is enough to consider ν(−∞) − ν(∞). That is to say, in order to explicitly compute the total number of roots of a polynomial in accordance with any of the previous results in this section, it is enough to consider the degree and the sign of the leading coefficient of each entry in the Sturm sequences, since the highest-order terms determine the sign of a polynomial at ±∞. Explicitly, if the Sturm sequence in question is (P0, P1, ..., Pk), then the value of the associated ν function at −∞ and ∞ is equal to the number of sign changes in (lc(P0)(−1)^{d0}, ..., lc(Pk)(−1)^{dk}) and in (lc(P0), ..., lc(Pk)) respectively, where lc(Pj) denotes the leading coefficient of the polynomial Pj, and dj denotes its degree, for j = 0, ..., k.

We have considered systems of the form "P = 0 and Q > 0" for polynomials P, Q ∈ R[X], and we are able to count the total number of real roots of such a system (as well as some other related quantities concerning the roots and multiplicities of roots of polynomials). If, instead of a single polynomial equation, we have the system P1 = ... = Ps = 0, where P1, ..., Ps ∈ R[X], then we can replace P in the usual construction with P1^2 + ... + Ps^2, noting that P1^2 + ... + Ps^2 = 0 if and only if Pi = 0 for all i = 1, ..., s. Hence we can reduce a system of the form "P1 = 0 and ... and Ps = 0 and Q > 0" to the system "P := P1^2 + ... + Ps^2 = 0 and Q > 0", once again leaving a single polynomial equation and inequality. We now generalize to a system of the form

P = 0 and Q1 > 0 and ... and Qr > 0,

where P, Q1, ..., Qr ∈ R[X]. For convenience, we begin with the assumption that P is relatively prime to each of Q1, ..., Qr, so that none of the Qi can have common roots with P. Denote ε := (ε1, ..., εr) ∈ {0, 1}^r, and define the product Q^ε := Q1^{ε1} ... Qr^{εr}. Denote by sε the number of distinct real roots c of P such that Q^ε(c) > 0 minus the number of those such that Q^ε(c) < 0. Since Q^ε ∈ R[X] for any ε ∈ {0, 1}^r, we can compute sε by simply applying Theorem 1.1.3. Explicitly, by setting Q = Q^ε, we have

sε = νP,P′Q^ε(−∞) − νP,P′Q^ε(∞).

Furthermore, recalling that

NP,Q := [νP,P′Q(−∞) − νP,P′Q(∞)] + [νP,P′Q²(−∞) − νP,P′Q²(∞)]

is equal to twice the number of distinct real roots c of P such that Q(c) > 0, we can easily compute the number of distinct real roots c of P such that Q^ε(c) > 0. Denote ψ := (ψ1, ..., ψr) ∈ {0, 1}^r, and denote by cψ the number of distinct real roots c of P such that sign(Qi(c)) = (−1)^{ψi} for all i ∈ {1, ..., r}. That is, cψ denotes the number of distinct real roots of P at which each Qi has a sign designated by ψi. For example, if ψ = (0, ..., 0), the r-tuple whose entries are all 0, then cψ is the number of distinct real roots of P at which all Qi are positive. Note that while we can compute each sε relatively easily, we cannot yet compute the cψ, which is what we desire. Denote by s̃ and c̃ the vectors of length 2^r whose entries are the sε and cψ respectively. We can compute the vector c̃ via s̃ and an invertible 2^r × 2^r matrix which is independent of the polynomials P and Q1, ..., Qr, and depends only on r.

Claim 1.1.7. There exists an invertible 2^r × 2^r matrix Ar, depending only on r, such that s̃ = Ar · c̃.

Proof. We prove this using induction on r, the number of polynomial inequalities. The case r = 0 holds trivially, since s∅ = c∅, both being equal to the number of distinct real roots of P. Consider the case r = 1. We have

s0 = νP,P′Q⁰(−∞) − νP,P′Q⁰(∞),

which is equal to the number of distinct real roots c of P (those at which Q(c) > 0 plus those at which Q(c) < 0, since Q^0(c) is always greater than 0). That is, s0 = c0 + c1. We also have s1 = νP,P′Q¹(−∞) − νP,P′Q¹(∞), which is equal to the number of distinct roots c of P such that Q(c) > 0 minus the number of those such that Q(c) < 0. That is,

s1 = c0 − c1.

Therefore when r = 1 we have the 2 × 2 matrix

s̃ = [s0; s1] = [1 1; 1 −1] · [c0; c1] = A1 · c̃.

Clearly A1 is invertible. We have proven the claim for r = 1.

We now assume that the claim is true for the case of r ≥ 1. That is, we assume that there is an invertible 2^r × 2^r matrix Ar such that s̃ = Ar · c̃. If we have polynomials Q1, ..., Qr, Qr+1, write Q^ε = Q1^{ε1} ... Qr^{εr} for ε = (ε1, ..., εr) ∈

Clearly A1 is invertible. We have proven the claim for r = 1. We now assume that the claim is true for the case of r ≥ 1. That is, we r r assume that there is an invertible 2 × 2 matrix Ar such that es = Ar · ec. If we  1 r have polynomials Q1,...,Qr,Qr+1 write Q = Q1 ...Qr for  = (1, . . . , r) ∈

25 r ,r+1  r+1 {0, 1} , and denote by Q the product Q Qr+1 for r+1 ∈ {0, 1}. We define ,r+1 s,r+1 to be the number of distinct real roots c of P such that Q (c) > 0 ,r+1 minus the number of those such that Q (c) < 0, and we define cψ,ψr+1 to ψi be the number of distinct real roots c of P such that sign(Qi(c)) = (−1) for r all i = 1, . . . , r + 1, where ψ = (ψ1, . . . , ψr) ∈ {0, 1} , and ψr+1 ∈ {0, 1}. Now, if we let r+1 = 0, then  .  .    0 s(,0) = the number of distinct real roots c of P such that Q Qr+1(c) > 0  .  minus the number of those such that QQ0 (c) < 0. . r+1 = the number of distinct real roots c of P such that  Q (c) > 0 and Qr+1(c) > 0  plus the number of those such that Q (c) > 0 and Qr+1(c) < 0  minus the number of those such that Q (c) < 0 and Qr+1(c) > 0  minus the number of those such that Q (c) < 0 and Qr+1(c) < 0.

= Ar · cψ,0 + Ar · cψ,1 where the first and third terms constitute Ar · cψ,0, and the second and fourth terms constitute Ar ·cψ,1 in the final equality. Similarly, if we let r+1 = 1, then  .  .    1 s(,1) = the number of distinct real roots c of P such that Q Qr+1(c) > 0  .  minus the number of those such that QQ1 (c) < 0. . r+1 = the number of distinct real roots c of P such that  Q (c) > 0 and Qr+1(c) > 0  plus the number of those such that Q (c) < 0 and Qr+1(c) < 0  minus the number of those such that Q (c) > 0 and Qr+1(c) < 0  minus the number of those such that Q (c) < 0 and Qr+1(c) > 0.

= Ar · cψ,0 − Ar · cψ,1 where the first and fourth terms constitute Ar · cψ,0, and the second and third terms constitute Ar · cψ,1. Combining these results, we have  .   .  . .     s(,0) cψ,0   A A     .  r r  .  es =  .  = ·  .  = Ar+1 · ec.   Ar −Ar   s(,1) cψ,1  .   .  . .

Finally, we need to check that Ar+1 is invertible. By the inductive assumption Ar is invertible, and so we can easily verify that  −1 −1    1 Ar Ar Ar Ar −1 −1 = 1. 2 Ar −Ar Ar −Ar

We are now able to find the values cψ. In particular, we can compute the total number of real roots c of P such that Q1(c) > 0 and ... and Qr(c) > 0, for polynomials P, Q1, ..., Qr ∈ R[X]. We illustrate this with an example involving 3 inequalities.

Example 11. Let P = (X + 2)(X − 1)(X − 4), Q1 = (X + 3)(X − 2)(X − 3), Q2 = X − 3, and Q3 = X, and consider the system

P = 0 and Q1 > 0 and Q2 > 0 and Q3 > 0.

The computation of the Sturm sequences is straightforward and will be omitted - the evaluation of each sε and cψ will be done by observation.

Figure 10: Graph of P, Q1, Q2, Q3.

Noting that the sum of the entries of c̃ must be exactly 3, we easily compute the following from the above graph:

s(0,0,0) = 3, s(0,0,1) = 1, s(0,1,0) = −1, s(0,1,1) = 1,

s(1,0,0) = 3, s(1,0,1) = 1, s(1,1,0) = −1, s(1,1,1) = 1,

c(0,0,0) = 1, c(0,1,0) = 1, c(0,1,1) = 1. That is,

s̃ = (3, 1, −1, 1, 3, 1, −1, 1) and c̃ = (1, 0, 1, 1, 0, 0, 0, 0).

To clarify what is being calculated, s(0,0,0), for example, is the number of distinct real roots of P at which Q^{(0,0,0)} = Q1^0 Q2^0 Q3^0 is greater than zero, minus the number of those at which Q^{(0,0,0)} is less than zero. Of course, Q^{(0,0,0)} = 1 everywhere, so we find that s(0,0,0) is simply equal to the total number of distinct real roots of P (remembering that we are only considering Qi relatively prime to P so far). That is, s(0,0,0) = 3. For another example, consider s(0,1,0), which is equal to the number of distinct real roots of P at which Q^{(0,1,0)} = Q1^0 Q2^1 Q3^0 is positive, minus the number of those at which Q^{(0,1,0)} is negative. The sign of Q^{(0,1,0)} is essentially just the sign of Q2, and we observe that Q2 is negative at the roots −2 and 1 of P, but is positive at the root 4 of P. That is, s(0,1,0) = 1 − 2 = −1. We can now easily verify the statement of Claim 1.1.7 for this example with

1 1 1 1 1 1 1 1  1  3  1 −1 1 −1 1 −1 1 −1 0  1        1 1 −1 −1 1 1 −1 −1 1 −1       1 −1 −1 1 1 −1 −1 1  1  1  A3 · c =   ·   =   = s. e 1 1 1 1 −1 −1 −1 −1 0  3  e       1 −1 1 −1 −1 1 −1 1  0  1        1 1 −1 −1 −1 −1 1 1  0 −1 1 −1 −1 1 −1 1 1 −1 0 1

We must finally consider the case when P, Q1, ..., Qr ∈ R[X] are arbitrary, so that P and the Qi may not be relatively prime. To take care of common roots between P and some Qi, we redefine Q^ε in the following way. Take

Q^ε := (∏_{i=1}^{r} Qi^2) / (Q1^{ε1} ... Qr^{εr}) = Q1^{2−ε1} ... Qr^{2−εr},

where ε = (ε1, ..., εr) ∈ {0, 1}^r as usual. This means that, for roots c of P such that some Qi(c) = 0, we are guaranteed to still have Q^ε(c) = 0 even if εi = 0. (In the previous construction, if εi = 0 then Qi^{εi} is taken to be equal to 1 everywhere, so that Qi^{εi}(c) = 1, and c may be counted as a root of P at which Q^ε ≠ 0.) The point of this construction is to ensure that roots of P which are also roots of some Qi are not counted by νP,P′Q^ε. To see this, we simply invoke Theorem 1.1.3 again, with Q^ε in place of Q. Then νP,P′Q^ε(−∞) − νP,P′Q^ε(∞) is equal to the number of distinct real roots c of P such that Q^ε(c) > 0 minus the number of those such that Q^ε(c) < 0, ignoring those at which some Qi(c) = 0. Thus with Claim 1.1.7 and Theorem 1.1.3 (and replacing multiple equations P1 = ... = Ps = 0 with P1^2 + ... + Ps^2 = P = 0 if necessary), we are able to compute the total number of real solutions to a system of the form P = 0 and Q1 > 0 and ... and Qr > 0 for P ∈ R[X] non-constant, and arbitrary Q1, ..., Qr ∈ R[X]. So far we have only considered equations and strict inequalities. We are able to replace the condition "Q ≥ 0" with the disjunction "Q > 0 or Q = 0", giving two systems which can be solved separately using the above method.

If we have a system consisting only of inequalities, say "Q1 > 0, ..., Qr > 0", we can break it down into two types of solutions. The system is satisfied on an unbounded interval of the form (a, ∞) (respectively (−∞, a)) if and only if the leading coefficients of Q1(X), ..., Qr(X) (respectively of Q1(−X), ..., Qr(−X)) are all positive. That is, for polynomials Qi of even degree the leading coefficient must be positive, and for polynomials Qj of odd degree the leading coefficient must be positive in the case of an interval of the form (a, ∞), and negative in the case of an interval of the form (−∞, a). The system "Q1 > 0, ..., Qr > 0" is satisfied on some bounded interval, say (a, b), where a, b are real roots of the product Q := ∏_{i=1}^{r} Qi, if and only if the system "Q′ = 0, Q1 > 0, ..., Qr > 0" has a real solution. To see why, assume that all Qi are positive on some bounded interval (a, b), where a and b are real roots of Q. Since a and b are roots of Q, Rolle's theorem asserts that Q′ must have a root in the interval (a, b). Conversely, if the system "Q1 > 0, ..., Qr > 0" has no solution in the interval (a, b), then "Q′ = 0, Q1 > 0, ..., Qr > 0" is also not satisfied on the interval. (We need only consider consecutive roots of Q, since if there are real roots of Q inside the interval (a, b), then at least one of the Qi necessarily changes sign, even if only at the root itself. In addition, with a and b consecutive roots of Q, if the system is satisfied at any point of (a, b), then it is clearly satisfied on the entire interval (a, b).)
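The unbounded-interval test in the last paragraph is a pure leading-coefficient check. A small sketch assuming sympy, with our own helper name:

```python
# "Q1 > 0, ..., Qr > 0" holds on some (a, oo) iff every lc(Qi) > 0, and
# on some (-oo, a) iff every lc(Qi(-X)) > 0.
import sympy as sp
X = sp.symbols('X')

def holds_near_infinity(Qs, plus_side=True):
    sign = 1 if plus_side else -1
    return all(sp.Poly(Qi.subs(X, sign*X), X).LC() > 0 for Qi in Qs)

Qs = [(X + 3)*(X - 2)*(X - 3), X - 3, X]   # the Qi of Example 11
print(holds_near_infinity(Qs, True))   # True: all Qi > 0 for large X
print(holds_near_infinity(Qs, False))  # False
```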

1.2 Tarski-Seidenberg - Elimination of a variable

In this subsection we discuss an algorithm for eliminating a variable when determining whether or not a system of polynomial equations and inequalities has real solutions. We consider a system of polynomial equations and inequalities in n + 1 variables, (Y1, ..., Yn) = Y and X, with real coefficients:

S(Y1, ..., Yn, X) := { P1(Y1, ..., Yn, X) ⋈1 0, ..., Pr(Y1, ..., Yn, X) ⋈r 0 },

where each ⋈i stands for one of the relations "=" or ">". Denote by lc(P) the leading coefficient of a polynomial P (as a polynomial in X). A system with "fixed degrees" is a system (S(Y, X), D(Y)) where D(Y) forces (either implicitly or explicitly) the leading coefficients lc(Pi) to be nonzero for all Pi considered in S(Y, X). To see why a system with fixed degrees is desirable, consider the polynomial P(A, B, X) = AX + B. The system AX + B = 0 has a solution if and only if either statement "A ≠ 0" or "A = 0 and B = 0" is satisfied. Indeed, if A is nonzero then P has a root at −B/A, and the first statement is satisfied. Conversely, if A = 0 and B ≠ 0 then P is a nonzero constant which has no roots, and we observe that neither of the two statements is satisfied. We will see shortly that each of these cases has a corresponding Sturm sequence, and that for systems with unfixed degrees these associated Sturm sequences depart from one another precisely when a leading coefficient vanishes in one and not in the other. It is obvious that any system of polynomial equations and inequalities (with or without fixed degrees) is equivalent to a finite disjunction of systems with fixed degrees. It is therefore sufficient to consider only systems with fixed degrees.

Theorem 1.2.1 (Tarski-Seidenberg - First Form). There exists an algorithm which, given such a system S(Y, X), produces a finite list of systems of polynomial equations and inequalities in Y = (Y1, ..., Yn) with real coefficients, say C1(Y), ..., Cq(Y), such that, for every point y = (y1, ..., yn) ∈ R^n, the system S(y, X) has a real solution if and only if one of the systems Ci(y) is satisfied.

More succinctly, the statement "there exists a real solution X such that S(Y, X)" is equivalent to the statement "C1(Y) or ... or Cq(Y)", which depends only on Y. That is, there is an algorithm for eliminating the real variable X. Before proving the statement of Theorem 1.2.1 we explore some preliminary results, followed by some examples to elucidate the details of each step of the algorithm, as well as to clarify the overall structure. We introduce the sign function, sign : R → {−1, 0, +1}, defined by

sign(x) := +1 if x > 0,  0 if x = 0,  −1 if x < 0.

Recall that we only need to consider systems with equalities "=" and inequalities ">", since other relations can be expressed in terms of these, and recall that

multiple equalities P1 = ... = Ps = 0 can be expressed as a single equality, P = P1^2 + ... + Ps^2 = 0. Combining this with the fact that any system can be expressed as a finite disjunction of systems with fixed degrees, we only need to consider systems with fixed degrees containing a single polynomial equality P = 0 and polynomial inequalities Q1 > 0, ..., Qr > 0. For polynomials P, Q1, ..., Qr in variables (Y1, ..., Yn, X), and for a point y = (y1, ..., yn) ∈ R^n, let D(y) denote the statement "lc(P(y)) ≠ 0 and lc(Q1(y)) ≠ 0 and ... and lc(Qr(y)) ≠ 0".

Lemma 1.2.2. There exists an algorithm which, given a family of real polynomials (P, Q1, ..., Qr) in variables (Y1, ..., Yn, X) of positive degree with respect to X, produces a finite list of polynomials (R1, ..., Rl) in (Y1, ..., Yn), and a function c : {−1, 0, 1}^l → N such that, for every l-tuple of signs ε = (ε1, ..., εl) ∈ {−1, 0, 1}^l and every point y ∈ R^n satisfying the statement

"D(y) and sign(R1(y)) = ε1 and ... and sign(Rl(y)) = εl",

the system

"P(y, X) = 0 and Q1(y, X) > 0 and ... and Qr(y, X) > 0"

has exactly c(ε) solutions.

Proof. First, we compute the Sturm sequences as in Claim 1.1.7. That is, for each δ = (δ1, ..., δr) ∈ {0, 1}^r, we compute the Sturm sequence of P and P′Q^δ, where Q^δ = Q1^{δ1} ... Qr^{δr}, as we aim to compute c̃, the vector whose entries are the cψ (the number of real roots of P such that sign(Qi) = (−1)^{ψi} for all i = 1, ..., r). For every polynomial in a Sturm sequence, we test whether its leading coefficient is zero. In the case that the leading coefficient of a polynomial is zero, we replace the polynomial with its truncation (deleting the leading term from the polynomial). We assume that the leading coefficients of P, Q1, ..., Qr are all nonzero, ensuring that all Sturm sequences begin with non-constant polynomials (as required by a Sturm sequence). This yields a tree of Sturm sequence computations, where the branching tests are polynomial equations/inequations ("= 0" or "≠ 0") in the parameters Y = (Y1, ..., Yn). Thus every branch corresponds to a system of equations and inequations in Y, and we obtain the Sturm sequence corresponding to all parameters y ∈ R^n satisfying the system. From Lemma 1.1.6 we know that the signs of the leading coefficients of the polynomials in each Sturm sequence determine the value of ν(−∞) − ν(+∞). In particular, these leading coefficients are rational fractions (as a result of performing Euclidean division in computing the Sturm sequences), say A(Y)/B(Y), where B(Y) is assumed to be nonzero within the branch in which it occurs. Notice that A(Y)B(Y) has the same sign as A(Y)/B(Y). We define the R1, ..., Rl to be these polynomials A(Y)B(Y) for all leading coefficients A(Y)/B(Y) of polynomials from all branches of all trees of Sturm sequence computations. Finally, assuming that D(y) is satisfied and fixing the sign of each of the R1(y), ..., Rl(y), applying the results of Lemma 1.1.6 and Claim 1.1.7 to the R1, ..., Rl, we are able to compute the number of real solutions of the system "P(y, X) = 0 and Q1(y, X) > 0 and ... and Qr(y, X) > 0". That is, in

terms of Claim 1.1.7, we use the signs of the leading coefficients to compute the vector s̃ whose entries are sδ = νP,P′Q^δ(−∞) − νP,P′Q^δ(+∞), which we can use to compute c̃. The first entry of c̃ is c(0,...,0), which precisely corresponds to the number of solutions of the system, as required.

We give a detailed examination of the algorithm in the following example.

Example 12. Let P = X^2 + aX + b and Q = X + c, and consider the system "P = 0 and Q > 0". We begin by computing Sturm sequences. Since there is only a single polynomial inequality, our r-tuple δ = (δ1, ..., δr) is just δ = δ1. We are considering P and Q in full generality, and so we cannot assume that they are relatively prime - they may have common roots depending on (a, b, c). In particular, we will use Claim 1.1.7, allowing for the possibility that P and Q have common roots. We define

$$Q^\delta = \frac{\prod_{i=1}^{r} Q_i^2}{Q_1^{\delta_1} \cdots Q_r^{\delta_r}} = \frac{Q^2}{Q^{\delta_1}} = Q^{2-\delta_1}.$$

We wish to compute the vector s̃ whose entries s_{(δ_1,...,δ_r)} are the values of

$$s_\delta = \nu_{P,P'Q^\delta}(-\infty) - \nu_{P,P'Q^\delta}(\infty)$$

for each δ ∈ {0, 1}^r, and from this we can compute the vector c̃, whose first entry is precisely the value we wish to obtain. Again, since r = 1, we wish to compute s_0 := ν_{P,P′Q^2}(−∞) − ν_{P,P′Q^2}(∞), corresponding to δ_1 = 0, and

s_1 := ν_{P,P′Q}(−∞) − ν_{P,P′Q}(∞), corresponding to when δ_1 = 1. We are therefore required to compute the Sturm sequence of P and P′Q, and the Sturm sequence of P and P′Q^2. Note that since there is only one inequality in this example, the use of Claim 1.1.7 is not entirely necessary, as we can use the construction following Theorem 1.1.3 to calculate the desired values. Explicitly, the number of real roots of P at which Q^δ > 0 is equal to

$$N^{\delta}_{P,Q} = \frac{1}{2}\left[\nu_{P,P'Q^\delta}(-\infty) - \nu_{P,P'Q^\delta}(\infty)\right] + \frac{1}{2}\left[\nu_{P,P'(Q^\delta)^2}(-\infty) - \nu_{P,P'(Q^\delta)^2}(\infty)\right].$$

As a novel exercise, we will also compute the Sturm sequence of P and P′Q^4 in order to compute N^δ_{P,Q}. We first consider the case of δ_1 = 1, taking Q^δ. Then P_1 = P′Q = 2X^2 + (2c + a)X + ac, and P_2 = (c − a/2)X + ac/2 − b =: AX + B. The first branching test is whether or not A = 0. If A ≠ 0, then A^2 P_3 = (2c + a)AB − acA^2 − 2B^2 =: Σ̃. The next branching test within the current branch is whether or not Σ̃ = 0. If Σ̃ ≠ 0 then the associated Sturm sequence is

(P_0, P_1, P_2, P_3) = (P, P′Q, AX + B, Σ̃).

If Σ̃ = 0 then the associated Sturm sequence is simply (P, P′Q, AX + B). If A = 0 we begin down a different branch, in which P_2 = ac/2 − b = B, and the next branching test is whether or not B = 0. If B ≠ 0 then the associated Sturm sequence is (P, P′Q, B), and if B = 0 then the associated Sturm sequence is simply (P, P′Q). The computation tree is illustrated in Figure 11, where we have

$$A = c - \frac{a}{2}, \qquad B = \frac{ac}{2} - b, \qquad \widetilde{\Sigma} = (2c + a)AB - acA^2 - 2B^2.$$

The leading coefficients 1, 2, A, Σ̃ (as polynomials in a, b, c) appearing in the left side of the computation tree, and B in the right side, are exactly the polynomials which we define the R_k to be (noting that the leading coefficients appearing as rational fractions have already been simplified).

Figure 11: Sturm sequence computation tree for P and P′Q. Starting from P_0 = X^2 + aX + b and P_1 = 2X^2 + (2c + a)X + ac, the branches, and the value of s_δ determined by the signs of the leading coefficients (the constant leading coefficients 1 and 2 always have sign +), are:

• A ≠ 0, Σ̃ ≠ 0 (sequence (P, P′Q, AX + B, Σ̃)): for (sign(A), sign(Σ̃)) = (+,+), (+,−), (−,+), (−,−), we get s_δ = 2, 0, −2, 0 respectively.
• A ≠ 0, Σ̃ = 0 (sequence (P, P′Q, AX + B)): for sign(A) = +, −, we get s_δ = 1, −1.
• A = 0, B ≠ 0 (sequence (P, P′Q, B)): s_δ = 0 for both sign(B) = + and sign(B) = −.
• A = 0, B = 0 (sequence (P, P′Q)): s_δ = 0.

Now consider the case when δ_1 = 0, and take Q^δ (which is equivalent to (Q^δ)^2 with δ_1 = 1). Then P_1 = 2X^3 + (4c + a)X^2 + (2c^2 + 2ac)X + ac^2,

P_2 = −P = −X^2 − aX − b, and P_3 = ∆X + Γ. So far, the leading coefficients 1, 2, −1, and ∆ are included among the functions R_1, ..., R_l. (Of course, Γ and Σ are also among the R_1, ..., R_l, as they also appear as leading coefficients in the computation tree.) The first branching test is whether ∆ = 0. The rest of the tree is computed as in the previous case. The full computation tree for this case is illustrated below, where

$$\Delta = 2ac + 2b - a^2 - 2c^2, \qquad \Gamma = 4bc - ab - ac^2, \qquad \Sigma = \Gamma^2 - a\Delta\Gamma + b\Delta^2.$$

Figure 12: Sturm sequence computation tree for P and P′Q^2. Starting from P_0 = X^2 + aX + b, P_1 = 2X^3 + (4c + a)X^2 + (2c^2 + 2ac)X + ac^2, and P_2 = −P = −X^2 − aX − b (the leading coefficients 1, 2, −1 always have signs +, +, −), the branches are:

• ∆ ≠ 0, Σ ≠ 0 (sequence (P, P′Q^2, −P, ∆X + Γ, Σ), with ∆^2 P_4 = Σ): for (sign(∆), sign(Σ)) = (+,+), (+,−), (−,+), (−,−), we get s_δ = 0, −2, 0, 2 respectively.
• ∆ ≠ 0, Σ = 0 (sequence (P, P′Q^2, −P, ∆X + Γ)): for sign(∆) = +, −, we get s_δ = −1, 1.
• ∆ = 0, Γ ≠ 0 (sequence (P, P′Q^2, −P, Γ)): s_δ = 0 for both signs of Γ.
• ∆ = 0, Γ = 0 (sequence (P, P′Q^2, −P)): s_δ = 0.

We now have the Sturm sequence computation tree for every case that we need in order to apply Claim 1.1.7. However, before we do so, we consider the case

when δ_1 = 0 and we take (Q^δ)^2, for the sake of computing N^δ_{P,Q}. (Note that taking (Q^δ)^2 with δ_1 = 1 is equivalent to taking Q^δ with δ_1 = 0, and that the associated Sturm sequences are equivalent.) Again, we compute each new polynomial in the Sturm sequence as usual, branching whenever a non-constant (with respect to (a, b, c)) leading coefficient appears. Then we have

$$\Delta' = -2c^4 + 4ac^3 + 12bc^2 - 6a^2c^2 - 12abc + 4a^3c + 4a^2b - 2b^2 - a^4,$$
$$\Gamma' = -ac^4 + 8bc^3 - 6abc^2 - 8b^2c + 4a^2bc + 3ab^2 - a^3b,$$
$$\Sigma' = (\Gamma')^2 - a\Delta'\Gamma' + b(\Delta')^2.$$

Note that the structure of this computation tree is similar to the previous tree, with the only differences being in the coefficients, and the polynomial P1.

Figure 13: Sturm sequence computation tree for P and P′Q^4. Starting from P_0 = X^2 + aX + b, P_1 = 2X^5 + (8c + a)X^4 + (12c^2 + 4ac)X^3 + (8c^3 + 6ac^2)X^2 + (2c^4 + 4ac^3)X + ac^4, and P_2 = −P = −X^2 − aX − b, the branches are:

• ∆′ ≠ 0, Σ′ ≠ 0 (sequence (P, P′Q^4, −P, ∆′X + Γ′, Σ′), with (∆′)^2 P_4 = Σ′): for (sign(∆′), sign(Σ′)) = (+,+), (+,−), (−,+), (−,−), we get s_δ = 0, −2, 0, 2 respectively.
• ∆′ ≠ 0, Σ′ = 0 (sequence (P, P′Q^4, −P, ∆′X + Γ′)): for sign(∆′) = +, −, we get s_δ = −1, 1.
• ∆′ = 0, Γ′ ≠ 0 (sequence (P, P′Q^4, −P, Γ′)): s_δ = 0 for both signs of Γ′.
• ∆′ = 0, Γ′ = 0 (sequence (P, P′Q^4, −P)): s_δ = 0.

Continuing with the application of Claim 1.1.7, we now have

$$\tilde{s} = \begin{pmatrix} s_0 \\ s_1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \cdot \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = A_1 \cdot \tilde{c},$$

where s_0 and s_1 are now known. So, we have

$$A_1^{-1} \cdot \begin{pmatrix} s_0 \\ s_1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \cdot \begin{pmatrix} s_0 \\ s_1 \end{pmatrix} = \begin{pmatrix} c_0 \\ c_1 \end{pmatrix}.$$

Therefore, c_0 = (s_0 + s_1)/2, which is the number of real roots of P at which sign(Q) = (−1)^0 = 1. Observe that this calculation, (s_0 + s_1)/2, is precisely the definition of N^δ_{P,Q}. Note that in each tree, the value of s_δ in each branch is calculated using only the signs of these leading coefficients. That is, the number of real solutions of the system "P(a, b, c, X) = 0 and Q_1(a, b, c, X) > 0 and ... and Q_r(a, b, c, X) > 0" depends only on the signs ε_1, ..., ε_l of R_1(a, b, c), ..., R_l(a, b, c), as stated in Lemma 1.2.2. We conclude this example with a demonstration using a specific point in the space of coefficients. Let (a, b, c) = (0, −1, 0), so that P = X^2 − 1 and Q = X. By observation, P has one root at x = 1 at which Q > 0, and one root at which Q < 0. The following data are easily computed:

A = 0, B = 1, Σ̃ = −2, ∆ = −2, Γ = 0, Σ = −4, ∆′ = −2, Γ′ = 0, Σ′ = −4.

Following the appropriate path down the corresponding computation tree one finds that s_0 = 2 and s_1 = 0. Therefore we have

$$A_1^{-1} \cdot \begin{pmatrix} s_0 \\ s_1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} c_0 \\ c_1 \end{pmatrix},$$

so c_0, the number of roots of P at which sign(Q) = 1, is equal to 1, as expected.

To complete the proof of Theorem 1.2.1, we must consider more general systems. As we saw earlier, we can replace multiple equations "P_1 = ... = P_k = 0" with a single equation, P = P_1^2 + ... + P_k^2 = 0. We also saw at the end of Subsection 1.1 that we can determine whether a system with no equations "Q_1 > 0, ..., Q_r > 0" has real solutions or not. In particular, we can determine if the system is satisfied on some unbounded interval by looking at just the leading coefficients, and we can determine if the system is satisfied on a closed interval, whose endpoints are roots of the product Q := ∏_{i=1}^{r} Q_i, by examining the system "Q′ = 0 and Q_1 > 0 and ... and Q_r > 0". This concludes the proof.
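The sign bookkeeping of Example 12 can be checked mechanically. The following is a minimal sketch (assuming SymPy; the helper names are ours, not from any library, and the truncation step for vanishing leading coefficients is unnecessary at a fixed numeric point): it builds the generalized Sturm sequence of P and P′Q^k by repeated negated Euclidean remainders, counts sign variations at ±∞ from leading coefficients and degrees, and recovers c_0 = (s_0 + s_1)/2 at (a, b, c) = (0, −1, 0).

```python
import sympy as sp

X = sp.symbols('X')

def sturm_sequence(p0, p1):
    """Generalized Sturm sequence: P_{k+1} = -rem(P_{k-1}, P_k)."""
    seq = [sp.expand(p0), sp.expand(p1)]
    while sp.degree(seq[-1], X) > 0:
        r = sp.expand(-sp.rem(seq[-2], seq[-1], X))
        if r == 0:
            break
        seq.append(r)
    return seq

def variations_at(seq, minus_inf):
    """Sign variations of the sequence at -infinity (or +infinity)."""
    signs = []
    for p in seq:
        s = sp.sign(sp.LC(p, X))
        if minus_inf and sp.degree(p, X) % 2 == 1:
            s = -s                       # odd degree flips the sign at -inf
        signs.append(s)
    return sum(1 for u, v in zip(signs, signs[1:]) if u * v < 0)

P, Q = X**2 - 1, X                       # (a, b, c) = (0, -1, 0)
s = {}
for k in (1, 2):                         # P'Q (delta_1 = 1), P'Q^2 (delta_1 = 0)
    seq = sturm_sequence(P, sp.diff(P, X) * Q**k)
    s[k] = variations_at(seq, True) - variations_at(seq, False)

s0, s1 = s[2], s[1]
print(s0, s1)                            # 2, 0 as in the example
print((s0 + s1) // 2)                    # c_0 = 1 root of P with Q > 0
```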

1.3 Hermite's method

In this subsection we propose another method of counting the real roots of polynomials. Similar to Sturm's method, Hermite's root counting method is a way to count the number of distinct real roots of a polynomial in one variable with real coefficients, which is done using the properties of a related matrix whose entries are Newton sums. This method is heavily reliant on algebraic properties (as opposed to the more geometric nature of Sturm's method), although both methods are in fact connected via subresultant polynomials (see [2]). We use results from this subsection in the construction of a cylindrical algebraic decomposition in Section 3. The basic definitions required to give the main results of this subsection will be provided here, while proofs, illustrative examples, and other details required for a more complete picture are deferred to Appendix A. For a polynomial P = a_0X^d + ... + a_d ∈ R[X] with roots α_1, ..., α_d ∈ C, we define

$$N_k := \sum_{i=1}^{d} (\alpha_i)^k,$$

the k-th power Newton sums of the roots of P. The N_k can be computed using only the coefficients and degree of P, as detailed in Appendix A. Let U := (U_1, ..., U_d), and Q a quadratic form in U. The quadratic form Q is said to have matrix M if Q(U) = U^T M U. If Q(U) has real coefficients and can be decomposed as

$$Q(U) = \sum_{j=1}^{p} (L_j(U))^2 - \sum_{j=p+1}^{p+q} (L_j(U))^2,$$

where the L_j are linear forms, and L_1, ..., L_{p+q} is a linearly independent collection, then the signature of Q is p − q, and the rank of Q is p + q. Consider the matrix

$$H(P) := \begin{pmatrix} N_0 & N_1 & \dots & N_{d-1} \\ N_1 & N_2 & \dots & N_d \\ \vdots & \vdots & \ddots & \vdots \\ N_{d-1} & N_d & \dots & N_{2d-2} \end{pmatrix}$$

whose entries are the Newton sums of the roots of P, which is both real and symmetric, and consider the quadratic form Q whose matrix is H(P).

Claim 1.3.1. The number of distinct real roots of P is equal to the signature of Q, and the number of distinct complex roots of P is equal to the rank of Q.

Proposition 1.3.2. The rank of H(P) as a matrix is equal to the rank of Q as a quadratic form.
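As an illustration of Claim 1.3.1, here is a minimal sketch (assuming SymPy; the helper names are ours) that builds H(P) from the coefficients alone, computing the N_k by Newton's identities (as detailed in Appendix A, and assuming a_0 ≠ 0), and reads off the rank and signature for P = X^3 − X = X(X − 1)(X + 1).

```python
import sympy as sp

def newton_sums(a, n_terms):
    """Power sums N_k of the roots of a[0]*X^d + ... + a[d]."""
    d = len(a) - 1
    N = [sp.Integer(d)]                                  # N_0 = d
    for k in range(1, n_terms):
        s = -k * a[k] if k <= d else sp.Integer(0)       # Newton's identity
        for j in range(1, min(k - 1, d) + 1):
            s -= a[j] * N[k - j]
        N.append(s / a[0])
    return N

def hermite_matrix(a):
    d = len(a) - 1
    N = newton_sums(a, 2 * d - 1)
    return sp.Matrix(d, d, lambda i, j: N[i + j])        # Hankel matrix H(P)

H = hermite_matrix([1, 0, -1, 0])        # H = [[3,0,2],[0,2,0],[2,0,2]]
print(H.rank())                          # 3: three distinct complex roots
# signature = (#positive - #negative eigenvalues) = 3: three distinct real roots
print(sum(sp.sign(v) * m for v, m in H.eigenvals().items()))
```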

37 Denote by δi(P ) the i-th principal minor of the matrix H(P ). That is   N0 ...Ni−1  . .. .  δi(P ) = det  . . .  , Ni−1 ...N2i−2 the determinant of the matrix formed from the first i rows and columns of H(P ). Proposition 1.3.3. The rank of H(P ) is equal to some r < d if and only if δd = ... = δr+1 = 0 and δr 6= 0. We use the following definition regarding the sequence of principal minors δi of H(P ) when they are allowed to vanish.

$$\widetilde{\operatorname{sign}}(\delta_i) := \operatorname{sign}(\delta_i) \quad \text{if } \delta_i \neq 0,$$
$$\widetilde{\operatorname{sign}}(\delta_i) := (-1)^{k(k-1)/2}\operatorname{sign}(\delta_{i-k}) \quad \text{if } \delta_i = \dots = \delta_{i-k+1} = 0 \text{ and } \delta_{i-k} \neq 0.$$

Theorem 1.3.4. Let P ∈ R[X] be a polynomial of degree d, and let the principal minors δ_1, ..., δ_d of the matrix H(P) be such that δ_r ≠ 0 and δ_{r+1} = ... = δ_d = 0. Then the number of distinct real roots of P is equal to r minus twice the number of sign changes in the sequence 1, sign~(δ_1), ..., sign~(δ_r).

For polynomials P, Q ∈ R[X] of degrees d and e respectively, the resultant of P and Q is the determinant of the Sylvester matrix of P and Q. The principal subresultant coefficient of order i of P and Q, written PSRC_i(P, Q), is the determinant of the (d + e − 2i) × (d + e − 2i) matrix formed by deleting the first and last i rows and columns from the Sylvester matrix of P and Q, which is defined for all 0 ≤ i < min(d, e).

Theorem 1.3.5. Let P, Q ∈ R[X] be of degrees d and e, and k < min(d, e) a non-negative integer. The greatest common divisor of P and Q has degree > k if and only if PSRC_0(P, Q) = ... = PSRC_k(P, Q) = 0.

Lemma 1.3.6. Let P = a_0X^d + ... + a_d ∈ R[X] and 0 ≤ k < d an integer. Then PSRC_{d−k}(P, P′) = a_0^{2k−1} δ_k.

We conclude this section with the following theorem.

Theorem 1.3.7. For a polynomial P ∈ R[X] of degree d, the following statements are equivalent.

• P has r distinct complex roots.
• The matrix H(P) has rank r.

• The principal minors satisfy δ_d = ... = δ_{r+1} = 0 and δ_r ≠ 0.
• The principal subresultant coefficients satisfy PSRC_0(P, P′) = ... = PSRC_{d−r−1}(P, P′) = 0 and PSRC_{d−r}(P, P′) ≠ 0.

Proof. This follows immediately from Claim 1.3.1, Proposition 1.3.2, Proposition 1.3.3, and Lemma 1.3.6.
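The subresultant criterion in Theorem 1.3.7 can also be checked on the same example. The sketch below (assuming SymPy; sylvester and psrc are our own helpers, not library calls) forms the Sylvester matrix of P and P′ and computes PSRC_i by deleting the first and last i rows and columns; for P = X^3 − X we find PSRC_0(P, P′) ≠ 0, so gcd(P, P′) is constant and P has d = 3 distinct complex roots, matching rank H(P) = 3 above.

```python
import sympy as sp

X = sp.symbols('X')

def sylvester(p, q):
    """Sylvester matrix of p and q with respect to X."""
    a = sp.Poly(p, X).all_coeffs()
    b = sp.Poly(q, X).all_coeffs()
    d, e = len(a) - 1, len(b) - 1
    rows = [[0] * i + a + [0] * (e - 1 - i) for i in range(e)]
    rows += [[0] * i + b + [0] * (d - 1 - i) for i in range(d)]
    return sp.Matrix(rows)

def psrc(p, q, i):
    """Principal subresultant coefficient of order i."""
    S = sylvester(p, q)
    n = S.rows
    return S[i:n - i, i:n - i].det()

P = X**3 - X
print(psrc(P, sp.diff(P, X), 0))   # -4 (the resultant), nonzero
```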

2 Semialgebraic sets

In this section we introduce the notion of semialgebraic subsets of Rn, and develop several important results pertaining to them. We show that the class of semialgebraic sets is stable under taking projections, (as another form of the Tarski-Seidenberg Theorem 1.2.1), which is a powerful tool in the study of semialgebraic sets. In contrast, the projection of an algebraic set is not generally algebraic. We offer a third formulation of the Tarski-Seidenberg principle - via first-order formulae - which can provide utility when we are required to define and manipulate semialgebraic sets, and to show their semialgebraicity. In the final subsection we define semialgebraic functions, motivated by our foundation of semialgebraic sets. We give numerous examples of types of semialgebraic functions, and we show various desirable properties of semialgebraic functions, including a bound in the form of the Lojasiewicz inequality, Theorem 2.4.8, some of which will be used in the following section.

2.1 Semialgebraic sets - Definitions and examples

Let us denote by SA_n the class of semialgebraic subsets of R^n. A semialgebraic subset of R^n is a set defined as the collection of points satisfying a finite boolean combination of polynomial equations and inequalities in n variables. The class SA_n is the smallest class of subsets of R^n such that: for any P ∈ R[X_1, ..., X_n], the sets {x ∈ R^n | P(x) = 0} and {x ∈ R^n | P(x) > 0} are in SA_n, and for any A, B ∈ SA_n, the sets A ∪ B, A ∩ B, and R^n \ A are in SA_n. All algebraic sets are also semialgebraic sets. As an example, the sphere

$$S^{n-1} = \{(x_1, \dots, x_n) \in \mathbb{R}^n \mid x_1^2 + \dots + x_n^2 = 1\}$$

is an algebraic subset of R^n, while the ball

$$B^n = \{(x_1, \dots, x_n) \in \mathbb{R}^n \mid x_1^2 + \dots + x_n^2 \le 1\}$$

is semialgebraic but is not algebraic. It is important to note that the semialgebraicity of a set does not depend on the choice of affine coordinates. To see this, consider basic semialgebraic sets

$$S_1 := \{x \in \mathbb{R}^n \mid P(x) = 0\} \quad \text{and} \quad S_2 := \{x \in \mathbb{R}^n \mid Q(x) > 0\},$$

where P, Q ∈ R[X_1, ..., X_n], and an affine transformation A of R^n defined by

$$x = (x_1, \dots, x_n) \mapsto (a_{1,1}x_1 + \dots + a_{1,n}x_n + b_1, \; \dots, \; a_{n,1}x_1 + \dots + a_{n,n}x_n + b_n).$$

It is clear that P (A(X)) and Q(A(X)) are also polynomials in X1,...,Xn, and hence the sets

$$S_1' := \{x \in \mathbb{R}^n \mid P(A(x)) = 0\} \quad \text{and} \quad S_2' := \{x \in \mathbb{R}^n \mid Q(A(x)) > 0\}$$

are also semialgebraic.

Let B_1, B_2 be boolean combinations of polynomial equations and inequalities. Then a set of the form A := {x ∈ R^n | B_1(x) and B_2(x)} can be rewritten as {x ∈ R^n | B_1(x)} ∩ {x ∈ R^n | B_2(x)}. Similarly, a set of the form B := {x ∈ R^n | B_1(x) or B_2(x)} can be rewritten as {x ∈ R^n | B_1(x)} ∪ {x ∈ R^n | B_2(x)}. Therefore, if the sets S_1 := {x ∈ R^n | B_1(x)} and S_2 := {x ∈ R^n | B_2(x)} are semialgebraic, then the sets A = S_1 ∩ S_2 and B = S_1 ∪ S_2 are also semialgebraic by definition. With this in mind, one can rewrite "P ≥ 0" as "P > 0 or P = 0", and "P ≠ 0" as "P > 0 or P < 0". We can also replace "P < 0" with "−P > 0", and a collection of equations P_1 = ... = P_s = 0 is equivalent to the single equation P := P_1^2 + ... + P_s^2 = 0. Thus one can rewrite any system of the form B = "P_1 B_1 0 and ... and P_r B_r 0", where the B_i are among >, <, =, ≥, ≠, as a finite disjunction of conjunctions of polynomial equations and strict inequalities. That is, any set of the form {x ∈ R^n | B(x)} can be written as a finite union of sets of the form

P = 0 and Q_1 > 0 and ... and Q_r > 0,

which are the intersections of basic semialgebraic sets defined by a single polynomial equation or inequality. Therefore, any set of the form {x ∈ R^n | B(x)} is semialgebraic. For example, the system B = "P ≥ 0 and Q ≠ 0" can be rewritten as B = "(P > 0 and Q ≠ 0) or (P = 0 and Q ≠ 0)", which can in turn be rewritten as "(P = 0 and Q > 0) or (P > 0 and Q > 0) or (P = 0 and −Q > 0) or (P > 0 and −Q > 0)". In a similar fashion, it is easy to see the following result.

Proposition 2.1.1. Every semialgebraic subset of Rn can be written as a finite union of semialgebraic sets of the form

$$S = \{x \in \mathbb{R}^n \mid P(x) = 0 \text{ and } Q_1(x) > 0 \text{ and } \dots \text{ and } Q_r(x) > 0\}.$$

Proof. The proof is immediate from the above discussion.

Some examples of semialgebraic sets are included below.

Example 13. A semialgebraic subset of R is a finite union of points and open intervals (possibly including half-lines of the form (−∞, a) and (b, ∞)). To see this, we can start by using the fact that a semialgebraic subset A ⊂ R is a finite union of sets of the form {x ∈ R | P(x) = 0, Q_1(x) > 0, ..., Q_r(x) > 0}, where P, Q_1, ..., Q_r ∈ R[X]. Each of these sets is the intersection of the set of points x for which P(x) = 0 (either finitely many points, or all of R), with the intersection of the sets {x ∈ R | Q_i(x) > 0} for i = 1, ..., r (each a finite union of open intervals, possibly including half-lines). Thus each such set is a finite union of points and open intervals, and every semialgebraic subset of R is a finite union of such sets.

Example 14. Let f : R^m → R^n be a polynomial mapping, f = (f_1, ..., f_n), with each f_i ∈ R[X_1, ..., X_m]. For a semialgebraic subset A ⊂ R^n, the preimage f^{−1}(A) ⊂ R^m of A by f is semialgebraic. To see this, let A = ∪_{i=1}^{l} A_i, where each A_i is of the form {x ∈ R^n | P(x) = 0, Q_1(x) > 0, ..., Q_r(x) > 0}. Then we can write f^{−1}(A) as the union of all f^{−1}(A_i) = {x ∈ R^m | P(f(x)) = 0, Q_1(f(x)) > 0, ..., Q_r(f(x)) > 0}. Since each of the functions f_1, ..., f_n is polynomial, and the functions P, Q_1, ..., Q_r are polynomial, the compositions P ∘ f and Q_i ∘ f are polynomial. Thus the sets f^{−1}(A_i) are semialgebraic, and therefore f^{−1}(A) = f^{−1}(∪_{i=1}^{l} A_i) = ∪_{i=1}^{l} f^{−1}(A_i) is also semialgebraic.

Example 15. If A ⊂ R^m, B ⊂ R^n are semialgebraic, then A × B ⊂ R^m × R^n is semialgebraic. To see this, assume that A = {x ∈ R^m | B_1(x)} and B = {y ∈ R^n | B_2(y)}, where B_1 and B_2 are finite boolean combinations of polynomial equations and inequalities in (X_1, ..., X_m) and (Y_1, ..., Y_n) respectively. Then the set A × B = {(x, y) ∈ R^m × R^n | B_1(x) and B_2(y)} is clearly also described by a finite boolean combination of polynomial equations and inequalities in (X_1, ..., X_m, Y_1, ..., Y_n), and is therefore a semialgebraic subset of R^m × R^n.

2.2 Tarski-Seidenberg - Second form

In the previous subsection we explored various stability properties of the class of semialgebraic sets. The main result of this subsection is that the class of semialgebraic sets is also closed under projection. This is the second form of the Tarski-Seidenberg principle, and the proof we give relies on the first form, Theorem 1.2.1.

Theorem 2.2.1. Let S ⊂ R^n be semialgebraic, and consider π : R^n → R^{n−1}, the canonical projection onto the first n − 1 coordinates. The projection π(S) ⊂ R^{n−1} is semialgebraic.

Proof. From the previous subsection, we know that S is a finite union of sets of the form

$$S' = \{(x_1, \dots, x_n) \in \mathbb{R}^n \mid P(x) = 0,\; Q_1(x) > 0, \dots, Q_r(x) > 0\}.$$

Let us denote x := (x_1, ..., x_n) ∈ R^n and x′ := (x_1, ..., x_{n−1}) ∈ R^{n−1}. By Theorem 1.2.1 (Tarski-Seidenberg), there is a boolean combination C(X_1, ..., X_{n−1}) of polynomial equations and inequalities in the first n − 1 coordinates such that the system "P(x′, X_n) = 0, Q_1(x′, X_n) > 0, ..., Q_r(x′, X_n) > 0" has a real solution if and only if C(x′) is satisfied. That is, there exists an x_n ∈ R such that

P(x′, x_n) = 0, Q_1(x′, x_n) > 0, ..., Q_r(x′, x_n) > 0

if and only if C(x′) is satisfied. Note that the projection π(S′) is precisely the set of points

$$\{x' \in \mathbb{R}^{n-1} \mid \exists x_n \in \mathbb{R} \text{ such that } P(x', x_n) = 0 \text{ and } Q_1(x', x_n) > 0, \dots, Q_r(x', x_n) > 0\}.$$

That is, π(S′) = {x′ ∈ R^{n−1} | C(x′)}, which is semialgebraic, as we saw in the previous subsection. Since S is simply a finite union of such S′, π(S) is clearly also semialgebraic.

The following results are immediate consequences of Theorem 2.2.1.

Corollary 2.2.2. The projection of a semialgebraic subset S ⊂ Rn+m onto any n coordinates is also semialgebraic. Proof. This follows by an easy induction on m. Reordering the coordinates if necessary, we eliminate the last variable m-times, following the proof of the above theorem each time.

Corollary 2.2.3. For any semialgebraic set S ⊂ Rn, the closure of S in Rn is semialgebraic. Proof. A detailed proof of this statement will not be included. The result fol- lows immediately from the fact that the distance function is semialgebraic, (cf. Example 20).

2.3 Tarski-Seidenberg - Third form

We now develop a less geometric form of the Tarski-Seidenberg principle. First we introduce several definitions concerning formulae of the language of ordered fields with parameters in R, which we will use to describe semialgebraic sets. First-order formulae are 'built' using the following three rules:

• For any polynomial P ∈ R[X_1, ..., X_n], both "P = 0" and "P > 0" are first-order formulae.
• For any first-order formulae, say Θ and Φ, "Θ and Φ", "Θ or Φ", and "not Θ" are first-order formulae. (We will denote these by Θ ∧ Φ, Θ ∨ Φ, and ¬Θ respectively.)

• For a first-order formula Θ and a variable X ranging over R, both "there exists an X such that Θ" (∃X Θ), and "for all X, Θ" (∀X Θ), are first-order formulae.

Formulae made by using only the first two rules are called quantifier-free formulae. Notice that a set {x ∈ R^n | Θ(x) ∧ Φ(x)} is simply the intersection

$$\{x \in \mathbb{R}^n \mid \Theta(x)\} \cap \{x \in \mathbb{R}^n \mid \Phi(x)\}.$$

Similarly, the sets {x ∈ R^n | Θ(x) ∨ Φ(x)} and {x ∈ R^n | ¬Θ(x)} are the union

$$\{x \in \mathbb{R}^n \mid \Theta(x)\} \cup \{x \in \mathbb{R}^n \mid \Phi(x)\},$$

and the complement

$$\mathbb{R}^n \setminus \{x \in \mathbb{R}^n \mid \Theta(x)\},$$

respectively. Since Θ and Φ are boolean combinations of polynomial sign conditions, so are the formulae built from them by these operations. Hence, by the definition of semialgebraicity as in Proposition 2.1.1, a set S ⊂ R^n is semialgebraic if and only if there is a quantifier-free first-order formula Θ in n parameters such that

$$S = \{(x_1, \dots, x_n) \in \mathbb{R}^n \mid \Theta(x_1, \dots, x_n)\}.$$
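Quantifier-free formulae and the sets they describe can be modelled directly as a small syntax tree. The following is a minimal sketch (the Atom/And/Or/Not classes are hypothetical, with SymPy assumed only for polynomial evaluation) of the first two building rules, together with pointwise evaluation of membership.

```python
import sympy as sp

class Atom:
    """'P = 0' (op '=') or 'P > 0' (op '>') for a polynomial expression P."""
    def __init__(self, P, op):
        self.P, self.op = P, op
    def holds(self, point):
        v = self.P.subs(point)
        return v == 0 if self.op == '=' else v > 0

class And:
    def __init__(self, *fs): self.fs = fs
    def holds(self, point): return all(f.holds(point) for f in self.fs)

class Or:
    def __init__(self, *fs): self.fs = fs
    def holds(self, point): return any(f.holds(point) for f in self.fs)

class Not:
    def __init__(self, f): self.f = f
    def holds(self, point): return not self.f.holds(point)

x, y = sp.symbols('x y')
# The closed unit disk, "1 - x^2 - y^2 > 0 or 1 - x^2 - y^2 = 0":
disk = Or(Atom(1 - x**2 - y**2, '>'), Atom(1 - x**2 - y**2, '='))
print(disk.holds({x: 1, y: 0}))   # True: a boundary point
print(disk.holds({x: 1, y: 1}))   # False
```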

We show that a formula built using all three rules will also describe a semialgebraic set.

Theorem 2.3.1. Let Θ(X_1, ..., X_n) be a first-order formula. Then the set {(x_1, ..., x_n) ∈ R^n | Θ(x_1, ..., x_n)} is semialgebraic.

Proof. As above, the sets of points satisfying first-order formulae built using only the first rule are basic semialgebraic sets of the form {x ∈ R^n | P(x) = 0} and {x ∈ R^n | P(x) > 0}. The sets of points satisfying first-order formulae built using the first two rules are (finite) unions and intersections of sets of the form {x ∈ R^n | P(x) = 0, Q_1(x) > 0, ..., Q_r(x) > 0}, and so they are also semialgebraic. Finally, we consider sets of points satisfying a general first-order formula (that is, including those built using the third rule). Let Θ(X_1, ..., X_n) be a first-order formula such that S := {x ∈ R^n | Θ(x)} is semialgebraic. Then "∃X_n Θ" is a first-order formula, and the set

$$S' := \{(x_1, \dots, x_{n-1}) \in \mathbb{R}^{n-1} \mid \exists x_n \in \mathbb{R} \text{ such that } \Theta(x_1, \dots, x_n)\}$$

is the projection of S onto the first n − 1 coordinates. By Theorem 2.2.1, S′ is semialgebraic. Similarly, "∀X_n Θ" is a first-order formula, and it can be rewritten as "¬∃X_n ¬Θ". The set of points satisfying this formula,

$$\{(x_1, \dots, x_{n-1}) \in \mathbb{R}^{n-1} \mid \neg \exists x_n \in \mathbb{R} \text{ such that } \neg\Theta(x_1, \dots, x_n)\},$$

can be expressed as

$$\mathbb{R}^{n-1} \setminus \pi\big(\mathbb{R}^n \setminus \{(x_1, \dots, x_n) \in \mathbb{R}^n \mid \Theta(x_1, \dots, x_n)\}\big) = \mathbb{R}^{n-1} \setminus \pi(\mathbb{R}^n \setminus S),$$

where π : R^n → R^{n−1} is the projection onto the first n − 1 coordinates, so this set is also semialgebraic. This proves the theorem.

Remark: This form can be a powerful tool since first-order formulae can be used to describe sets which are otherwise difficult to define.

2.4 Semialgebraic functions We will now introduce the notion of semialgebraicity as a property of functions. Let X ⊂ Rm and Y ⊂ Rn be semialgebraic sets, and consider a function f : X → Y . The function f is said to be semialgebraic if its graph

$$\Gamma_f = \{(x, y) \in X \times Y \mid y = f(x)\} \subset \mathbb{R}^m \times \mathbb{R}^n$$

is semialgebraic (as a subset of R^m × R^n). We give some examples of semialgebraic functions. Assume that the sets X and Y are as above for the following examples.

Example 16. A polynomial function f : X → Y is semialgebraic. To see this, we write the graph as

Γf = {(x, y) ∈ X × Y | x ∈ X and y = f(x)},

which, for some boolean combination of polynomial equations and inequalities B describing X, can be written as

$$\Gamma_f = \{(x, y) \in \mathbb{R}^m \times \mathbb{R}^n \mid B(x) \text{ and } y - f(x) = 0\}.$$

The graph Γ_f is clearly a semialgebraic set, and hence f is a semialgebraic function. Alternatively, we could have written X as a finite union of sets of the form {x ∈ R^m | P(x) = 0, Q_1(x) > 0, ..., Q_r(x) > 0}, and proceeded in a similar way.

Example 17. A regular rational mapping f : X → Y is semialgebraic. To see this, we write the graph of f as

$$\Gamma_f = \left\{(x, y) \in X \times Y \;\middle|\; x \in X \text{ and } y_1 = \frac{h_1(x)}{g_1(x)}, \dots, y_n = \frac{h_n(x)}{g_n(x)}\right\},$$

where the h_i, g_i ∈ R[X_1, ..., X_m] and g_i(x) ≠ 0 on X. Since X is semialgebraic, it is a finite union of sets of the form {x ∈ R^m | P(x) = 0, Q_1(x) > 0, ..., Q_r(x) > 0}. Assuming X itself is of this form, the graph Γ_f can be written as

$$\Gamma_f = \{(x, y) \in \mathbb{R}^m \times \mathbb{R}^n \mid P(x) = 0,\; Q_1(x) > 0, \dots, Q_r(x) > 0 \text{ and } y_1 g_1(x) - h_1(x) = 0, \dots, y_n g_n(x) - h_n(x) = 0\},$$

which is clearly a semialgebraic subset of R^m × R^n. Hence f is a semialgebraic function. Notice that the semialgebraicity of polynomial functions f : X → R follows as a particular case of this example.

Example 18. If f : X → R is a semialgebraic function with f(x) ≥ 0 for all x ∈ X, then √f is also semialgebraic. To see this, consider the graph of f,

$$\Gamma_f = \{(x, y) \in \mathbb{R}^m \times \mathbb{R} \mid x \in X \text{ and } y - f(x) = 0\},$$

which must be semialgebraic, since f is a semialgebraic function. Then the graph of √f can be written as

$$\Gamma_{\sqrt{f}} = \{(x, y) \in \mathbb{R}^m \times \mathbb{R} \mid x \in X \text{ and } y^2 - f(x) = 0 \text{ and } y \ge 0\}.$$

This, however, does not show that the graph of √f is semialgebraic, since we do not know that f itself is polynomial. We consider the graph in a different form. Since Γ_f is a semialgebraic set, it is a finite union of sets defined by polynomial equations and inequalities. Assuming Γ_f itself is of this form:

$$\Gamma_f = \{(x, y) \in \mathbb{R}^m \times \mathbb{R} \mid P(x, y) = 0,\; Q_1(x, y) > 0, \dots, Q_r(x, y) > 0\},$$

where P, Q_1, ..., Q_r ∈ R[X_1, ..., X_m, Y], we can write

$$\Gamma_{\sqrt{f}} = \{(x, y) \in \mathbb{R}^m \times \mathbb{R} \mid P(x, y^2) = 0,\; Q_1(x, y^2) > 0, \dots, Q_r(x, y^2) > 0,\; y \ge 0\}.$$

This shows that Γ_{√f} is a semialgebraic set, and hence that √f is a semialgebraic function.

Example 19. If f : X → R is a semialgebraic function, then |f(x)| is also semialgebraic. To see this, we write the graph Γf as a union of sets of the form

{(x, y) ∈ X × R | P (x, y) = 0,Q1(x, y) > 0,...,Qr(x, y) > 0}, each of which can be rewritten as

{(x, y) ∈ X × R | P (x, y) = 0, y ≥ 0,Q1(x, y) > 0,...,Qr(x, y) > 0} ∪ {(x, y) ∈ X × R | P (x, y) = 0, y < 0,Q1(x, y) > 0,...,Qr(x, y) > 0}. Replacing the second component of the above union with

{(x, y) ∈ X × R | P(x, −y) = 0, y > 0, Q_1(x, −y) > 0, ..., Q_r(x, −y) > 0}

gives us the graph of |f(x)|, which is clearly semialgebraic. (Note that all conditions on the second coordinate flip under the reflection y ↦ −y.) Alternatively, we can use the fact that the square root of a non-negative semialgebraic function X → R is also semialgebraic (which was shown in the previous example). Since f is semialgebraic, f^2 is also semialgebraic (see Proposition 2.4.4), and non-negative. Then |f(x)| = √(f^2(x)) must also be semialgebraic.

Example 20. A particularly important example is the distance function. That is, f(x) := dist(x, S), the distance from a semialgebraic set S ⊂ R^n, is a semialgebraic function. To show this we use the Tarski-Seidenberg principle in its third form, as in Theorem 2.3.1. Since S is semialgebraic, we assume that it has a description of the form S = {s ∈ R^n | Θ(s)}, for some first-order formula Θ. We can then write the graph of f as

$$\begin{aligned} \Gamma_f = \{(x, z) \in \mathbb{R}^n \times [0, \infty) \mid\; & \forall R \in [0, \infty),\, R > z,\, \exists s \in \mathbb{R}^n \text{ such that } \big(\Theta(s) \text{ and } R^2 - (\operatorname{dist}(x, s))^2 \ge 0\big), \text{ and} \\ & \forall r \in [0, \infty),\, r < z,\, \nexists s \in \mathbb{R}^n \text{ such that } \big(\Theta(s) \text{ and } (\operatorname{dist}(x, s))^2 - r^2 \le 0\big)\},\end{aligned}$$

noting that (dist(x, s))^2 = (x_1 − s_1)^2 + ... + (x_n − s_n)^2 is polynomial. Hence the above graph is a set described by a first-order formula. That is, Γ_f is a semialgebraic set, and hence dist(x, S) is a semialgebraic function. A subtle point in showing that the distance function is semialgebraic is that (although we did not write it explicitly) we took an infimum of a semialgebraic function over a semialgebraic set. To clarify, we can write the distance of a point from a semialgebraic set as

$$\operatorname{dist}(x, S) = \inf_{s \in S} \{\|x - s\|\}.$$

We aim to show more generally that the infimum (or supremum) of a semialgebraic function over a semialgebraic set is also semialgebraic.

Proposition 2.4.1. Consider a semialgebraic function

$$f : \mathbb{R}^n \times \mathbb{R}^m \longrightarrow \mathbb{R}, \qquad (x, y) \longmapsto f(x, y) =: z.$$

Denote by {f_y} the family of semialgebraic functions {f(·, y)} parameterized by y ∈ R^m, and let S ⊂ R^m be a semialgebraic set. Then the function

$$\inf_{s \in S} f_s : \mathbb{R}^n \longrightarrow \mathbb{R},$$

$$x \longmapsto \inf_{s \in S} f_s(x),$$

is semialgebraic. The function sup_{s∈S} f_s is also semialgebraic.

Proof. We prove this result for the case of taking the infimum, which we do in a similar manner as in the previous example. Since S is semialgebraic, we can assume that there is a first-order formula Θ such that S = {s ∈ R^m | Θ(s)}. Since f is also semialgebraic, Γ_f must have a similar description (as a semialgebraic subset of R^n × R^m × R). That is, the graph of f can be written as Γ_f = {(x, y, z) ∈ R^n × R^m × R | Φ(x, y, z)}. Then the graph of inf_{s∈S} f_s(x) can be written as

$$\Gamma = \left\{(x, z) \in \mathbb{R}^n \times \mathbb{R} \;\middle|\; \forall R > z,\, \exists R' < R,\, \exists s \big(\Theta(s) \text{ and } \Phi(x, s, R')\big) \text{ and } \nexists r < z,\, \exists s \big(\Theta(s) \text{ and } \Phi(x, s, r)\big)\right\}.$$

More intuitively, this first-order formula describing Γ means that "for every R greater than z, there is some point s ∈ S such that f(x, s) < R, and for r less than z there is no s ∈ S for which f(x, s) = r". Since a point (x, z) is in Γ if and only if this first-order formula is satisfied, Γ is a semialgebraic subset of R^{n+1}, and hence inf_{s∈S} f_s(x) is a semialgebraic function. The proof of the supremum case is essentially the same.

Remark: In general, it is not enough for each of the f_s to be semialgebraic. That is, we require f itself to be semialgebraic in order for inf_{s∈S} f_s to be semialgebraic. We give an example to illustrate this.

Example 21. Consider the function f : R × R → R defined by

$$f(x, y) = \begin{cases} e^y & x = y, \\ 1 & \text{otherwise,} \end{cases}$$

which is not semialgebraic. Observe that, for each y ∈ R, the function f_y(x) is semialgebraic, as we allow y to vary over R (or in fact any semialgebraic subset of R). However, the function

$$\inf_{y \in \mathbb{R}} f_y : \mathbb{R} \longrightarrow \mathbb{R}, \qquad x \longmapsto \inf_{y \in \mathbb{R}} f_y(x)$$

is equal to e^x when x ∈ (−∞, 0), which is not semialgebraic. (It is clear that the exponential function is not semialgebraic. However, for further proof one can use the fact that, for any polynomial P ∈ R[X] there exists some a ∈ R such that P(x) < e^x on (a, ∞), which violates Proposition 2.4.7.) We give another similar proposition.

Proposition 2.4.2. Consider a semialgebraic set S ⊂ R^n, and semialgebraic functions f, g : S → R, and denote G_t := {x ∈ S | g(x) = t}. Then the function

$$F : \mathbb{R} \longrightarrow \mathbb{R}, \qquad t \longmapsto \inf_{x \in G_t} f(x)$$

is semialgebraic. The function sup_{x∈G_t} f is also semialgebraic.

Proof. Let Θ, Φ, and Ψ be first-order formulae such that S = {x ∈ R^n | Θ(x)}, Γ_g = {(x, t) ∈ R^n × R | Φ(x, t)}, and Γ_f = {(x, z) ∈ R^n × R | Ψ(x, z)}. Then the graph of F can be written as

$$\Gamma_F = \left\{(t, z) \in \mathbb{R}^2 \;\middle|\; \forall R > z,\, \exists R' < R,\, \exists x \big(\Theta(x),\, \Phi(x, t),\, \Psi(x, R')\big) \text{ and } \forall r < z,\, \nexists x \big(\Theta(x) \text{ and } \Psi(x, r)\big)\right\},$$

and is therefore a semialgebraic subset of R^2. Hence F is a semialgebraic function. The proof that sup_{x∈G_t} f is semialgebraic is essentially the same.

We now observe some further consequences of the definition of semialgebraic functions.

Proposition 2.4.3. Let S ⊂ R^m and T ⊂ R^n be semialgebraic sets, and f : R^m → R^n a semialgebraic mapping. Then f(S) and f^{−1}(T) are both semialgebraic sets.

Proof. The graph of f|_S can be written as

$$\Gamma_{f|_S} = \{(x, y) \in \mathbb{R}^m \times \mathbb{R}^n \mid x \in S \text{ and } y = f(x)\},$$

where "x ∈ S" has an equivalent expression as a first-order formula (since S is semialgebraic). Then f(S) is simply the projection of the above graph onto the last n coordinates, which, by Theorem 2.2.1, is semialgebraic. Similarly, we aim to write f^{−1}(T) as a projection of some set. The graph of f restricted to f^{−1}(T) can be expressed as

$$\{(x, y) \in \mathbb{R}^m \times \mathbb{R}^n \mid f(x) = y \text{ and } y \in T\},$$

where the statements "f(x) = y" and "y ∈ T" both have first-order formula equivalents. Thus the graph of f|_{f^{−1}(T)} is a semialgebraic set. The inverse image f^{−1}(T) of T is hence the projection of the above graph onto the first m coordinates, which is clearly semialgebraic.
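A one-variable illustration of Proposition 2.4.3 (a minimal sketch assuming SymPy's function_range utility; the choice of S and f is ours): the image of the semialgebraic set S = [−1, 2] under the polynomial map f(x) = x^2 is again semialgebraic, namely the interval [0, 4].

```python
import sympy as sp
from sympy.calculus.util import function_range

x = sp.symbols('x')
# image of [-1, 2] under x -> x^2: a single closed interval
print(function_range(x**2, x, sp.Interval(-1, 2)))   # Interval(0, 4)
```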

47 Proposition 2.4.4. Let A ⊂ Rl, B ⊂ Rm, and C ⊂ Rn be semialgebraic sets, and f : A → B and g : B → C semialgebraic mappings. Then the composition

g ◦ f : A → C is semialgebraic.

Proof. The graph of g ◦ f can be written as

Γg◦f = {(x, z) ∈ A × C | ∃y ∈ B such that ((x, y) ∈ Γf and (y, z) ∈ Γg)}, which is semialgebraic as it is the projection of the set

$$\{(x, y, z) \in A \times B \times C \mid (x, y) \in \Gamma_f \text{ and } (y, z) \in \Gamma_g\}$$

onto the first l and last n coordinates.

We also have the following stability property for semialgebraic functions.

Proposition 2.4.5. The set of semialgebraic functions from a semialgebraic set A ⊂ R^n to R forms a ring.

Proof. Consider semialgebraic functions f, g : A → R. Then the functions f − g and f · g are clearly semialgebraic, as they are polynomial combinations of outputs of semialgebraic functions. To be as explicit as possible, they can be understood using the maps

$$\hat{f} : A \times A \longrightarrow \mathbb{R} \times A, \qquad (x, y) \longmapsto (f(x), y),$$
$$\hat{g} : \mathbb{R} \times A \longrightarrow \mathbb{R} \times \mathbb{R}, \qquad (z, y) \longmapsto (z, g(y)),$$

and the diagonal {(x, x) ∈ A × A}, which are clearly semialgebraic. Finally, the function f − g (respectively f · g) is the composition of the semialgebraic function from R × R to R defined by (z, w) ↦ z − w (respectively (z, w) ↦ zw) with ĝ ∘ f̂, which, by Proposition 2.4.4, is semialgebraic.

Proposition 2.4.6. Consider U ⊂ R^n open and semialgebraic, and let f : U → R be a semialgebraic function. If the partial derivatives ∂f/∂x_i of f exist on U, then they are semialgebraic.

Proof. The graph of f on U is semialgebraic, and hence is a finite union of sets of the form

$$\Gamma_f = \{(x, y) \in \mathbb{R}^n \times \mathbb{R} \mid P(x, y) = 0,\; Q_1(x, y) > 0, \dots, Q_r(x, y) > 0,\; y = f(x)\},$$

where P, Q_1, ..., Q_r ∈ R[X_1, ..., X_n, Y], and y = f(x). We can rewrite P as a sum of monomials with multi-index α := (α_1, ..., α_n, α_{n+1}) ∈ Z_+^{n+1}. Denoting

α′ := (α_1, ..., α_n), we have

$$P(x, y) = \sum_{\alpha_1 + \dots + \alpha_{n+1} = d} a_\alpha \cdot (x, y)^\alpha = \sum_{\alpha_1 + \dots + \alpha_{n+1} = d} a_\alpha x^{\alpha'} y^{\alpha_{n+1}} = \sum_{\alpha_1 + \dots + \alpha_{n+1} = d} a_\alpha x^{\alpha'} f(x)^{\alpha_{n+1}}.$$

Denote e_i := (0, ..., 0, 1, 0, ..., 0) ∈ Z_+^n, where the i-th entry is the only nonzero entry. Then the partial derivative of P with respect to x_i for i ∈ {1, ..., n} is

$$\begin{aligned} \frac{\partial P}{\partial x_i}(x, f(x)) &= \sum_\alpha a_\alpha \frac{\partial}{\partial x_i}\left(x^{\alpha'} f(x)^{\alpha_{n+1}}\right) \\ &= \sum_\alpha a_\alpha \left( \frac{\partial x^{\alpha'}}{\partial x_i}\, f(x)^{\alpha_{n+1}} + x^{\alpha'} \frac{\partial}{\partial x_i}\left(f(x)^{\alpha_{n+1}}\right) \right) \\ &= \sum_\alpha a_\alpha \left( \alpha_i x^{\alpha' - e_i} f(x)^{\alpha_{n+1}} + x^{\alpha'} \alpha_{n+1} f(x)^{\alpha_{n+1} - 1} \frac{\partial f}{\partial x_i} \right) \\ &= \sum_\alpha a_\alpha \alpha_i x^{\alpha' - e_i} f(x)^{\alpha_{n+1}} + \left( \sum_\alpha a_\alpha x^{\alpha'} \alpha_{n+1} f(x)^{\alpha_{n+1} - 1} \right) \frac{\partial f}{\partial x_i}(x) \\ &=: B(x) + A(x) \frac{\partial f}{\partial x_i}(x), \end{aligned}$$

which is polynomial in x, f(x), and ∂f/∂x_i. Since A and B are both compositions of semialgebraic functions (sums of powers of f multiplied by monomials and real constants), they are themselves semialgebraic. Since P itself is polynomial in X_1, ..., X_n, Y, the partial derivatives ∂P/∂x_i are also polynomial in X_1, ..., X_n, Y. Therefore, both P(X, f(X)) and ∂P(X, f(X))/∂x_i are semialgebraic, as compositions of a polynomial with f. Notice that the set

$$S := \left\{(x, y') \in U \times \mathbb{R} \;\middle|\; A(x) \cdot y' + B(x) - \frac{\partial P}{\partial x_i}(x, f(x)) = 0\right\}$$

is precisely the graph of ∂f/∂x_i over U. Define

$$P_S(x, y') := A(x) \cdot y' + B(x) - \frac{\partial P}{\partial x_i}(x, f(x)),$$

which vanishes on the graph of ∂f/∂x_i. We know that the function P_S is semialgebraic, thus we can assume the graph Γ_{P_S} is of the form

$$\{(x, y', z) \in U \times \mathbb{R} \times \mathbb{R} \mid H(x, y', z) = 0,\; G_1(x, y', z) > 0, \dots, G_s(x, y', z) > 0\},$$

where H, G_1, ..., G_s ∈ R[X_1, ..., X_n, Y′, Z]. Finally,

$$S = \Gamma_{P_S}\big|_{z=0} = \{(x, y', z) \in U \times \mathbb{R} \times \mathbb{R} \mid H(x, y', z)^2 + z^2 = 0,\; G_1(x, y', z) > 0, \dots, G_s(x, y', z) > 0\},$$

which is the graph of ∂f/∂x_i over U (after forgetting the z-coordinate), and is clearly semialgebraic.

For clarity we give a simple, low-dimensional example in which we imitate each step in the above proof.

Example 22. Consider the open ball U = {(x, y) ∈ R^2 | x^2 + y^2 < 1}, and the function f(x, y) = √(1 − x^2 − y^2) on U. Clearly U ⊂ R^2 and f : U → R are semialgebraic. The graph of f over U is

$$\Gamma_f = \{(x, y, z) \in \mathbb{R}^3 \mid 1 - x^2 - y^2 - z^2 = 0 \text{ and } 1 - x^2 - y^2 > 0 \text{ and } z > 0\},$$

and the partial derivatives of f are

$$\frac{\partial f}{\partial x} = \frac{-x}{\sqrt{1 - x^2 - y^2}}, \qquad \frac{\partial f}{\partial y} = \frac{-y}{\sqrt{1 - x^2 - y^2}}.$$

To verify, the partial derivatives of f do exist on U, and are semialgebraic (as compositions of semialgebraic functions, with denominator non-vanishing on U). Following the notation of the proof of Proposition 2.4.6, the polynomial P in the description of the graph of f over U is simply P(x, y, z) = 1 − x^2 − y^2 − z^2, which is indeed a sum of monomials in x, y, z. Remembering that z = f(x, y) = √(1 − x^2 − y^2), we write P as

$$P(x, y, z) = \sum_\alpha a_\alpha \cdot (x, y, z)^\alpha = 1 - x^2 - y^2 - z^2.$$

Taking the partial derivatives of P with respect to x and y, we have

$$\frac{\partial P(x, y, z)}{\partial x} = -2x - 2z\frac{\partial z}{\partial x} = -2x - 2f(x, y)\cdot\frac{\partial f}{\partial x}(x, y),$$
$$\frac{\partial P(x, y, z)}{\partial y} = -2y - 2z\frac{\partial z}{\partial y} = -2y - 2f(x, y)\cdot\frac{\partial f}{\partial y}(x, y),$$

which are necessarily semialgebraic since P is a polynomial (which is the case for general semialgebraic f). For simplicity, we omit the arguments for the partial derivatives with respect to y, as they are essentially the same as those for x. The set

$$S_x := \left\{(x, y, z') \in U \times \mathbb{R} \;\middle|\; -2x - 2f(x, y)z' - \frac{\partial P(x, y, f(x, y))}{\partial x} = 0\right\}$$

is precisely the graph of ∂f/∂x over U. We wish to show that S_x is semialgebraic. Define

$$P_{S_x} := -2x - 2f(x, y)z' - \frac{\partial P(x, y, f(x, y))}{\partial x},$$

which is known to be semialgebraic, and hence has a graph which is described by polynomials, say

$$\Gamma_{P_{S_x}} = \{(x, y, z', w) \in U \times \mathbb{R} \times \mathbb{R} \mid H(x, y, z', w) = 0,\; G_1(x, y, z', w) > 0, \dots, G_s(x, y, z', w) > 0\}.$$

In this example, we are able to explicitly compute H as

$$H(x, y, z', w) = 4(z')^2(1 - x^2 - y^2) - (w + 2x)^2,$$

which was calculated by rearranging the equation P_{S_x} = 0 and squaring both sides. Notice that S_x is the intersection

$$\Gamma_{P_{S_x}} \cap \left\{(x, y, z', w) \in \mathbb{R}^4 \mid w = 0\right\}$$

$$\cap \left( \{(x, y, z', w) \in \mathbb{R}^4 \mid x > 0,\; -z' > 0\} \cup \{(x, y, z', w) \in \mathbb{R}^4 \mid -x > 0,\; z' > 0\} \right),$$

where the last component of the above intersection arises due to the fact that we squared P_{S_x} in order to obtain H. Therefore S_x is semialgebraic, and hence ∂f/∂x is semialgebraic.
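The identity H = 0 on S_x can be checked symbolically; here is a quick sketch assuming SymPy (recall that w = ∂P(x, y, f(x, y))/∂x vanishes identically, since P vanishes on the graph of f).

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.sqrt(1 - x**2 - y**2)
zp = sp.diff(f, x)                          # z' = -x / sqrt(1 - x^2 - y^2)
w = sp.diff(1 - x**2 - y**2 - f**2, x)      # 0, since P(x, y, f(x, y)) = 0
H = 4 * zp**2 * (1 - x**2 - y**2) - (w + 2*x)**2
print(sp.simplify(H))                       # 0
```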

For the final results of this section we give a bound on the growth rate of semialgebraic functions of one variable, followed by the Łojasiewicz inequality. In particular, we see that for every semialgebraic function of one variable there is a monomial of some degree which is strictly larger on some unbounded interval.

Proposition 2.4.7. Consider a semialgebraic function f : (a, ∞) → R for some a ∈ R. There exists an N ∈ N such that |f(x)| ≤ x^N on some unbounded interval (b, ∞), where b ≥ a.

Proof. Since f is semialgebraic, the graph Γ_f must be a semialgebraic subset of R^2. That is, we can write Γ_f as a union of semialgebraic sets,

Γf = Γ1 ∪ ... ∪ Γr, where each Γi is of the form

$$\Gamma_i = \{(x, y) \in \mathbb{R}^2 \mid P_i(x, y) = 0,\; Q_{i,1}(x, y) > 0, \dots, Q_{i,l_i}(x, y) > 0\},$$

where P_i and the Q_{i,j} are polynomials. Since Γ_f is a graph, it cannot contain any vertical segments of the form {x} × (y_1, y_2), and hence each of the polynomials P_i must have nonzero degree with respect to y. Define the product

$$P(x, y) := \prod_{i=1}^{r} P_i(x, y),$$

which must also have nonzero degree with respect to y. Hence, let

$$P(x, y) = A_0(x)y^d + A_1(x)y^{d-1} + \dots + A_d(x),$$

where A_0, ..., A_d ∈ R[X], and d > 0. Since A_0 is polynomial in x, we can choose c ∈ (a, ∞) large enough so that A_0(x) does not vanish for any x > c. For fixed x, f(x) describes a subset of the roots of P(x, y) as a function of y. Hence, by Lemma 1.1.6, we know that

$$|f(x)| \le \max_{j \in \{1, \dots, d\}} \left( d\left|\frac{A_j(x)}{A_0(x)}\right| \right)^{1/j}.$$

Let A_j(x) = a_{j,0}x^{d_j} + ... + a_{j,d_j} for each j ∈ {0, ..., d}, and let

$$\max_{j \in \{1, \dots, d\}} \left( d\left|\frac{A_j(x)}{A_0(x)}\right| \right)^{1/j} = \left( d\left|\frac{A_k(x)}{A_0(x)}\right| \right)^{1/k}$$

for some k ∈ {1, ..., d}. Then, assuming a_{k,0} ≠ 0, we have

$$\max_{j \in \{1, \dots, d\}} \left( d\left|\frac{A_j(x)}{A_0(x)}\right| \right)^{1/j} \xrightarrow{\;x \to \infty\;} \left( d\left|\frac{a_{k,0}x^{d_k}}{a_{0,0}x^{d_0}}\right| \right)^{1/k} = \left( d\left|\frac{a_{k,0}}{a_{0,0}}\right| \right)^{1/k} x^{(d_k - d_0)/k}.$$

Set N ∈ N with N > (d_k − d_0)/k; then we can take some b ≥ c so that |f(x)| ≤ x^N for all x > b.

We can now prove the Łojasiewicz inequality, concluding this section.

Theorem 2.4.8. Let S ⊂ R^n be a compact semialgebraic set. Let f and g be real-valued continuous semialgebraic functions on S such that f(x) = 0 ⟹ g(x) = 0 for all x ∈ S. Then there exist an N ∈ N and a constant C ≥ 0 such that |g(x)|^N ≤ C|f(x)| for all x ∈ S.

Proof. We begin the proof with some notation. For t > 0, define

$$G_t := \left\{x \in S \;\middle|\; |g(x)| = \frac{1}{t}\right\}.$$

Since g is continuous, G_t is closed in S, hence G_t is compact. Due to the condition of f vanishing only when g vanishes, f does not vanish on any nonempty G_t. Denote

$$M(t) := \sup_{x \in G_t} \frac{1}{|f(x)|},$$

which always exists since G_t is compact and 1/|f(x)| is continuous on each nonempty G_t. For G_t empty, define M(t) = 0. The graph of M is described as

$$\left\{(t, r) \in \mathbb{R}_{>0} \times \mathbb{R} \;\middle|\; \left(\exists x \in G_t \text{ such that } \frac{1}{|f(x)|} = r \text{ and } \forall y \in G_t,\, \frac{1}{|f(y)|} \le r\right) \text{ or } (G_t = \emptyset \text{ and } r = 0)\right\},$$

which is semialgebraic. To check the semialgebraicity of Γ_M, we show that it admits a first-order formula description. Let Θ, Φ, and Ψ be first-order formulae such that |g(x)| = 1/t ⟺ Θ(x, t), 1/|f(x)| = r ⟺ Φ(x, r), and 1/|f(x)| ≤ r ⟺ Ψ(x, r). Then we can express the graph of M as

$$\Gamma_M = \left\{(t, r) \in \mathbb{R}_{>0} \times \mathbb{R} \;\middle|\; \left(\exists x \big(\Theta(x, t) \text{ and } \Phi(x, r)\big) \text{ and } \forall y \big(\Theta(y, t) \Rightarrow \Psi(y, r)\big)\right) \text{ or } \left(\nexists x\, \Theta(x, t) \text{ and } r = 0\right)\right\}.$$

Therefore M(t) is semialgebraic. Since M : (0, ∞) → R is semialgebraic, by Proposition 2.4.7 there exist some b > 0 and an N ∈ N such that |M(t)| ≤ t^N for all t > b. Now, t > b > 0 implies 0 < 1/t < 1/b, and |g(x)| = 1/t on G_t, so we have 0 < |g(x)| < 1/b. By the definition of M(t) we get

$$\frac{1}{|f(x)|} \le M(t) \le t^N = \frac{1}{|g(x)|^N}.$$

Therefore, for all x ∈ S such that 0 < |g(x)| < 1/b we have

$$\frac{1}{|f(x)|} \le \frac{1}{|g(x)|^N}.$$

The set {x ∈ S | |g(x)| ≥ 1/b} is compact, and f does not vanish on it (since f(x) = 0 would force g(x) = 0), so |g(x)|^N / |f(x)| attains a maximum on this set. Let d denote this maximum, and set C := max(1, d). Then we have |g(x)|^N ≤ C|f(x)| everywhere on S.
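A small numeric illustration of Theorem 2.4.8 (a sketch assuming NumPy; the choice f = x^4 and g = x(1 − x^2) on S = [−1, 1] is ours): f vanishes only at 0, where g also vanishes, and N = 4, C = 1 satisfy the inequality, while N = 1 would fail near 0 since |g|/|f| blows up there.

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 10001)
f = np.abs(xs**4)
g = np.abs(xs * (1 - xs**2))
# |g|^4 / f = (1 - x^2)^4 <= 1 wherever f != 0, so C = 1 works
print(np.all(g**4 <= f))   # True
```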

3 Decomposing semialgebraic sets

In this section we explore two methods of decomposing semialgebraic sets, and we observe consequences of these decompositions, including implications for the dimension and number of connected components of semialgebraic sets. The first method is the cylindrical algebraic decomposition (which we will abbreviate by "c.a.d."), which will show that every semialgebraic set S can be decomposed into a finite collection of disjoint sets, each of which is semialgebraically homeomorphic, by some h, to an open hypercube of some dimension. This generalizes, in some way, the representation of semialgebraic subsets of R as finite unions of points and open intervals. Furthermore, we show that the decomposition can be 'adapted' so that any finite collection of semialgebraic subsets of S can also be composed of a finite union of the disjoint images (by h) of these open hypercubes. The decomposition is done by taking successive projections which 'forget' the last coordinate until we are left with one variable, from which we rebuild the set in a way that guarantees the desired properties. This will be done algorithmically. Cylindrical decomposition is the typical way in which we decompose semialgebraic sets, and we develop the construction in an inductive manner. The second method for decomposing semialgebraic sets is via triangulation, whereby we show that every compact semialgebraic set S is semialgebraically homeomorphic to a finite simplicial complex, say K, and that the homeomorphism h : |K| → S can always be constructed such that any semialgebraic subset of S is a union of the images of open simplices of K.

3.1 Cylindrical algebraic decomposition

We first define what is meant by a semialgebraic homeomorphism. A semialgebraic homeomorphism is a continuous bijective semialgebraic mapping between semialgebraic sets, say h : S → T, with continuous inverse. Since h is semialgebraic, its graph Γ_h over S must be semialgebraic in S × T. Since h is bijective, h^{−1} must also be bijective, and hence the graph Γ_{h^{−1}} exists and is essentially the same as the graph Γ_h (interchanging the coordinates of S and T). By definition of h, we know that h^{−1} is continuous, and that (h^{−1})^{−1} = h, which is also continuous. Hence h^{−1} is continuous, bijective, semialgebraic, and has continuous inverse. Therefore h^{−1} : T → S is also a semialgebraic homeomorphism.

A cylindrical algebraic decomposition of R^n is a collection C_1, ..., C_n, where each C_i is a finite partition of R^i into semialgebraic subsets C_{i,1}, ..., C_{i,l_i} called cells, such that:

1. Every cell in C1 is either a point or an open interval.

2. For every cell C ∈ Ck, for every k ∈ {1, . . . , n − 1}, there are finitely many continuous semialgebraic functions

ξ_{C,1}, ..., ξ_{C,l_C} : C → R

such that ξ_{C,1} < ... < ξ_{C,l_C} on C, and whose graphs cut the cylinder C × R such that each cell of C_{k+1} is either one of the graphs,

$$A_{C,i} := \Gamma_{\xi_{C,i}} = \{(x', x_{k+1}) \in C \times \mathbb{R} \mid x_{k+1} = \xi_{C,i}(x')\},$$

for i ∈ {1, . . . , lC }, or one of the bands between graphs,

$$B_{C,j} := \{(x', x_{k+1}) \in C \times \mathbb{R} \mid \xi_{C,j}(x') < x_{k+1} < \xi_{C,j+1}(x')\},$$

for j ∈ {0, ..., l_C}. Here we let ξ_{C,0} = −∞ and ξ_{C,l_C+1} = +∞.

We give an example of a c.a.d. of R^2.

Example 23. We give descriptions of partitions C_1 and C_2 of R and R^2 respectively. Let

C_1 = {C_1, ..., C_5} = {(−∞, 0), {0}, (0, 1), {1}, (1, ∞)},

noting that all of these cells are indeed either points or open intervals. We want to consider the continuous semialgebraic functions ξ over each cell in each partition up to C_{n−1}. In this case we have n = 2, so we only need to consider the ξ over cells in C_1. Let us define the ξ_{C_i,1}, ..., ξ_{C_i,l_{C_i}} : C_i → R by

ξ_{C_2,1}(x) = 0,
ξ_{C_3,1}(x) = −√x,  ξ_{C_3,2}(x) = +√x,
ξ_{C_4,1}(x) = 1,
ξ_{C_5,1}(x) = 0,

where each ξCi,j is a continuous semialgebraic function. Note that these func- tions are only required to be continuous over their corresponding cell, and that the functions over adjacent cells are not required to ‘match up’ at the cell boundaries. The cells of C2 over each of the C1,C2,C3,C4,C5 are

(−∞, 0) × R over C1,

{0} × (−∞, 0), {(0, 0)}, {0} × (0, ∞) over C_2,

{(x, y) ∈ C_3 × R | y < −√x}, {(x, y) ∈ C_3 × R | y = −√x},
{(x, y) ∈ C_3 × R | −√x < y < +√x}, {(x, y) ∈ C_3 × R | y = +√x},
{(x, y) ∈ C_3 × R | y > +√x} over C_3,

{1} × (−∞, 1), {(1, 1)}, {1} × (1, ∞) over C4,

(1, ∞) × (−∞, 0), (1, ∞) × {0}, (1, ∞) × (0, ∞) over C_5.

Figure 14: A c.a.d. of R^2.
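To make the cell structure of Example 23 concrete, here is a minimal sketch (plain Python; the cell-labelling scheme and helper names are ours) that locates the cell of C_2 containing a given point (x, y).

```python
import math

def c1_cell(x):
    """Index of the cell of C_1 = {(-inf,0), {0}, (0,1), {1}, (1,inf)}."""
    if x < 0:  return 1
    if x == 0: return 2
    if x < 1:  return 3
    if x == 1: return 4
    return 5

def graphs_over(i, x):
    """Values of the xi-functions over the C_1 cell with index i, at x."""
    if i in (2, 5): return [0.0]
    if i == 3:      return [-math.sqrt(x), math.sqrt(x)]
    if i == 4:      return [1.0]
    return []                        # no graphs over C_1 = (-inf, 0)

def c2_cell(x, y):
    """Label the cell of C_2 containing (x, y): a graph A or a band B."""
    i = c1_cell(x)
    graphs = graphs_over(i, x)
    for k, g in enumerate(graphs, start=1):
        if y == g:
            return (i, 'graph', k)
    return (i, 'band', sum(1 for g in graphs if y > g))

print(c2_cell(0.25, 0.5))   # (3, 'graph', 2): on the upper branch of y^2 = x
print(c2_cell(2.0, -3.0))   # (5, 'band', 0): below the graph y = 0 over C_5
```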

We now show a simple yet desirable result before looking at the possible use of a cylindrical algebraic decomposition.

Proposition 3.1.1. Every cell of a c.a.d. of R^n is semialgebraically homeomorphic to an open hypercube (0, 1)^d, for some d ∈ {0, ..., n}, where (0, 1)^0 is a point.

Proof. Let C_1, ..., C_n be a c.a.d. of R^n. The cells of C_1 are either points or open intervals, and so they are clearly semialgebraically homeomorphic to either (0, 1)^0 or (0, 1)^1 respectively. The proof proceeds by induction on k for the partitions C_k of R^k. Assume that all cells in C_i are semialgebraically homeomorphic to open hypercubes (0, 1)^d, d ∈ {0, ..., k}, for every i = 1, ..., k, k < n. For the induction step, we use the fact that each cell in C_{k+1} is either a graph A_{C,j} of one of the ξ_{C,j} : C → R, or one of the bands B_{C,j} bounded by the graphs of ξ_{C,j} and ξ_{C,j+1}. Checking the case for A_{C,j} first, we take the semialgebraic homeomorphism

$$h'_{k+1} : C \longrightarrow A_{C,j} \subset C \times \mathbb{R}, \qquad x' \longmapsto (x', \xi_{C,j}(x')).$$

Indeed this is a semialgebraic homeomorphism, since the ξ are all continuous semialgebraic functions over the corresponding cell (and its inverse maps (x′, ξ_{C,j}(x′)) to x′). Now, in the case of a band B_{C,j}, we take the semialgebraic homeomorphism that maps a segment {x′} × (0, 1) ⊂ C × (0, 1) affinely onto the corresponding segment between ξ_{C,j} and ξ_{C,j+1}. Explicitly, we have

$$h'_{k+1} : C \times (0, 1) \longrightarrow B_{C,j},$$
$$(x', t) \longmapsto \begin{cases} \left(x',\; \dfrac{t-1}{t} + \xi_{C,1}(x')\right) & \text{for } j = 0,\; l_C \neq 0, \\[2mm] \big(x',\; t\,\xi_{C,j+1}(x') + (1-t)\,\xi_{C,j}(x')\big) & \text{for } 0 < j < l_C, \\[2mm] \left(x',\; \dfrac{t}{1-t} + \xi_{C,l_C}(x')\right) & \text{for } j = l_C,\; l_C \neq 0, \\[2mm] \left(x',\; \dfrac{1}{1-t} - \dfrac{1}{t}\right) & \text{for } j = l_C = 0, \end{cases}$$

accounting for the cells below the first graph, between graphs, above the last graph, or in the absence of any graphs, respectively. The cell C ∈ C_k is already assumed to be semialgebraically homeomorphic to some open hypercube (0, 1)^d, say h_k((0, 1)^d) = C. So we take h_{k+1} := h′_{k+1} ∘ h_k to be our semialgebraic homeomorphism between either (0, 1)^d and A_{C,j}, or between (0, 1)^{d+1} and B_{C,j}, completing the proof.

Remark: We aim to eventually make effective use of c.a.d. by developing an algorithm by which we can construct, for any semialgebraic subset S of R^n, a c.a.d. such that S is a union of the cells of this c.a.d. However, we need not decompose R^n in such a way in general. As in Example 23 above, we did not begin with any semialgebraic subset of R^2 in mind; we simply produced a c.a.d. in its own right.

We know that a semialgebraic set is a finite union of sets which are defined by a polynomial equation and polynomial inequalities. With the goal of producing a c.a.d. whose cells comprise such a semialgebraic set, we introduce the notion of adapting a c.a.d. to a family of polynomials. Consider a finite family of polynomials (P_1, ..., P_r) ∈ R[X_1, ..., X_n]. A set S ⊂ R^n is said to be (P_1, ..., P_r)-invariant if P_1, ..., P_r all have constant sign on S. If, for some c.a.d. C_1, ..., C_n of R^n, every cell in C_n is (P_1, ..., P_r)-invariant, the c.a.d. is said to be adapted to (P_1, ..., P_r). From this definition we can immediately see that a semialgebraic set, which is defined by a boolean combination of sign conditions on a finite family of polynomials, is a union of cells in C_n of a c.a.d. of R^n adapted to that family of polynomials. Combining this with the fact that the cells of a c.a.d. are semialgebraically homeomorphic to open hypercubes, we see that every semialgebraic set is a finite disjoint union of sets, each of which is semialgebraically homeomorphic to an open hypercube.

Consider a first-order formula, say Θ, which is a boolean combination of sign conditions on polynomials P_1, ..., P_r ∈ R[X_1, ..., X_n], and let S ⊆ R^n be the set of points satisfying Θ(x). Recall that the subset of R^{n−1} satisfying the formula

$$\exists x_n\, \Theta(x', x_n),$$

where x′ := (x_1, ..., x_{n−1}) ∈ R^{n−1}, is the projection of S onto the first n − 1 coordinates. Recall also that the formula

$$\forall x_n\, \Theta(x', x_n)$$

is equivalent to

$$\neg \exists x_n\, \neg\Theta(x', x_n),$$

and the set of points x′ ∈ R^{n−1} satisfying this formula is the complement of the projection of R^n \ S onto the first n − 1 coordinates. Both of these sets are clearly semialgebraic. Using a c.a.d. of R^n adapted to (P_1, ..., P_r), we are able to check whether such formulae are true or not simply by checking the cylinders over the cells comprising the sets of points satisfying the formulae. To check if the formula "∃x_n Θ(x′, x_n)" is true for a given point x′ ∈ R^{n−1} (that is, to check whether the point x′ is in the set of points satisfying the formula), we simply need to check if any cells C ∈ C_n of the c.a.d. which make up the set S are in the cylinder above the cell C′ ∈ C_{n−1} in which the point x′ resides. If there is a cell C ⊂ S in the cylinder over C′ ∋ x′ then the formula holds true - otherwise, it is false. We can make a similar argument for the formula "∀x_n Θ(x′, x_n)". It is clear that this works for any point x′ ∈ R^{n−1}; moreover, the signs of the polynomials P_1, ..., P_r are constant on each cell C ∈ C_n, by definition of an adapted c.a.d. Therefore, by the cylindrical arrangement of the cells in each of the families C_i, if the above formulae are true for some point of C′, then they must be true for all x′ ∈ C′. Generalizing the above notions, consider a formula of the form

$$\phi_{k+1} x_{k+1} \dots \phi_n x_n\, \Theta(x', x_{k+1}, \dots, x_n),$$

where φ_{k+1}, ..., φ_n are either '∃' or '∀' and x′ = (x_1, ..., x_k) ∈ R^k. By extending the above arguments, we see that the set of points satisfying such a formula is a union of cells C′ ∈ C_k of the c.a.d. adapted to P_1, ..., P_r. Hence, by considering the corresponding combination of complements and projections of cells in each of C_n, ..., C_{k+1}, and by finding which cells C ∈ C_n (if any) that comprise the set S are 'above' the cell C′ ∋ x′, we are able to determine whether such a formula is true or false.

Example 24. Consider the polynomial P(X, Y) = 1 − X^2 − Y^2, and the c.a.d. of R^2 described by

C1 = {C1,...,C5} = {(−∞, −1), {−1}, (−1, 1), {1}, (1, ∞)}, with the cells in C2 equal to or bounded by the graphs of the functions

ξ_{C_2,1}(x) = 0,
ξ_{C_3,1}(x) = −√(1 − x^2),  ξ_{C_3,2}(x) = +√(1 − x^2),

ξ_{C_4,1}(x) = 0.

This c.a.d. is adapted to P. Let Θ(x, y) be the formula P(x, y) > 0. We aim to test whether the formula "∃y Θ(x, y)" is true or false for various values of x.

The semialgebraic set S′ ⊂ R of points x satisfying the formula ∃y Θ(x, y) is the projection of the semialgebraic set S = {(x, y) ∈ R^2 | Θ(x, y)} onto the first coordinate. By observation, S is the open disk of radius 1 centered at the origin in R^2, hence S′ is the open interval (−1, 1). However, we wish to determine this without relying on the simplicity of this particular example. The (only) cell of the c.a.d. comprising S is the band B_{C_3,1} between the graphs of ξ_{C_3,1} and ξ_{C_3,2}. By construction of the c.a.d. (particularly, the cylindrical arrangement of these cells), we know that B_{C_3,1} ⊆ C_3 × R. Moreover, we know that the projection of B_{C_3,1} onto the first coordinate is precisely C_3, meaning that S′, the set of points satisfying the formula "∃y Θ(x, y)", is precisely the cell C_3 ∈ C_1.
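The cell-based decision procedure just described reduces, in this example, to a membership test in C_3; here is a minimal sketch (plain Python; cell names follow Example 24, helper names are ours).

```python
def cell_of_C1(x):
    """Index of the cell of C_1 = {(-inf,-1), {-1}, (-1,1), {1}, (1,inf)}."""
    if x < -1:  return 1
    if x == -1: return 2
    if x < 1:   return 3
    if x == 1:  return 4
    return 5

def exists_y_P_positive(x):
    # The only cell of C_2 comprising S lies in the cylinder over C_3,
    # so "exists y: 1 - x^2 - y^2 > 0" holds exactly when x is in C_3.
    return cell_of_C1(x) == 3

for x in (-2.0, -1.0, 0.0, 0.5, 1.0, 3.0):
    print(x, exists_y_P_positive(x))   # True only for -1 < x < 1
```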

In the above example we took for granted that we had a c.a.d. adapted to the polynomial in question. An obvious question to raise at this point is whether or not we can produce a c.a.d. adapted to an arbitrary finite family of polynomials. Indeed, this is always possible. We detail a method of construction in the following subsection.

3.2 Constructing a c.a.d. adapted to a finite family of polynomials

The continuous semialgebraic functions ξ : C → R on the cells of a cylindrical algebraic decomposition are the way in which we define the higher-order cells within the cylinders C × R. In the case of a c.a.d. of R^n adapted to a family of polynomials, say P_1, ..., P_r ∈ R[X_1, ..., X_n], each cell in C_n is (P_1, ..., P_r)-invariant. Therefore, when we rebuild the cells of C_n as sections of the cylinders over cells C ∈ C_{n−1}, the graphs of the ξ_{C,j} (and the bands between these graphs) must be (P_1, ..., P_r)-invariant. If we consider these polynomials as functions of a single variable X_n with parameters x′ = (x_1, ..., x_{n−1}) ∈ R^{n−1}, it becomes apparent that the graphs of the ξ_{C,j} must coincide, in some manner, with the roots of the P_i(x′, X_n). Since the ξ_{C,j} are required to be continuous, we require the roots of the P_i(x′, X_n) to be continuous functions of x′ over each cell

C ∈ C_{n−1}. Furthermore, the strict ordering ξ_{C,1} < ... < ξ_{C,l_C} of these functions demands that the multiplicity of the roots of each P_i(x′, X_n) remains constant over each cell C ∈ C_{n−1}. We wish to prove that such functions ξ_{C,j} describing the roots of the P_i exist, are semialgebraic, and are continuous. We first consider the continuity of the roots of the polynomials P_i(x′, X_n) as functions of the parameters x′. For the following results we will assume that C ⊂ R^{n−1} is a connected semialgebraic set, and for any x′ = (x_1, ..., x_{n−1}) ∈ C, we think of P(x′, X_n) as a polynomial in X_n with parameters x′.

Lemma 3.2.1 (Continuity of roots). Assume that, for all x′ ∈ C, the polynomial P(x′, X_n) has degree d, and take c′ ∈ C such that the polynomial P(c′, X_n) has distinct roots α_1, ..., α_k ∈ C with multiplicities m_1, ..., m_k, respectively. Taking ε > 0 small enough that the open disks B(α_i, ε) in the complex plane centered about the roots of P(c′, X_n) are all disjoint, for x′ ∈ C sufficiently close to c′, the

polynomial P(x′, X_n) has exactly m_i roots (counted with multiplicities) in each of the corresponding disks B(α_i, ε).

Proof. We first restrict ourselves to monic polynomials. Consider a polynomial X^d + a_1X^{d−1} + ... + a_d ∈ C[X], which we can identify with the point (a_1, ..., a_d) ∈ C^d (by associating the coefficients a_i of the monomial terms of the polynomial, listed in lexicographical order, with the coordinates of the point). We consider the operation of multiplying polynomials, and the corresponding mapping in this 'coefficient space'. Hence if we write A = X^r + a_1X^{r−1} + ... + a_r and B = X^{d−r} + b_1X^{d−r−1} + ... + b_{d−r}, we define the mapping

$$\mu : \mathbb{C}^r \times \mathbb{C}^{d-r} \longrightarrow \mathbb{C}^d, \qquad (A, B) \longmapsto AB,$$

as the multiplication of (monic) polynomials. The i-th coordinate of µ(A, B) is the coefficient of the order-(d − i) term of the polynomial resulting from the multiplication of A and B in the usual sense. That is, if we denote the product AB =: C = X^d + c_1X^{d−1} + ... + c_d for some fixed A ∈ C^r and B ∈ C^{d−r}, then the k-th coordinate of µ(A, B) is

ck = bk + a1bk−1 + ... + ak−1b1 + ak, where ai = 0 for i > r and bi = 0 for i > d − r. With this identification of polynomials with points, we consider the jacobian determinant of µ at (A, B):

 1 0 ...... 0 1 0 ... 0  . .  ......   b1 . . . a1 . . .     ......   ...... 0     ......   . . . 0 . . 1    |J (A, B)| = det  .. .  . µ bd−r . 1 . a1    . .   0 .. b a .   1 r   ......   ...... 0 .. .     ......   ......  0 ...... 0 bd−r 0 ... 0 ar

In this case, the jacobian of µ is drawn as though r = d − r + 1, (that is, deg(A) = deg(B) + 1). With some straightforward permutations of columns,

60 the jacobian of µ at (A, B) becomes

 1 0 ... 0 0 ...... 0 1  . .  ......  a1 ...... b1     ......   . . . 0 . . . . .     ......   . . 1 0 . . .    J (A, B) =  . . .  , µ  . a1 1 . bd−r    . .  a . b . . 0   r 1   ......   0 ......     ......   ......  0 ... 0 ar bd−r 0 ...... 0 which is the transpose of the Sylvester matrix of A and B. Hence the jacobian determinant |Jµ(A, B)| is equal to ±PSRC0(A, B), the resultant of A and B. By Theorem 1.3.5, the resultant of A and B is zero if and only if A and B have a common factor of positive degree. Hence the jacobian determinant is nonzero if and only if A and B are relatively prime. Now, if we let C = AB as above, and assume that A and B are relatively prime, then the jacobian determinant of µ at (A, B) is nonzero, and hence by the inverse function theorem, µ is invertible with continuous inverse near C ∈ Cd, and so there are unique R ∈ Cr and S ∈ Cd−r such that µ(R,S) = Q ∈ Cd. That is, for Q ∈ Cd sufficiently close to C, there is a unique factorization Q = RS with R close to A and S close to B. If we then consider a polynomial m1 m P = (X − α1) ... (X − αk) k , with distinct roots α1, . . . , αkinC, then each mi of the factors (X − αi) are relatively prime to one another. Therefore, if we mi define Pi := (X − αi) for 1 ≤ i ≤ k, each of the inductively defined mappings

\[
\mu_i : C^{m_1+\dots+m_{i-1}} \times C^{m_i} \longrightarrow C^{m_1+\dots+m_i},\qquad
(P_1 \cdots P_{i-1},\, P_i) \longmapsto P_1 \cdots P_i
\]

is invertible near P1...Pi ∈ C^{m1+...+mi}. Hence for Q sufficiently close to P, Q admits a unique factorization Q = Q1...Qk, where each Qi is close to Pi = (X − αi)^{mi}.

Consider the polynomial X^d, which identifies with the point 0 ∈ C^d, and its root α = 0 with multiplicity d, and another polynomial, say, Q = X^d + ε1 X^{d−1} + ... + ε_d, identifying with the point (ε1,...,ε_d) ∈ C^d. By Lemma 1.1.6, the roots of Q are bounded within the disk B(0, ε) ⊂ C, where

ε = max_{i=1,...,d} (d|ε_i|)^{1/i},

which is a continuous function of the ε_i. Hence, for any ε > 0, we can always find ε1,...,ε_d small enough that the roots of Q are in this disk. Using the same argument, and by noting that the polynomial (X − α)^d is a translation of X^d by a constant α ∈ C, we know that, for a disk B(α, ε) ⊂ C of arbitrarily small radius about α, we can always find a polynomial sufficiently close to (X − α)^d for its roots to lie within B(α, ε). Now consider again the case of P = (X − α1)^{m1} ... (X − αk)^{mk}. Denote by Bi := B(αi, ε) ⊂ C the disk of radius ε about αi for each of the roots αi of P, and take ε > 0 so small that all Bi are disjoint. We know that a polynomial Q close enough to P has a unique factorization Q = Q1...Qk, where each Qi is close to (X − αi)^{mi}, and that a polynomial sufficiently close to (X − αi)^{mi} has its roots in the disk Bi. Therefore, if Q is monic, of degree d, and sufficiently close to P, each disk Bi contains exactly mi roots of Q (counted with multiplicities). Finally, if we let P(c', Xn) = a0(c')Xn^d + a1(c')Xn^{d−1} + ... + a_d(c') have degree d, but not necessarily be monic, then we can simply replace P with P/a0, since a0 is nonzero by the assumption that P has degree d. The roots are therefore continuous functions of the coefficients of a polynomial of fixed degree, and the coefficients of an arbitrary polynomial Q(x', Xn) are polynomial (and hence continuous) functions of the parameters x' ∈ C. Hence we can apply the above arguments to obtain that any polynomial Q(x', Xn) of degree d with x' close to c' has its roots close to those of P(c', Xn).

Proposition 3.2.2. Let P(X1,...,Xn) ∈ R[X1,...,Xn] be such that P(x', Xn) has degree d and exactly k distinct roots for all x' ∈ C. Then there exist continuous semialgebraic functions

ξ1, . . . , ξl : C → R

for some l ≤ k, such that ξ1 < ... < ξl everywhere on C, {ξ1(x'),...,ξl(x')} is precisely the set of real roots of P(x', Xn) for every x' ∈ C, and the multiplicity of each root is constant on C.

Proof. We first show that the number of distinct real roots of P(x', Xn) is constant for x' ∈ C. Assume that P(c', Xn) has distinct roots α1,...,αk ∈ C with multiplicities m1,...,mk for some c' ∈ C, and let B(αi, ε) be as in Lemma 3.2.1 above. Since we assume P(x', Xn) to have fixed degree equal to d and exactly k distinct roots for all x' ∈ C, by the continuity of roots (Lemma 3.2.1) each B(αi, ε) contains exactly one root, say ζi, of P(x', Xn) of multiplicity mi for x' ∈ C sufficiently close to c'. (Indeed, for x' close to c', each Bi contains mi roots of P(x', Xn) counted with multiplicities, so that if some Bi contained more than one distinct root of P(x', Xn), some other Bj would contain no roots of P(x', Xn), contradicting Lemma 3.2.1.) If αi is real then ζi must also be real, since otherwise the complex conjugate of ζi would be another (distinct) root of P(x', Xn) in Bi, which we know to be a contradiction. If αi is non-real then ζi must also be non-real, since the open disks B(αi, ε) and B(ᾱi, ε) are assumed to be disjoint. Hence for all x' ∈ C close to c', P(x', Xn) has the same number of real roots as P(c', Xn), and since C is connected this number must be constant on C.

Let us assume that P(x', Xn) has l distinct real roots for all x' ∈ C, where l is obviously at most k, and, with these real roots in increasing order, denote by

ξi : C → R

the function sending x' ∈ C to the i-th real root of P(x', Xn). By Lemma 3.2.1, the ξi are continuous functions of x', and, as above, the connectedness of C implies that the ξi(x') have constant multiplicity. Finally, we give a semialgebraic description of each ξi. Let Θ be the first order formula such that C is the set of x' ∈ R^{n−1} satisfying Θ(x'). Then we can define the graphs of ξ1,...,ξl, relying on the strict ordering ξ1 < ... < ξl. In particular, the graph of ξi is the set

\[
\{(x', x_n) \in \mathbb{R}^n \mid \Theta(x') \text{ and } \exists y_1,\dots,y_l \in \mathbb{R} \text{ such that } y_1 < \dots < y_l \text{ and } P(x', y_1) = \dots = P(x', y_l) = 0 \text{ and } x_n = y_i\},
\]

and is therefore semialgebraic.

We now know that we can ensure the strict ordering and constant multiplicity of the continuous semialgebraic functions ξ : C → R describing the roots of a single polynomial when certain constraints are placed on the polynomial and the set C. However, in order to make the generalization to a c.a.d. adapted to a family of polynomials we must also pay attention to the multiplicities of common roots between different polynomials in the family. For example, if we consider a family of polynomials P1,...,Pr ∈ R[X1,...,Xn] such that the zero-sets of some Pi and Pj in (P1,...,Pr) intersect, then the cells of a c.a.d. of R^n adapted to (P1,...,Pr) must be contained within the regions of R^n described by these zero-sets. We now show that we can keep track of this, given similar constraints on C and the polynomials with parameters in C, in the same way as in the above proposition.

Proposition 3.2.3. Let P, Q ∈ R[X1,...,Xn] and let C ⊂ R^{n−1} be a connected semialgebraic subset such that

• the number of distinct roots of P(x', Xn) and Q(x', Xn) is constant for all x' ∈ C,

• the degrees of P(x', Xn), Q(x', Xn), and gcd(P(x', Xn), Q(x', Xn)) with respect to Xn are constant for all x' ∈ C.

Let ξ, ζ : C → R be continuous semialgebraic functions such that P(x', ξ(x')) = 0 and Q(x', ζ(x')) = 0 for all x' ∈ C. If ξ(c') = ζ(c') for some c' ∈ C, then ξ(x') = ζ(x') for all x' ∈ C.

Proof. Assume that, for some c' ∈ C, the product P(c', Xn)Q(c', Xn) has distinct roots α1,...,αk ∈ C, and that ξ(c') = ζ(c') = α1. Let mi denote the multiplicity of αi as a root of P(c', Xn), and ni the multiplicity of αi as a root of Q(c', Xn), noting that either mi or ni may be zero (but not both), meaning that αi is not a root of the corresponding polynomial. Denote by gi = min(mi, ni) the multiplicity of αi as a root of gcd(P(c', Xn), Q(c', Xn)), so that the degree of gcd(P(c', Xn), Q(c', Xn)) is g1 + ... + gk. Again, we consider disks B(αi, ε) ⊂ C about each root αi, with ε > 0 so small that all of these disks are disjoint. Since the degree and number of distinct roots of P and Q are assumed to be constant on C, Lemma 3.2.1 gives us that for x' ∈ C sufficiently close to c', each disk B(αi, ε) contains exactly one root of P(x', Xn) of multiplicity mi, and exactly one root of Q(x', Xn) of multiplicity ni. Similarly, each disk B(αi, ε) for which gi > 0 must also contain exactly one root of gcd(P(x', Xn), Q(x', Xn)) of multiplicity gi. By the assumption that ξ(c') = ζ(c') = α1, the disk B(α1, ε) contains a root of gcd(P(x', Xn), Q(x', Xn)) of multiplicity g1 > 0 for all x' sufficiently close to c', and by the connectedness of C, this must hold for all x' ∈ C; in particular ξ(x') = ζ(x') for all x' ∈ C.

We combine results of previous sections to construct a c.a.d. of R^n adapted to a family of polynomials in R[X1,...,Xn]. As stated at the beginning of this section, we aim to take successive co-dimension 1 projections until we are left with a family of polynomials in a single variable, from which we rebuild R^n according to the roots of the polynomials obtained from these projections. Again, we think of a polynomial P(X1,...,Xn) as a polynomial in Xn with parameters X1,...,X_{n−1}. (The roots of the polynomials define the regions of R^n on which the polynomials have constant sign, as in the definition of a c.a.d..) Each projection takes us from a family of polynomials in R[X1,...,Xk] to a family of polynomials in R[X1,...,X_{k−1}]. We require each projection to preserve certain information about the family of polynomials, such as degree, number of distinct roots, and where the roots of different polynomials intersect. We make this more concrete with the definition of our projection operation, along with some brief discussion to follow. For a polynomial P ∈ R[X1,...,Xn], we denote by lc(P) the leading coefficient of P, and we denote the truncation of P by trunc(P) (that is, P minus its leading term). Note that, for P(X1,...,Xn), the leading coefficient lc(P) is a polynomial in X1,...,X_{n−1}. For a family of polynomials (P1,...,Pr) in R[X1,...,Xn], and with Pi, Pj ∈ (P1,...,Pr), we define PROJ(P1,...,Pr) to be the smallest subset of R[X1,...,X_{n−1}] satisfying:

1. If deg_{Xn}(Pi) =: di ≥ 2 then PROJ(P1,...,Pr) contains all nonconstant PSRC_k(Pi, ∂Pi/∂Xn), for k = 0, 1, ..., di − 1.

2. If min(deg_{Xn}(Pi), deg_{Xn}(Pj)) =: d ≥ 1 then PROJ(P1,...,Pr) contains all nonconstant PSRC_k(Pi, Pj), for k = 0, 1, ..., d − 1.

3. If deg_{Xn}(Pi) ≥ 0 and lc(Pi) is not constant then PROJ(P1,...,Pr) contains lc(Pi) and PROJ(P1,...,trunc(Pi),...,Pr), replacing Pi with its truncation, trunc(Pi).

Note that if deg_{Xn}(Pi) = 0 and Pi is not constant then lc(Pi) = Pi and trunc(Pi) = 0. The purpose of this projection operation can be made apparent by recalling some facts. Rule 1 keeps track of when the multiplicity of roots of Pi changes. Indeed, Theorem 1.3.5 asserts that the greatest common divisor of Pi and ∂Pi/∂Xn has degree > k if and only if PSRC_0(Pi, ∂Pi/∂Xn) = ... = PSRC_k(Pi, ∂Pi/∂Xn) = 0. Hence, keeping track of where these principal subresultant coefficients vanish also allows us to deduce when the multiplicity of a root of Pi may change. Rule 2 keeps track of when Pi and Pj have common roots and when the multiplicity of these roots may change. Again, we refer to Theorem 1.3.5 for this verification. Both rules 1 and 2 rely on the degree of the Pi being constant. As such, rule 3 keeps track of when lc(Pi) vanishes, and hence tracks when the degree of Pi changes, and thereby keeps track of when the total number of roots changes. In the case of deg_{Xn}(Pi) = 0, we are essentially tracking when Pi has no roots. A computational sketch of this operator is given below; we then capture its importance with the theorem that follows.
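The following Python sketch (an illustration of ours, assuming SymPy; not code from this thesis) computes PSRC_k as the determinant of a trimmed Sylvester matrix and applies rules 1-3. The recursion in rule 3 is deliberately simplified.

import sympy as sp

def psrc(P, Q, x, k):
    # k-th principal subresultant coefficient of P and Q with respect to x,
    # as the determinant of the Sylvester matrix with the last k rows of each
    # coefficient block and the last 2k columns removed.
    p, q = sp.Poly(P, x), sp.Poly(Q, x)
    m, n = p.degree(), q.degree()
    size = m + n - 2 * k
    rows = []
    for i in range(n - k):                  # shifted coefficient rows of P
        rows.append([p.all_coeffs()[j - i] if 0 <= j - i <= m else 0
                     for j in range(size)])
    for i in range(m - k):                  # shifted coefficient rows of Q
        rows.append([q.all_coeffs()[j - i] if 0 <= j - i <= n else 0
                     for j in range(size)])
    return sp.expand(sp.Matrix(rows).det())

def proj(polys, x):
    out = set()
    work = list(polys)
    while work:
        P = work.pop()
        d = sp.Poly(P, x).degree()
        if d >= 2:                          # rule 1
            for k in range(0, d):
                s = psrc(P, sp.diff(P, x), x, k)
                if not s.is_constant():
                    out.add(s)
        for Q in polys:                     # rule 2 (pairs, up to sign)
            e = min(d, sp.Poly(Q, x).degree())
            if Q is not P and e >= 1:
                for k in range(0, e):
                    s = psrc(P, Q, x, k)
                    if not s.is_constant():
                        out.add(s)
        lead = sp.LC(P, x)
        if not lead.is_constant():          # rule 3
            out.add(lead)
            t = sp.expand(P - lead * x**d)  # truncation of P
            if t != 0:
                work.append(t)
    return out

For the polynomials of Example 25 below, proj([P, Q], Y) reproduces, up to sign and repetition, the family PROJ(P, Q) computed there by hand.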

Theorem 3.2.4. Let (P1,...,Pr) ⊂ R[X1,...,Xn] be a finite family, and let C ⊂ R^{n−1} be a connected, semialgebraic, PROJ(P1,...,Pr)-invariant subset. Then there exist continuous semialgebraic functions

ξ1, . . . , ξl : C → R

such that ξ1 < ... < ξl everywhere on C, and for every x' ∈ C, {ξ1(x'),...,ξl(x')} is the set of real roots of all nonzero polynomials P1(x', Xn),...,Pr(x', Xn). The graph Ai of each ξi, and each band Bi of the cylinder C × R bounded by ξi and ξi+1, are connected semialgebraic (P1,...,Pr)-invariant subsets of R^n, semialgebraically homeomorphic to C and C × (0, 1), respectively.

Proof. We know that continuous semialgebraic functions describing the roots of each of P1,...,Pr exist by Proposition 3.2.2. It remains to show that the collection ξ1,...,ξl describing all roots of P1,...,Pr collectively can be ordered strictly. Since C is PROJ(P1,...,Pr)-invariant, the leading coefficients of the Pi have constant sign on C, so every polynomial P1,...,Pr (or the truncation of a polynomial) has fixed degree on C, and hence a fixed number of roots over C. By Proposition 3.2.2, every polynomial Pi(x', Xn) has its real roots described by continuous semialgebraic functions ξ1 < ... < ξ_{li} : C → R, where each of these ξj has fixed multiplicity on C. Furthermore, for any pair P, Q ∈ (P1,...,Pr), the polynomials PSRC_k(P, Q) have constant sign on C for all valid k, and so the degree of gcd(P, Q) is constant on C. Proposition 3.2.3 implies that any ξj describing a common root of P(c', Xn) and Q(c', Xn) at some c' ∈ C must, in fact, describe a common root of P and Q everywhere in C - common roots remain common on C. Any nonconstant P ∈ (P1,...,Pr) with deg_{Xn}(P) = 0 is a polynomial in X1,...,X_{n−1}, and so will either vanish everywhere on C, or have no roots over C. In either case, there are no ξi : C → R describing roots of such P. Note that, due to the connectedness of C, the multiplicities of roots in the above arguments remain constant on all of C. We conclude that there is a strict ordering ξ1 < ... < ξl : C → R of the continuous semialgebraic functions describing the (real) roots of (P1,...,Pr) which holds everywhere on C. Finally, by the constructions in Proposition 3.1.1, the Ai and Bi are semialgebraically homeomorphic to C and C × (0, 1), respectively, and connectedness of the Ai and Bi immediately follows.

We can now directly address the construction of a c.a.d. of R^n adapted to a family (P1,...,Pr) ⊂ R[X1,...,Xn]. The proof of the following theorem is an explicit discussion of how to apply the theory explored in this section so far.

Theorem 3.2.5. For every family of polynomials (P1,...,Pr) in R[X1,...,Xn], there exists a c.a.d. of R^n adapted to (P1,...,Pr).

Proof. Observe that PROJ(P1,...,Pr) is a (finite) family of polynomials in R[X1,...,X_{n−1}]. Iterating this operation n − 1 times, we have

PROJ^{n−1}(P1,...,Pr) = PROJ(... PROJ(P1,...,Pr)) ⊂ R[X1].

Using methods from Section 1, we are able to count and isolate the real roots of all polynomials in PROJ^{n−1}(P1,...,Pr). That is, we can produce a collection C1 of subsets of R whose elements are points and open intervals, which are simply the roots of polynomials in PROJ^{n−1}(P1,...,Pr) and the intervals between these roots, respectively. Each of these points and open intervals is clearly a connected semialgebraic subset of R, which is also PROJ^{n−1}(P1,...,Pr)-invariant by construction. If we consider PROJ^{n−2}(P1,...,Pr) ⊂ R[X1, X2] as a family of polynomials in its own right, we can rephrase the above observation as: the elements of C1 are connected semialgebraic PROJ(PROJ^{n−2}(P1,...,Pr))-invariant subsets of R. By Theorem 3.2.4 there exist finitely many continuous semialgebraic functions ξC,1 < ... < ξC,lC from each C ∈ C1 into R, such that {ξC,1(x1),...,ξC,lC(x1)} is the set of real roots of all nonzero polynomials P(x1, X2) in PROJ^{n−2}(P1,...,Pr) for every x1 ∈ C. Furthermore, the graphs AC,i of these ξC,i, and the bands BC,i bounded between ξC,i and ξC,i+1, are semialgebraically homeomorphic to C and C × (0, 1), respectively. For instance, if C = {c} ∈ C1 is a point, then the AC,i are simply points (c, ξC,i(c)) ∈ R², and the BC,i are vertical line segments {c} × (ξC,i(c), ξC,i+1(c)). If D = (a, b) ∈ C1 is an open interval, then the AD,i are graphs {(x1, x2) ∈ D × R | x2 = ξD,i(x1)}, and the BD,i are open regions {(x1, x2) ∈ D × R | ξD,i(x1) < x2 < ξD,i+1(x1)}. In particular, Theorem 3.2.4 ensures that we can construct a c.a.d. of R² adapted to PROJ^{n−2}(P1,...,Pr).

We continue this process inductively. Assume that we have C1,...,Cn−1, a cylindrical algebraic decomposition of R^{n−1}, where Cn−1 is a collection of connected, semialgebraic, PROJ(P1,...,Pr)-invariant subsets of R^{n−1}. Applying Theorem 3.2.4, for every cell C ∈ Cn−1 there exist finitely many continuous semialgebraic functions ξC,1 < ... < ξC,lC : C → R such that {ξC,1(x'),...,ξC,lC(x')} is the set of real roots of all nonzero polynomials P(x', Xn) in (P1,...,Pr) for every x' ∈ C. The graphs of these ξC,i cut the cylinder C × R into (P1,...,Pr)-invariant subsets, namely the graphs themselves, AC,i, and the bands between them, BC,i, where each AC,i is semialgebraically homeomorphic to C, and each BC,i is semialgebraically homeomorphic to C × (0, 1). Hence we have constructed a c.a.d. of R^n adapted to (P1,...,Pr).

Corollary 3.2.6. For any semialgebraic subset S ⊂ R^n, and any semialgebraic f : S → R, there exists a finite partition S = S1 ∪ ... ∪ Ss such that the restriction of f to each Si is continuous.

Proof. Since f : S → R is semialgebraic, the graph Γf of f is a semialgebraic subset of R^{n+1}, and is therefore a finite disjoint union of sets of the form

{(x, y) ∈ R^n × R | P(x, y) = 0, Q1(x, y) > 0, ..., Qr(x, y) > 0},

where P, Q1,...,Qr are polynomials in X and Y. By Theorem 3.2.5 there exists a c.a.d. of R^{n+1} adapted to P, Q1,...,Qr. Let Cn be the (finite) partition of R^n in this c.a.d., so that each cell C ∈ Cn is a PROJ(P, Q1,...,Qr)-invariant subset of R^n, and each cylinder C × R is cut by the graphs of continuous semialgebraic functions ξC,1 < ... < ξC,lC : C → R into sets AC,i and BC,i, the graphs of the ξC,i, and bands between these graphs, respectively. The graph Γf, as a semialgebraic subset of R^{n+1}, must be a union of these AC,i and BC,i over each cell C ∈ Cn. However, as f is a function C → R, the graph can contain at most one point over each x ∈ C ⊂ R^n. Therefore Γf is a union of at most one of the AC,i over each C ∈ Cn, and each AC,i is the graph of a continuous semialgebraic function, ξC,i. We set the Si to be the C ∈ Cn over which the graph of f is nonempty. As Cn is a finite partition of R^n, we have a finite partition S = S1 ∪ ... ∪ Ss such that f|Si is continuous for each i = 1,...,s.

We give several examples of c.a.d. construction to highlight the important theoretical points of Theorem 3.2.5.

Example 25. Let P(X, Y) = X² + Y² − 1 and Q(X, Y) = X² + Y³ − 1, both polynomials in R[X, Y]. We aim to produce a c.a.d. of R² adapted to P and Q by computing PROJ(P, Q), and producing a c.a.d. of R adapted to this family. That is, we need to compute all principal subresultant coefficients of P and ∂P/∂Y, of Q and ∂Q/∂Y, and of P and Q, as well as considering whether or not P and Q have constant leading coefficients. Clearly the leading coefficients (with respect to Y) of P and Q are both constant. With ∂P/∂Y = 2Y and ∂Q/∂Y = 3Y², we compute the principal subresultant coefficients:

\[
\mathrm{PSRC}_0(P, Q) = \det\begin{pmatrix}
1 & 0 & X^2-1 & 0 & 0\\
0 & 1 & 0 & X^2-1 & 0\\
0 & 0 & 1 & 0 & X^2-1\\
0 & 1 & 0 & 0 & X^2-1\\
1 & 0 & 0 & X^2-1 & 0
\end{pmatrix} = X^2(X^2-1)^2,
\]
\[
\mathrm{PSRC}_1(P, Q) = \det\begin{pmatrix}
1 & 0 & X^2-1\\
0 & 1 & 0\\
1 & 0 & 0
\end{pmatrix} = -(X^2-1),
\]
\[
\mathrm{PSRC}_0(P, \partial P/\partial Y) = \det\begin{pmatrix}
1 & 0 & X^2-1\\
0 & 2 & 0\\
2 & 0 & 0
\end{pmatrix} = -4(X^2-1),
\]
\[
\mathrm{PSRC}_0(Q, \partial Q/\partial Y) = \det\begin{pmatrix}
1 & 0 & 0 & X^2-1 & 0\\
0 & 1 & 0 & 0 & X^2-1\\
0 & 0 & 3 & 0 & 0\\
0 & 3 & 0 & 0 & 0\\
3 & 0 & 0 & 0 & 0
\end{pmatrix} = -27(X^2-1)^2,
\]

where we have omitted the computation of the principal subresultant coefficients that are constant. We have found that

PROJ(P, Q) = (X²(X² − 1)², X² − 1, −4(X² − 1), −27(X² − 1)²),

which we may reduce to (X²(X² − 1)², X² − 1), since −4(X² − 1) is a constant multiple of X² − 1, and the sign of −27(X² − 1)² is determined by the sign of X² − 1. We now find the roots of the polynomials in this family, which is quite easy to do in this example, although in general we can employ methods such as Sturm's root counting algorithm for this. The roots of X²(X² − 1)² are 0, ±1, and the roots of X² − 1 are ±1. Hence the set

C1 = {C1, C2, C3, C4, C5, C6, C7} = {(−∞, −1), {−1}, (−1, 0), {0}, (0, 1), {1}, (1, ∞)}

forms the partition of R¹ for our c.a.d., and these sets are all PROJ(P, Q)-invariant. We rebuild R² from the cylinders Ci × R over each of these cells by taking test points ci in each of the Ci, and computing the number of distinct real roots of the family (P, Q) at this test point. (To reiterate, we are treating P and Q as polynomials in Y with parameters ci ∈ Ci ⊂ R.) For instance, if we let c1 = −2, then P(c1, Y) = P(−2, Y) = 3 + Y², which has no real roots, and Q(c1, Y) = Q(−2, Y) = 3 + Y³, which has 1 real root (a simple root; the other two roots are complex). Continuing this for each cell, we find that P(x, Y) has no real roots over C1 and C7, 1 real root over C2 and C6, and 2 real roots over C3, C4, and C5, while Q(x, Y) has 1 real root over each cell. The graphs of the functions ξCi,j describing the real roots of P and Q are shown below, which are precisely the ACi,j as in the definition of a c.a.d., while the bands BCi,j are the regions of the plane bounded between these graphs. For instance, P(x, y) > 0 for all x ∈ C1 = (−∞, −1) and all y ∈ R, while, for x ∈ C1, Q(x, y) > 0 when y > (1 − x²)^{1/3} and Q(x, y) < 0 when y < (1 − x²)^{1/3}.

Figure 15: Partition C2 of the c.a.d. of R² adapted to P and Q.

The cells C2, C4, C6 ∈ C1, and hence the cylinders C2 × R, C4 × R, and C6 × R, arise due to the roots of X² − 1 and X²(X² − 1)². While these are the roots of the principal subresultant coefficients of P and Q, it should be noted that the cells C2 and C6 actually arise even in the absence of Q. That is to say that a c.a.d. of R² adapted to P alone would still exhibit the cylinders {−1} × R and {1} × R. This is due to the fact that the number of distinct real roots of P (as a polynomial in Y) changes as the parameter x moves over the points −1 and 1. The cell C4, however, arises only due to an intersection of the roots of P and Q, which is detected as a root of the resultant of P and Q. To conclude this example, we point out a shortcoming of the information given by this c.a.d.: we cannot define the cells of this c.a.d. by boolean combinations of sign conditions on the polynomials used. For instance, the subset of R² satisfying "X² − 1 < 0 and X²(X² − 1)² > 0 and P > 0 and Q < 0" is actually the union of 4 cells: C3,1 ∪ C3,5 ∪ C5,1 ∪ C5,5.
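The subresultant computations in this example are easy to double-check by machine. The following snippet of ours (assuming SymPy, and recalling that PSRC_0 agrees with the resultant up to sign) reproduces the three nonconstant PSRC_0 values found above.

import sympy as sp

X, Y = sp.symbols('X Y')
P = X**2 + Y**2 - 1
Q = X**2 + Y**3 - 1

print(sp.factor(sp.resultant(P, Q, Y)))               # X**2*(X**2 - 1)**2, up to sign
print(sp.factor(sp.resultant(P, sp.diff(P, Y), Y)))   # 4*(X**2 - 1), up to sign
print(sp.factor(sp.resultant(Q, sp.diff(Q, Y), Y)))   # 27*(X**2 - 1)**2, up to sign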

Example 26. Consider P(X, Y, Z) = XYZ − X³ − Y², a polynomial in R[X, Y, Z]. We aim to produce a c.a.d. of R³ adapted to P. First we must compute the family PROJ(P) ⊂ R[X, Y]. That is, we need to compute all principal subresultant coefficients of P and ∂P/∂Z, and consider whether or not P has a nonconstant leading coefficient. Clearly the leading coefficient (with respect to Z), lc(P) = XY, is not constant, and therefore XY and PROJ(trunc(P)) are contained in PROJ(P). Since trunc(P) = −X³ − Y² is constant with respect to Z, PROJ(trunc(P)) = trunc(P). With ∂P/∂Z = XY a degree 0 polynomial with respect to Z, there are no principal subresultant coefficients to compute. That is, PROJ(P) = (−X³ − Y², XY). Let us denote

P1(X, Y) := −X³ − Y², P2(X, Y) := XY.

Then we compute PROJ(PROJ(P)) = PROJ(P1, P2). First notice that lc(P2) = X (with respect to Y) is nonconstant, and that trunc(P2) = 0. Therefore X ∈ PROJ(P1, P2). With ∂P1/∂Y = −2Y and ∂P2/∂Y = X, we compute the principal subresultant coefficients:

\[
\mathrm{PSRC}_0(P_1, P_2) = \det\begin{pmatrix}
-1 & 0 & -X^3\\
0 & X & 0\\
X & 0 & 0
\end{pmatrix} = X^5,
\qquad
\mathrm{PSRC}_1(P_1, P_2) = X,
\]
\[
\mathrm{PSRC}_0(P_1, \partial P_1/\partial Y) = \det\begin{pmatrix}
-1 & 0 & -X^3\\
0 & -2 & 0\\
-2 & 0 & 0
\end{pmatrix} = 4X^3.
\]

Then we have PROJ(P1, P2) = (X⁵, 4X³, X), whose members' only roots are at 0, giving a simple partition C1 = {C1, C2, C3} = {(−∞, 0), {0}, (0, ∞)} of R¹, where each of the cells C1, C2, C3 is PROJ(P1, P2)-invariant. For any x ∈ C1 = (−∞, 0), the polynomial P1(x, Y) = −x³ − Y² has 2 real roots, at ±√(−x³) (noting that −x³ > 0 for x ∈ C1), and P2(x, Y) = xY has 1 real root, at 0. Therefore, there exist continuous semialgebraic functions

ξC1,1 < ξC1,2 < ξC1,3 : C1 → R

describing these real roots of P1 and P2 over C1. Similarly, at x = 0, the polynomial P1(x, Y) = −Y² has a single real root at Y = 0, and P2(x, Y) = 0 for all Y. Therefore, there exists a single continuous semialgebraic function

ξC2,1 : C2 → R describing the root of P1 over C2. Finally, for any x ∈ C3 = (0, ∞), P1(x, Y) has no real roots, and P2(x, Y) has a real root at 0. Therefore, there exists a single continuous semialgebraic function ξC3,1 : C3 → R describing the root of P2 over C3. Now, there are 13 cells in our partition C2 of R². Namely, there are 7 cells

C1,1,...,C1,7 in the cylinder C1 × R:

C1,1 = {(x, y) ∈ C1 × R | y < ξC1,1(x)}

C1,2 = {(x, y) ∈ C1 × R | y = ξC1,1(x)}

C1,3 = {(x, y) ∈ C1 × R | ξC1,1(x) < y < ξC1,2(x)}

C1,4 = {(x, y) ∈ C1 × R | y = ξC1,2(x)}

C1,5 = {(x, y) ∈ C1 × R | ξC1,2(x) < y < ξC1,3(x)}

C1,6 = {(x, y) ∈ C1 × R | y = ξC1,3(x)}

C1,7 = {(x, y) ∈ C1 × R | y > ξC1,3(x)},

which are the graphs AC1,i of the ξC1,i on C1, and the bands BC1,j of the cylinder C1 × R bounded between these graphs. This partition of R² is shown below.

Figure 16: C2 - a partition of R² into PROJ(P)-invariant cells.

There are also 3 cells over C2:

C2,1 = {0} × (−∞, 0)

C2,2 = {0} × {0}

C2,3 = {0} × (0, ∞),

and 3 cells over C3:

C3,1 = (0, ∞) × (−∞, 0)

C3,2 = (0, ∞) × {0}

C3,3 = (0, ∞) × (0, ∞).

We now check the number of distinct real roots of P over each cell in C2. For any (x, y) ∈ C1,1, P(x, y, Z) = xyZ − x³ − y² has exactly 1 real root, at (x³ + y²)/xy. In fact, for a point (x, y) in any of the cells C1,1, C1,2, C1,3, C1,5, C1,6, C1,7, C3,1, C3,3, the polynomial P(x, y, Z) has exactly 1 real root, which is at (x³ + y²)/xy, and hence C3 contains 3 cells in the cylinder over each of these cells. For points in the cells C1,2 and C1,6, the root of P(x, y, Z) is simply at z = 0. For a point (x, y) in the cells C1,4, C2,1, C2,3, C3,2, P(x, y, Z) has no real roots, and so C3 contains 1 cell in the cylinder over each of these. Finally we consider the cell C2,2, over which P(0, 0, Z) = 0, and C3 contains a single cell over C2,2 = {0} × {0}. Therefore the partition C3 of R³ consists of 29 cells in total.
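These root counts can be spot-checked mechanically. The following sketch of ours (assuming SymPy; the test points are chosen by hand, one per selected cell) evaluates P at a point of several cells of C2 and counts the real roots in Z.

import sympy as sp

X, Y, Z = sp.symbols('X Y Z')
P = X*Y*Z - X**3 - Y**2

tests = {
    'C1,1': (-1, -2), 'C1,3': (-1, sp.Rational(1, 2)), 'C1,4': (-1, 0),
    'C2,2': (0, 0), 'C3,1': (1, -1), 'C3,2': (1, 0),
}
for cell, (x0, y0) in tests.items():
    p = P.subs({X: x0, Y: y0})
    if p == 0:
        print(cell, 'P is identically zero')
    else:
        print(cell, 'real roots:', sp.real_roots(p, Z))

One real root appears over C1,1, C1,3, C3,1, none over C1,4, C2,1, C2,3, C3,2, and P vanishes identically over C2,2, in agreement with the cell count above.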

Figure 17: Subset of R³ on which P = 0.

The cells C1,4, C3,2, C2,1, C2,2, C2,3 ∈ C2 arise due to the roots of XY. In particular, this is the leading coefficient of P as a polynomial in Z. Indeed, P has degree zero over these cells. Furthermore, in the cylinders over each of these cells other than C2,2, the polynomial P has degree zero and is non-vanishing, as the (relatively) constant term −X³ − Y² is nonzero. However, over the cell C2,2, the polynomial P is identically zero, as both the leading coefficient and constant term are zero here. Here, we observe an inability to deduce the relative arrangement of cells in adjacent cylinders. This arises in particular due to the vanishing of the leading coefficient. Returning to Example 25 we see what appears to be a similar phenomenon, as the cells over C1 give us little information about the arrangement of cells over C2 (ignoring the influence of Q on the cells present). The difference, however, is that the topology of the unit circle can be deduced from the information given by the c.a.d. adapted to X² + Y² − 1 - the functions ξCi,j over C3 and C5 obviously extend continuously to C2 and C6. However, the topology of the surface XYZ − X³ − Y² = 0 cannot be recovered from the information given by the adapted c.a.d. - the polynomial is positive in all cells in the cylinder C1,4 × R, negative in all cells over C2,1, C2,3, and C3,2, while it is identically zero in all cells over C2,2, and has sign changes over the remaining cells.

Now let P(X, Y, Z) = XYZ − X³ − Y² as before, but take the variables in the order (Z, X, Y), so that the projection operations are also applied in this order. Note that the leading coefficient of P, as a polynomial in Y, is −1, and so, in order to compute PROJ(P) we only need to compute the principal subresultant coefficients of P and ∂P/∂Y = −2Y + XZ:

−1 XZ −X3 PSRC0(P, ∂P/∂Y ) = det  0 −2 XZ  −2 XZ 0 = 4X3 − X2Z2 = X2(4X − Z2).

3 2 2 3 2 2 Therefore PROJ(P ) = {4X − X Z }. Denoting P1(X,Z) := 4X − X Z , and remembering that we treat P1 as a polynomial in X with parameter Z, we have lc(P1) = 4, which is constant. Hence to compute PROJ(P1) we only need to compute the principal subresultant coefficients of P1 and ∂P1/∂X = 12X2 − 2XZ2, as follows:  4 −Z2 0 0 0  0 4 −Z2 0 0  2  PSRC0(P1, ∂P1/∂X) = det  0 0 12 −2Z 0    0 12 −2Z2 0 0 12 −2Z2 0 0 0 = 0  4 −Z2 0  2 PSRC1(P1, ∂P1/∂X) = det  0 12 −2Z  12 −2Z2 0 = 8Z4.

Therefore PROJ(PROJ(P)) = PROJ(P1) = {8Z⁴}. Clearly 8Z⁴ has a single (quadruple) root at 0, and so the partition of R¹ in our c.a.d. of R³ adapted to P is C1 = {C1, C2, C3} = {(−∞, 0), {0}, (0, ∞)}. Each of these Ci is a PROJ(P1)-invariant cell, and so we can rebuild the sets of C2 as regions of the cylinders Ci × R bounded by the graphs of continuous semialgebraic functions ξCi,j : Ci → R. For z ∈ C1 = (−∞, 0) and z ∈ C3 = (0, ∞), the polynomial P1(z, X) = X²(4X − z²) has real roots 0 and z²/4. For z ∈ C2 = {0}, the polynomial P1(z, X) = 4X³ has a single root at 0. Thus for i = 1, 3 we have continuous semialgebraic functions ξCi,1 < ξCi,2 : Ci → R, which we are able to explicitly define by

ξCi,1(z) = 0,    ξCi,2(z) = z²/4,

and ξC2,1(z) = 0 over C2 (a point). These functions define regions of each cylinder Ci × R, which are the cells comprising C2 in our c.a.d..

Figure 18: C2 - a partition of R² into PROJ(P)-invariant cells.

The cells of C2 shown in Figure 18 are connected, semialgebraic, PROJ(P)-invariant subsets of R². For points (z, x) in the cells C1,1, C1,3, C2,1, C3,1, and C3,3, the polynomial P(z, x, Y) = −Y² + zxY − x³ has 2 roots, at y = (xz ± √(x²z² − 4x³))/2. Over the sets C1,2, C2,2, and C3,2, the roots of the polynomial P(z, x, Y) = −Y² coincide at 0, while, for points (z, x) in C1,4 and C3,4, where x = z²/4, the roots of P(z, x, Y) coincide at y = (xz ± √(x²z² − 4x³))/2 = xz/2. Finally, P(z, x, Y) has no real roots over C1,5, C2,3, and C3,5.

Figure 19: Cells in C2 of the c.a.d. of R³.

For clarity and convenience we show a schematic collection of slices of the surface P = 0 at several z values, which are also shown in Figure 19 above.

Figure 20: Slices of the surface P = 0 at z = −2, 0, 3.

When taking the variables in the order (Z, X, Y) we avoid the case of nonconstant (and hence vanishing) leading coefficients seen when taking the variables in the order (X, Y, Z). As shown in Figure 20, the functions ξCi,j over the cells C1,3 and C1,1 extend continuously to the cylinders over C1,2 and C1,4 (and the functions over C2,1, C3,1, and C3,3 extend continuously to adjacent cylinders). In contrast to the first part of this example, the c.a.d. now gives us enough information to recover the topology of the surface P = 0, simply by changing the order in which we take the projections. We explore an improvement to the c.a.d. construction which allows us to overcome this issue in general in the next subsection.

We obtain the following as a consequence of Theorem 3.2.5 and the nature of cell construction in a c.a.d..

Theorem 3.2.7. Every semialgebraic set is the disjoint union of finitely many semialgebraic subsets, each of which is a C∞ submanifold semialgebraically diffeomorphic to an open hypercube (0, 1)^d.

Proof. Every semialgebraic subset S ⊂ R^n is the (disjoint) union of finitely many sets of the form

{x ∈ R^n | P(x) = 0, Q1(x) > 0, ..., Qr(x) > 0},

where P, Q1,...,Qr ∈ R[X1,...,Xn]. By Theorem 3.2.5, each set of this form is a finite, disjoint union of cells of a c.a.d. of R^n adapted to (P, Q1,...,Qr). This c.a.d. is a collection of partitions C1,...,Cn of R,...,R^n, respectively. Each cell in Ck, 1 < k ≤ n, is contained within a cylinder C × R, where C is a cell in Ck−1. We proceed by induction on k. Using the method of iterating the PROJ operation and rebuilding cells, we can assume that there are continuous semialgebraic functions ξ1 < ... < ξl : C → R, whose graphs cut the cylinder C × R into the AC,i (the graph of ξC,i) and the BC,i (the band between ξC,i and ξC,i+1). Each cell of Ck is either one of the AC,i or one of the BC,i. As in the proof of Theorem 3.2.5, the ξC,i describe the roots of the family PROJ^{n−k}(P, Q1,...,Qr), considered as polynomials in Xk with parameters x' ∈ C. By Propositions 3.2.2 and 3.2.3, the multiplicity of each ξi is fixed on C. Hence, if ξi(x') has multiplicity mi as a root of, say, P, then ξi(x') is a simple root of P^{(mi−1)}(x', Xk), the (mi − 1)-th derivative of P(x', Xk) with respect to Xk, on C. The functions in PROJ^{n−k}(P, Q1,...,Qr) are polynomials R^k → R, meaning they are C∞. Therefore, by the Implicit Function Theorem, the ξi : C → R are also C∞. Hence, if C is a C∞ submanifold of R^{k−1}, then the graphs and bands, AC,i and BC,i, in C × R are also C∞ submanifolds. These AC,i and BC,i are the cells in Ck of a c.a.d. of R^n, and by Proposition 3.1.1, each cell of this c.a.d. is semialgebraically homeomorphic to an open hypercube (0, 1)^d, for some 0 ≤ d ≤ n. The formulas in Proposition 3.1.1 showing that the cells of a c.a.d. are semialgebraically homeomorphic to open hypercubes are also C∞, and are therefore semialgebraic diffeomorphisms between C and AC,i, or between C × (0, 1) and BC,i, respectively. That is, if C ∈ Ck−1 is a C∞ submanifold of R^{k−1} semialgebraically diffeomorphic to an open hypercube (0, 1)^d, then the cells of Ck in C × R are also C∞ submanifolds of R^k semialgebraically diffeomorphic to open hypercubes (0, 1)^d or (0, 1)^{d+1}, corresponding to graphs and bands, respectively. Finally, we consider the base case, the cells of C1. By definition, each cell in C1 is either a point or an open interval. Both of these are clearly C∞ submanifolds, and are semialgebraically diffeomorphic to (0, 1)^0 or (0, 1)^1, respectively. This completes the induction.

Remark: A semialgebraic C∞ submanifold of R^n is called a Nash manifold. Here, we saw that the cells of a c.a.d. adapted to a semialgebraic set are Nash manifolds. We give the following theorem on the number of connected components of semialgebraic sets.

Theorem 3.2.8. Every semialgebraic set has finitely many connected components, each of which is semialgebraic.

Proof. Theorem 3.2.7 tells us that every semialgebraic set S ⊂ R^n is the disjoint union of cells C1,...,Cs of a c.a.d. of R^n adapted to S, where each Ci is semialgebraically homeomorphic to an open hypercube (0, 1)^{di}. This also tells us that each cell is connected. We say that two cells Ci and Cj are adjacent if clos(Ci) ∩ Cj ≠ ∅ or Ci ∩ clos(Cj) ≠ ∅, and we define an equivalence relation '∼adj' by the rule: "Ci ∼adj Cj if there exists a sequence of cells of the c.a.d., Cl1,...,Clp, such that Ci = Cl1, Cj = Clp, and all pairs of consecutive cells Clk and Clk+1 in the sequence are adjacent". This is indeed an equivalence relation, as any such sequence can be reversed, and any sequence whose starting element is the ending element of another sequence can be concatenated with it to form a longer sequence of successively adjacent cells. We check that the equivalence classes [Ci1],...,[Cir] of cells via ∼adj are precisely the connected components S1,...,Sr of S. If some cell Cj has nonempty intersection with the closure of Si, then Cj is adjacent to some cell contained in Si, and so Cj must also be contained in Si; hence all cells which meet clos(Si) are contained in Si itself, and each Si is closed in S. Furthermore, there are finitely many Si (as there are finitely many cells in the c.a.d. adapted to S), and so the complement of any Si in S is the union of finitely many closed subsets of S. Hence each Si is also open in S. Finally, we wish to show that each Si is indeed connected. Suppose by way of contradiction that there exists a separation Si = A ∪ B, where A, B ≠ ∅ are closed in Si and A ∩ B = ∅. Since each cell is connected, any cell Cj ⊂ Si is a subset of exactly one of the sets A or B. Furthermore, any two adjacent cells Cj, Cj' ⊂ Si must be contained in the same set, either A or B. So, if we assume Cj ⊂ A and Cj' ⊂ B, then there does not exist a sequence Cj =: Cj1,...,Cjk := Cj' of cells in Si where each pair of successive elements are adjacent to one another. However, this is a contradiction, since Si = [Cj]∼adj = [Cj']∼adj by definition.
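The gluing argument in this proof is effectively a union-find computation over the cells. The following schematic sketch of ours (plain Python; the adjacency test `adjacent` is a hypothetical oracle standing in for the closure conditions above) groups cells into connected components.

def components(cells, adjacent):
    parent = {c: c for c in cells}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]   # path halving
            c = parent[c]
        return c

    for ci in cells:
        for cj in cells:
            if ci != cj and adjacent(ci, cj):
                parent[find(ci)] = find(cj)

    groups = {}
    for c in cells:
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())            # one list per connected component

# Toy usage: the two cells of {x : x^2 - 1 > 0} are not adjacent,
# so two components are reported.
cells = [('-inf', -1), (1, 'inf')]
print(components(cells, lambda a, b: False))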

Corollary 3.2.9. Every semialgebraic set is locally connected.

Proof. Consider a semialgebraic set S ⊂ R^n, and an open ball B(x, r) centered at x ∈ S with radius r > 0. The intersection S ∩ B(x, r) is semialgebraic, and hence has finitely many connected components. Exactly one of these connected components contains x, and is a connected neighborhood of x in S. This holds for any x ∈ S, and for any radius, hence S is locally connected.

3.3 Algorithmic construction of an adapted c.a.d.

We have shown how to construct an adapted c.a.d. in the previous subsection. We now consider the details of the algorithmic construction of an adapted c.a.d.. We use arguments similar to those present in the proofs of Theorems 3.2.5 and 3.2.7.

Cylindrical algebraic decomposition algorithm

1. Given a list of polynomials in Q[X1], the real roots of all of these polynomials are counted and isolated in intervals (with rational endpoints). The cells in C1 of the c.a.d. are precisely these roots and the open intervals between them. A root itself is characterized by its isolating interval, along with the polynomial that it is a root of. The open intervals between roots may be assigned an endpoint of one of the isolating intervals as a test point.

2. Given a list of polynomials (P1,...,Pr) in Q[X1,...,Xn], n > 1, the family PROJ(P1,...,Pr) is computed, and the algorithm is called again for this new list of polynomials in Q[X1,...,Xn−1]. The algorithm returns a c.a.d. of R^{n−1}, where every cell in Cn−1 is PROJ(P1,...,Pr)-invariant, along with a test point, say x'_C, for every C ∈ Cn−1. Theorem 3.2.5 can be applied to each C ∈ Cn−1, hence we know that C × R is cut into (P1,...,Pr)-invariant cells. Sturm's method can be applied in order to count and isolate all of the roots of P1(x'_C, Xn),...,Pr(x'_C, Xn).
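The recursive structure of these two steps can be summarized in code. The following is a high-level sketch of ours (assuming SymPy and the proj() sketch from Subsection 3.2; the bookkeeping for cells and test points, i.e. the lifting phase, is deliberately elided).

import sympy as sp

def cad(polys, varlist):
    x = varlist[-1]
    if len(varlist) == 1:
        # Step 1: isolate the real roots of the univariate family; the cells
        # of C1 are these roots and the open intervals between them.
        roots = sorted(set(r for p in polys if p.has(x)
                             for r in sp.real_roots(p, x)))
        return {'base_roots': roots}
    # Step 2: project, recurse on one fewer variable, then lift: for each cell
    # C of the returned c.a.d., substitute its test point x'_C and apply
    # Step 1 to P1(x'_C, x), ..., Pr(x'_C, x) to cut the cylinder C x R.
    base = cad(proj(polys, x), varlist[:-1])
    # ... lifting phase elided in this sketch ...
    return base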

The c.a.d. algorithm can be summarized as follows. The input is a finite list of polynomials P1,...,Pr in Q[X1,...,Xn], and the output is a list of the cells of a c.a.d. of R^n adapted to (P1,...,Pr), along with a test point for each cell whose coordinates are either rational or real algebraic. The cylindrical arrangement of cells is also shown; however, the relative arrangement of cells from different cylinders is not. We make improvements on this point in Subsection 3.4. For any semialgebraic set S ⊂ R^n, the c.a.d. algorithm produces at least one point in every connected component of S. Hence it can be used to answer the problem of whether or not a formula without free variables is true - the test point given for each cell in the c.a.d. adapted to the semialgebraic set which satisfies such a formula serves as a representative of its entire cell, determining whether or not the points in that cell satisfy the formula. However, the complexity of the algorithm is extremely high, and so the cost of running the algorithm is very limiting, in general. The complexity can be partly explained by observing the PROJ operation, which requires the computation of resultants and principal subresultant coefficients. If we consider two polynomials

P, Q ∈ R[X1,...,Xn] with deg_{Xn}(P) = d and deg_{Xn}(Q) = e, then PSRC_0(P, Q) is a polynomial in X1,...,X_{n−1} of degree up to (d + e)², in general. Furthermore,

the PROJ operation requires the calculation of the principal subresultant coefficients of every order, and for every pair of polynomials (and their derivatives with respect to Xn) in the family to which it is applied. The PROJ operation is iterated n − 1 times, so that the degree of the resulting polynomials in one variable is doubly exponential in the number of variables (i.e. up to (d + e)^{2^{n−1}}, where d + e is maximal in the family). Improvements to the projection operation, such as reducing the number of polynomials it outputs (thereby reducing the size of the family to which successive iterations are applied), are obviously desirable for the sake of implementing the c.a.d. algorithm. This particular version of the c.a.d. algorithm is due to Collins [3]. There have since been improvements to the complexity and efficiency, such as those by McCallum ([4] and [5]), and more recently Brown [6].
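To give a feel for this growth, the following small experiment of ours (assuming SymPy) performs a single resultant-based projection step on two dense quadratics; iterating such steps is what produces the doubly exponential behaviour.

import sympy as sp

X, Y, Z = sp.symbols('X Y Z')
P = Z**2 + (X + Y)*Z + X**2 + Y**2 - 1
Q = Z**2 + (X - Y)*Z + X*Y + 1

R = sp.resultant(P, Q, Z)               # PSRC_0(P, Q), up to sign
print(sp.Poly(R, X, Y).total_degree())  # compare with the (d + e)^2 bound above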

3.4 Improved cylindrical algebraic decomposition

The c.a.d. construction we have described so far is unable to preserve information in some cases, as was illustrated in Examples 25 and 26. We saw that there isn't always a clear way in which cells from adjacent cylinders 'match up'.

For instance, the closure of a band BCi,j within the cylinder Ci × R is just the union of the band itself with the graphs bounding it; however, we are unable to say anything meaningful about the closure of a cell in relation to the cells in adjacent cylinders. As in the first part of Example 26, we saw that we are unable to recover the topology of the surface defined by P = 0 when taking the variables in a certain order, and the fact that we are able to determine the topology of the same surface by taking the variables in a different order is due to our choice of the polynomial. That is to say that, with the c.a.d. construction detailed in this section so far, we cannot always determine the topology of a set upon its reconstruction as a union of sign-invariant cells. We will explore a way to overcome this. We also saw in previous examples that, in general, the cells of such a c.a.d. cannot be described by boolean combinations of sign conditions on the polynomials which are used and produced by the algorithm. This problem can be overcome using a result of Thom, which we will now investigate.

We begin with the case of a single variable. Consider a family of polynomials (P1,...,Pr) ⊂ R[X], and an r-tuple ε := (ε1,...,εr) ∈ {−1, 0, 1}^r, where each εi ∈ {−1, 0, 1}. We define ε̄ := (ε̄1,...,ε̄r), where

ε̄i := {+1, 0} if εi = +1,    ε̄i := {−1, 0} if εi = −1,    ε̄i := {0} if εi = 0.

We set

A_ε(P1,...,Pr) := {x ∈ R | sign(Pi(x)) = εi for all i = 1,...,r},

and we define

Ā_ε(P1,...,Pr) := {x ∈ R | sign(Pi(x)) ∈ ε̄i for all i = 1,...,r}.

We may omit the arguments and simply write A_ε and Ā_ε if it causes no ambiguity. With the above setup, Thom's lemma is as follows.

Lemma 3.4.1 (Thom's lemma). Consider a finite family of nonzero polynomials (P1,...,Pr) ⊂ R[X] which is closed under derivation, and an r-tuple ε ∈ {−1, 0, 1}^r. Then the set A_ε is either empty, a point (at least one of the εi = 0), or an open interval (all of the εi ≠ 0), and the set Ā_ε is either empty, a point, or a closed interval different to a point. In the case that Ā_ε is a closed interval different to a point, the interior of Ā_ε is the open interval A_ε.

The proof of Thom's lemma is by a fairly straightforward induction on r, the number of polynomials in the family. Instead, we prove a similar result which requires the (weaker) condition:

(?) For each i ∈ {1,...,r}, the roots of Pi' of odd multiplicity are among the roots of P1,...,Pi−1.

Lemma 3.4.2. Consider a finite family of nonzero polynomials (P1,...,Pr) ⊂ R[X] satisfying the condition (?). Then for any ε ∈ {−1, 0, +1}^r, A_ε is either empty, a point (at least one of the εi = 0), or an open interval (all of the εi ≠ 0), and Ā_ε is either empty, a point, or a closed interval (different to a point). Furthermore, if Ā_ε = [a, b], then the interior of this interval is A_ε = (a, b).

Proof. We prove this result by induction on r, the number of polynomials in the family. For r = 1, the polynomial P1 must be such that its derivative, P1', has no roots of odd multiplicity. That is, P1 must be either non-decreasing or non-increasing everywhere. This means that P1 can have at most one real root. Thus A_ε(P1) is either empty, a point (when ε = 0 and P1 has a real root), an interval of the form (−∞, a) or (a, +∞) (when ε = ±1 and P1 has a real root a), or all of R (when P1 has constant sign ε and no roots). So the statement of the lemma holds for r = 1.

We assume that the statement of the lemma holds for r − 1, noting that for every i ∈ {1,...,r−1} the roots of Pi' of odd multiplicity are among the roots of P1,...,Pi−1, so that the subfamily (P1,...,Pr−1) also satisfies (?). (In the case that i = 1, we must have that the polynomial P1 is such that P1' has no roots of odd multiplicity.) We also use the following fact throughout the rest of the proof:

A_{(ε,εr)}(P1,...,Pr−1, Pr) = {x ∈ R | sign(Pi(x)) = εi for all i = 1,...,r−1 and sign(Pr(x)) = εr}
= {x ∈ R | sign(Pi(x)) = εi for all i = 1,...,r−1} ∩ {x ∈ R | sign(Pr(x)) = εr}
= A_ε(P1,...,Pr−1) ∩ A_{εr}(Pr).

Case 1: If A_ε(P1,...,Pr−1) is empty, then

A_{(ε,εr)}(P1,...,Pr) = A_ε(P1,...,Pr−1) ∩ A_{εr}(Pr) = ∅ ∩ A_{εr}(Pr)

is clearly empty. The same holds for Ā_{(ε̄,ε̄r)}(P1,...,Pr) when Ā_ε̄ is empty.

Case 2: If A_ε(P1,...,Pr−1) is a point, say {c}, then

A_{(ε,εr)}(P1,...,Pr) = A_ε(P1,...,Pr−1) ∩ A_{εr}(Pr) = {c} ∩ A_{εr}(Pr),

which is clearly either {c} or empty. The same holds for Ā_{(ε̄,ε̄r)}(P1,...,Pr) when Ā_ε̄ is a point.

Case 3: If A_ε is an open interval, say (a, b), then we use the assumption that the roots of Pr' of odd multiplicity are among the roots of P1,...,Pr−1. We know that none of the polynomials P1,...,Pr−1 can have roots in the interval (a, b), since, by definition, P1,...,Pr−1 are all nonzero polynomials, and have constant nonzero sign on A_ε = (a, b). Therefore, Pr' cannot have roots of odd multiplicity in the interval (a, b), and thus Pr is either non-decreasing or non-increasing on (a, b). In fact, Pr is non-decreasing (or non-increasing) on [a, b], since Pr is polynomial and is therefore continuous. This means that Pr can have at most one root in [a, b]. There are three possibilities involving the roots of Pr in this interval.

Case 3.1: The first possibility is that Pr has no roots in [a, b], in which case Pr has constant (nonzero) sign on [a, b]. Therefore A_{(ε,εr)}(P1,...,Pr) = (a, b) ∩ A_{εr}(Pr) is either equal to the entire interval (a, b) (if εr is equal to the sign of Pr on (a, b)), or empty (if sign(Pr) ≠ εr on (a, b)). Similarly, Ā_{(ε̄,ε̄r)}(P1,...,Pr) = [a, b] ∩ Ā_{ε̄r}(Pr) is either empty, or equal to the entire interval [a, b]. If Ā_{(ε̄,ε̄r)} = Ā_ε̄ = [a, b], then A_{(ε,εr)} = (a, b), since A_ε = (a, b) by the inductive assumption and Pr has constant sign εr ∈ ε̄r on [a, b] ⊃ (a, b). That is, the interior of Ā_{(ε̄,ε̄r)} is A_{(ε,εr)}.

Case 3.2: The second possibility is that Pr has a root at either a or b. In this case, Pr still has constant sign σr ∈ {−1, +1} on (a, b], and sign(Pr(x)) ∈ σ̄r for all x ∈ [a, b]. To see the latter assertion, assume (without loss of generality) that Pr has a root at a, and let σr := sign(Pr(x)) for x ∈ (a, b], noting that Pr has constant sign on (a, b] so that σr is well-defined. Assume (without loss of generality) that σr = +1. Then σ̄r = {0, +1}, with sign(Pr(a)) = 0 and sign(Pr) = +1 on (a, b]. Hence sign(Pr) ∈ σ̄r on [a, b].

We now use the fact that A_{(ε,εr)} = (a, b) ∩ A_{εr}(Pr) and Ā_{(ε̄,ε̄r)} = [a, b] ∩ Ā_{ε̄r}(Pr). If εr = +1 and σr = +1 then (a, b) ⊆ A_{εr}(Pr), and ε̄r = {0, +1} ⊇ {sign(Pr(x)) | x ∈ [a, b]}, hence [a, b] ⊆ Ā_{ε̄r}(Pr). Therefore A_{(ε,εr)}(P1,...,Pr) = (a, b) ∩ A_{εr}(Pr) = (a, b) and Ā_{(ε̄,ε̄r)}(P1,...,Pr) = [a, b] ∩ Ā_{ε̄r}(Pr) = [a, b]. If εr = 0 and σr = +1 then (a, b) ∩ A_{εr}(Pr) = ∅, and ε̄r = {0}, so that [a, b] ∩ Ā_{ε̄r}(Pr) = {a}. Therefore A_{(ε,εr)}(P1,...,Pr) = ∅ and Ā_{(ε̄,ε̄r)}(P1,...,Pr) = {a}. If εr = −1 and σr = +1 then (a, b) ∩ A_{εr}(Pr) = ∅, and ε̄r ∩ {sign(Pr(x)) | x ∈ [a, b]} = {−1, 0} ∩ {0, +1} = {0}, so that [a, b] ∩ Ā_{ε̄r}(Pr) = {a}. Therefore A_{(ε,εr)}(P1,...,Pr) = ∅ and Ā_{(ε̄,ε̄r)}(P1,...,Pr) = {a}. The above arguments are repeated for the cases when σr = −1.

Case 3.3: The third possibility is that Pr has a root c ∈ (a, b). In this case, Pr no longer has constant sign on (a, b). Instead, Pr has constant sign on each of [a, c), {c}, and (c, b], since it is continuous and has precisely one root in the interval [a, b]. (In fact, Pr has no other roots in (a − δ, b + δ) for some δ > 0.) Denote by σa the sign of Pr on [a, c), and by σb the sign of Pr on (c, b]. Assume, without loss of generality, that Pr is non-decreasing on [a, b]. Then σa = −1 and σb = +1. If εr = −1 then (a, b) ∩ A_{εr}(Pr) = (a, c), so that A_{(ε,εr)}(P1,...,Pr) = (a, c). We also have [a, c] ∩ Ā_{ε̄r}(Pr) = [a, c] and (c, b] ∩ Ā_{ε̄r}(Pr) = ∅, so that Ā_{(ε̄,ε̄r)}(P1,...,Pr) = [a, b] ∩ Ā_{ε̄r}(Pr) = [a, c]. If εr = +1 then (a, b) ∩ A_{εr}(Pr) = (c, b), so that A_{(ε,εr)}(P1,...,Pr) = (c, b). We also have [a, c) ∩ Ā_{ε̄r}(Pr) = ∅ and [c, b] ∩ Ā_{ε̄r}(Pr) = [c, b], so that Ā_{(ε̄,ε̄r)}(P1,...,Pr) = [a, b] ∩ Ā_{ε̄r}(Pr) = [c, b]. It is clear that if εr = 0 then A_{(ε,εr)}(P1,...,Pr) = (a, b) ∩ {c} = {c} and Ā_{(ε̄,ε̄r)}(P1,...,Pr) = [a, b] ∩ {c} = {c}.

Thus in every case, A_{(ε,εr)} is either empty, a point, or an open interval, and Ā_{(ε̄,ε̄r)} is either empty, a point, or a closed interval not equal to a point, and in the case that Ā_{(ε̄,ε̄r)} is a closed interval, A_{(ε,εr)} is its interior.

The requirement that the family (P1,...,Pr) be closed under derivation is a particular case of condition (?). The role that these conditions play in the proofs of Lemmas 3.4.1 and 3.4.2 is that, when we add a polynomial Pr to the family (P1,...,Pr−1) in the induction step, Pr is necessarily monotone on each set A_ε(P1,...,Pr−1), meaning that there can be at most one root of Pr on each A_ε, allowing us to deduce the results of each lemma. Specifically, the requirement that Pr has at most one root on each A_ε is achieved by Pr being either non-increasing or non-decreasing on A_ε. While including all derivatives of Pr in the family is a sufficient condition to achieve this, it is not a necessary condition. Indeed, condition (?) is also sufficient, and is more 'streamlined' in the sense that it invokes a smaller family of polynomials (in general) to achieve what is required in order to prove Lemma 3.4.2. An obvious advantage of this modification is the potential reduction in computation cost. Namely, in adding all derivatives to the family, the cost of, say, computing the principal subresultant coefficients PSRC_k(Pi, Pj) for all relevant k and all 1 ≤ j < i ≤ s increases drastically. However, with the alternative condition we may not need to add any extra polynomials. For example, if we take (P1, P2) = (18X⁵ − 45X⁴ + 30X³, X² − 4), then P1 has one root, at X = 0, and P1' has no roots of odd multiplicity. (Notice that P1 is non-decreasing on all of R.) P2 has roots at X = ±2, and P2' has a simple root at X = 0, which is already a root of P1. Thus we have met the modified condition (?) of Lemma 3.4.2. In comparison, adding the derivatives of both P1 and P2 yields a family of eight nonzero polynomials (two of which are constant). On the other hand, one disadvantage of using condition (?) is the requirement of ordering. In particular, checking whether or not an arbitrary finite family of polynomials (P1,...,Pr) ⊂ R[X] satisfies condition (?) can be computationally costly, especially for large values of r - one may need to check r! arrangements of (P1,...,Pr). In general, it is not enough to have each Pi such that the roots of Pi' of odd multiplicity are among the roots of the other polynomials of the family, (P1,...,Pi−1, Pi+1,...,Pr). That is, we generally do require the ordering. We give a simple example to illustrate this; a computational sketch of the check itself follows the example.

Example 27. Consider the family of polynomials (P, Q) = (X² − 4, X³ − 12X). Note that P has roots at X = ±2, and P' has a simple root at X = 0, while Q has roots at −√12, 0, √12, and Q' has simple roots at X = ±2. That is, the root of P' (which is of odd multiplicity) is a root of Q, and the roots of Q' (which are both of odd multiplicity) are roots of P. However, there is no ordering (P1, P2) of P and Q such that the modified condition is satisfied, since the first derivatives of P and Q both have roots of odd multiplicity, and hence both are invalid choices for P1.
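A mechanical check of condition (?) for a given ordering can be sketched as follows (our illustration, assuming SymPy; sp.roots suffices for these small examples but does not find all roots in general). It accepts the family (18X⁵ − 45X⁴ + 30X³, X² − 4) above and rejects both orderings of Example 27's family.

import sympy as sp

X = sp.symbols('X')

def satisfies_star(family):
    # Condition (*): for each i, every real root of Pi' of odd multiplicity
    # must be a root of P1 * ... * P(i-1).
    for i, P in enumerate(family):
        prev = sp.prod(family[:i])          # equals 1 when i = 0
        for root, mult in sp.roots(sp.diff(P, X), X).items():
            if mult % 2 == 1 and root.is_real:
                if sp.simplify(prev.subs(X, root)) != 0:
                    return False
    return True

print(satisfies_star([18*X**5 - 45*X**4 + 30*X**3, X**2 - 4]))   # True
print(satisfies_star([X**2 - 4, X**3 - 12*X]))                   # False
print(satisfies_star([X**3 - 12*X, X**2 - 4]))                   # False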

10

5 X −4 −3 −2 −1 1 2 3 4 −5

−10

−15

−20

Figure 21: Graph of P and Q showing the necessity of the (?) condition.

Consider the set A (P,Q) which is equal to {x ∈ | sign(P (x)) = +1} ∩ (+1,+1) √ √ R {x ∈ R | sign(Q(x)) = +1} = (− 12, −2) ∩ ( 12, ∞) which is the union of two disconnected open intervals (and similarly, A(+1,+1) is the union of two disconnected closed intervals). Hence the statement of the lemma does not hold in the absence of such an ordering of the family of polynomials. We now continue with the application of Thom’s lemma to cylindrical al- gebraic decomposition - we will refer to Lemma 3.4.1 and Lemma 3.4.2 inter- changeably, since the conditions of Lemma 3.4.1 are a special case of condition (?), the conditions of Thom’s lemma are more easily met. (A method for im- plementing condition (?) is explored in Section 5.) The following corollary is a first step towards improving our c.a.d. construction.

Corollary 3.4.3. Let P ∈ R[X] with degree d ≥ 1, and let a, b ∈ R be distinct, real roots of P . There exists a derivative P (i) of P such that P (i)(a)P (i)(b) < 0, for some 1 ≤ i ≤ d. Proof. Assume by way of contradiction that P (i)(a)P (i)(b) ≥ 0 for all 1 ≤ i ≤

83 d. Then, for each i, either sign(P (i)(a)) ⊆ sign(P (i)(b)) or sign(P (i)(b)) ⊆ sign(P (i)(a)), (where sign denotes the relaxing of the signs −1 and 1 to {−1, 0} and {1, 0} respectively, but 0 remains unchanged). That is, for the family of nonzero derivatives (P,P (1),...,P (d)) there exists some  ∈ {−1, 0, 1}d+1 such that [a, b] ⊆ A, and by Thom’s lemma 3.4.1, (a, b) ⊆ A. However, since a and b are distinct real roots of P , Rolle’s theorem asserts that there exists some point c ∈ (a, b) such that P 0(c) = 0, and hence the point {c} cannot (1) (d) be contained in the open interval A since all P,P ,...,P are necessarily nonzero on open intervals A. Therefore, the intervals (a, c) and (c, b) cannot be contained in (or comprised of) the same A, and so [a, c] and [c, b] are contained in (or comprised of) different A. This is a contradiction to the assumption that P (i)(a)P (i)(b) ≥ 0 for all 1 ≤ i ≤ d. Hence P (i)(a)P (i)(b) < 0 for at least one of the derivatives of P . Thom’s lemma indicates that, if an arbitrary finite family of nonzero poly- nomials (P1,...,Pr) is closed under derivation, then each cell of a c.a.d. of R adapted to (P1,...,Pr) will admit a description as a boolean combination of sign conditions on the polynomials P1,...,Pr. The cells of a c.a.d. of R adapted to (P1,...,Pr) produced in the way described in the previous subsection are precisely the real roots of the polynomials P1,...,Pr, and the open intervals between these roots (including the unbounded intervals), where each cell that is an interval is (P1,...,Pr)-invariant by definition. Conversely, Thom’s lemma r states that, for any r-tuple  ∈ {−1, 0, 1} , the set A is either empty, a point (at least one of the i = 0), or an open interval (all of the i 6= 0). That is, every cell in such a c.a.d of R is precisely one of the nonempty A, which, by definition, is a subset of R satisfying some boolean combination of sign conditions on the polynomials P1,...,Pr. Thus, we now have a solution in the single variable case to one of the problems we faced with our previous c.a.d. construction method - completing a family of polynomials in R[X] so that it is closed under derivation ensures that the cells of a c.a.d. of R adapted to this (completed) family can be described by a boolean combination of sign conditions on the polynomials of the family. This can be illuminated with a simple example. Take for instance P = X2 − 1, and construct an adapted c.a.d. of R. The cells of such a c.a.d. are (−∞, −1), {−1},(−1, 1), {1}, (1, ∞), and the set A1 - the subset of R on which P > 0 - is the union (−∞, −1) ∪ (1, ∞). Hence these two cells cannot be described as a sign condition on P . However, computing a c.a.d. adapted to (X2 −1, 2X, 2) - the set of all nonzero derivatives of P - we have cells (−∞, −1), {−1},(−1, 0), {0}, (0, 1), {1}, (1, ∞), noting that 2X is positive on (1, ∞) and negative on (−∞, −1). Hence we are able to describe the interval (−∞, −1) as 0 A1,−1 - the subset of R on which P > 0 and P < 0 - and (1, ∞) as A1,1 - the subset of R on which P > 0 and P 0 > 0. Of course, we wish to obtain such descriptions of the cells of a c.a.d. adapted to families of polynomials in arbitrarily many variables. Hence we aim to gener- alize Thom’s lemma to arbitrarily many variables, with the intent of applying it to the method of c.a.d. construction via successive projections. In particularly,

84 we want to keep Theorem 3.2.4 and Theorem 3.2.5 in mind. We first introduce some notation. Consider a family of polynomials (Pi,j) ⊂ R[X1,...,Xn] where 1 ≤ i ≤ n and 1 ≤ j ≤ ri, and let (i,j) be a family of signs, where each i,j ∈ {−1, 0, 1}. We define the set

Ck := {x ∈ k | sign(P (x)) =  for all i = 1, . . . , k, j = 1, . . . , r }. (i,j ) R i,j i,j i

Note that Ck is a semialgebraic subset of k by definition. (i,j ) R

Lemma 3.4.4. Consider a finite family of nonzero polynomials (Pi,j) where 1 ≤ i ≤ n, and 1 ≤ j ≤ ri. Assume that, for each i ∈ {1, . . . , n} fixed, the family

(Pi,1,...,Pi,ri ) ⊂ R[X1,...,Xi] is closed under derivation with respect to Xi, and that for each i ∈ {1, . . . , n − 1}, the family PROJ(Pi+1,1,...,Pi+1,ri+1 ) is contained in (Pi,1,...,Pi,ri ). For k ∈ {1, . . . , n}, denote by Ck the collection of all nonempty Ck . Then C ,..., C constitutes a c.a.d. if n adapted to (P ). i,j 1 n R i,j Proof. We proceed by induction on k. Assume that the statement of the lemma holds for some 1 ≤ k < n, so that, for each 1 ≤ i ≤ k we have fami- i lies Ci of semialgebraic subsets (cells) of R that are (Pi,j)-invariant. Since

(Pk,1,...,Pk,rk ) contains PROJ(Pk+1,1,,...,Pk+1,rk+1 ), the cells in Ck are there- fore PROJ(Pk+1,1,,...,Pk+1,rk+1 )-invariant. By Theorem 3.2.4, there exist con- tinuous semialgebraic functions ξ1 < . . . , ξl from each cell C ∈ Ck into R describ- ing the roots of (Pk+1,1,,...,Pk+1,rk+1 ) as polynomials in xk+1 with parameters

(x1, . . . , xk) ∈ C. Since the family (Pk+1,1,,...,Pk+1,rk+1 ) is assumed to be closed under derivation with respect to Xk+1, we can apply Thom’s lemma so rk+1 that, for all c ∈ C, and each  = (1, . . . , rk+1 ) ∈ {−1, 0, 1} , the set A(c) - the subset of {c} × R on which sign(Pk+1,j) = k+1,j for each j = 1, . . . , rk+1 - is either empty, a point, or an open interval. We know that the number of distinct ξ :→ R, and their ordering, is constant on each cell C ∈ Ck. Therefore the set A(c) is of the same type for every c ∈ C - that is, if A(c) is a point (or open interval) at some c ∈ C, then it is a point (or open interval) for all c ∈ C. (Alternatively, we can use the fact that the cells of an (adapted) c.a.d. are semialgebraically homeomorphic to open hypercubes to obtain this fact.) k Hence, if the cell C ∈ Ck is defined as the subset of R satisfying the conditions sign(Pi,j) = i,j for all 1 ≤ i ≤ k and 1 ≤ j ≤ ri, then the nonempty subsets of C × R defined by

0 C := {(c, xk+1) ∈ C × R | xk+1 ∈ A(c)} are precisely the graphs and bands between the ξ : C → R describing the roots 0 of (Pk+1,1,,...,Pk+1,rk+1 ). Notice that the set C defined above is actually

k+1 k+1 C = {x ∈ | sign(Pi,j) = i,j for all i = 1, . . . , k + 1, j = 1, . . . , ri}. (i,j ) R

Furthermore, each cell in C1,..., Ck is assumed to be the unique cell with a particular description in terms of sign conditions on the family of polynomials

85 k+1 (Pi,j) for 1 ≤ i ≤ k and 1 ≤ j ≤ ri. Therefore each (nonempty) C denotes (i,j ) k+1 a single cell in Ck+1 of a c.a.d. of R adapted to the family (Pi,j) for i = 1, . . . , k+1 and j = 1, . . . , ri. (Of course, we know that every cell in the cylinder C × R has such a description, since the graphs and bands are, by definition, the regions of the cylinder on which the family of polynomials has constant sign.) Finally, we prove the k = 1 case. Using Sturm’s method, we are able to count and isolate all distinct real roots of (P1,1,...,P1,r1 ), producing an adapted c.a.d. of R. By direct application of Thom’s lemma, the nonempty sets defined by some boolean combination of sign conditions on this family of polynomials are either points or open intervals. Therefore, each cell C of this adapted c.a.d. satisfying a particular description by sign conditions on (P1,1,...,P1,r1 ) is the only cell of this c.a.d. described by these conditions.

Lemma 3.4.4 ensures that every cell of a c.a.d. of Rn adapted to a finite family of polynomials has a description in terms of sign conditions on the family. We are simply required to complete the family so that it meets the conditions of the lemma. We return to the problem regarding the relative arrangement of cells from adjacent cylinders, and our inability to recover the topology of sets from our c.a.d. in general. We pointed out that the closure of a cell from one cylin- der doesn’t always correspond to cells from adjacent cylinders in an obvious way, meaning that we can only tell which cells are in adjacent cylinders to one another, but we cannot actually tell which cells are adjacent. Thom’s lemma provides another property concerning the nonempty A ⊂ R. Particularly that the closure A is simply A. We seek to generalize this in order to further improve our c.a.d. construction. In Example 26 we saw that, when taking the variables in a certain order, the polynomial P was not monic with respect to the variable to removed by the projection operation. The result of this was that the continuous semialgebraic functions ξ : C → R describing the roots of the polynomials over each cell did not extend continuously to adjacent cells, and that we could not recover the topology of the surface P = 0 from the c.a.d. adapted to P . We will examine what happens if we assume that the polynomials we are using are all monic with respect to the last variable, and later we will see how to ensure that we can achieve this for an arbitrary family of polynomials. In the following lemma we assume the following notation. For a finite family of polynomials P1,...,Pr in R[X1,...,Xn], and for a cell C ∈ Cn−1 of an n adapted c.a.d. of R , there are continuous semialgebraic function ξC,1 < . . . < ξC,l describing the real roots of the P1,...,Pr over C. The graphs of these ξC,i cut the cylinder C × R into (P1,...,Pr)-invariant regions of this cylinder, namely the graphs AC,i themselves, and the bands BC,i between the graphs of ξC,i and ξC,i+1.

Lemma 3.4.5. Let (P1,...,Pr) ⊂ R[X1,...,Xn] be a finite family that is closed under derivation with respect to Xn, such that the leading coefficients of P1,...,Pr (as polynomials in Xn) are all constant. Let C and D be connected,

86 n−1 semialgebraic, PROJ(P1,...,Pr)-invariant subsets of R such that D ⊂ C, and denote by ξC,1 < . . . < ξC,lC : C → R and ξD,1 < . . . < ξD,lD : D → R the continuous, semialgebraic functions describing the real roots of the P1,...,Pr over C and D, respectively. Then every ξC,i has an extension by continuity to D which coincides with some ξD,i0 , and every ξD,j0 is the extension of some ξC,j to D by continuity.

Proof. To show that every ξD,j0 is the extension of some ξC,j to D by continuity, we simply use the fact that the family (P1,...,Pr) is closed under derivation with respect to Xn. Therefore, each ξD,i0 is a simple root of at least one of the polynomials in the family, say Pk. That is, ∂Pk/∂Xn 6= 0 on D, and Pk(d, ξD,i0 (d)) = 0 for all d ∈ D. Thus, by the implicit function theorem there exists a unique, continuously differentiable function describing this simple root n−1 of Pk in an open neighborhood Ud ⊂ R of d, which necessarily intersects C since D ⊂ C. This function must be among the ξC,1, . . . , ξC,lC as it describes one of the roots of Pk ∈ (P1,...,Pr). To prove that every ξC,i has an extension by continuity to D which coincides with some ξD,i0 , we invoke Lemma 1.1.6 and Thom’s lemma 3.4.1. Each ξC,i is a simple root of some Pk ∈ (P1,...,Pr). We write Pk as a polynomial in Xn with n−1 e parameters in x ∈ R , so that Pk(x, Xn) = a0(x)Xn + ... + ae(x) for some e ∈ N, and the leading coefficient a0(x) is assumed to be a (nonzero) constant. We can apply Lemma 1.1.6 to Pk, so that any root α of Pk(x, Xn) must satisfy

1  a (x)  j |α| ≤ max e| j | . j=1,...,e a0

Since the coefficients aj(x) are polynomials in x, they are continuous functions of x. Therefore, if we take d ∈ D (so that the roots of Pk(d, Xn) are bounded 1 aj (d)  j by ± maxj=1,...,e e| | ), then for any  > 0, we can find a sufficiently small a0 n−1 neighborhood Ud ⊂ R of d such that, for any x ∈ Ud the roots of P (x, Xn)  1  aj (d)  j will be bounded by ± maxj=1,...,e e| | +  . Consider a sequence of a0 (m) points (c )m∈N in C converging to d. (Such a sequence necessarily exists for any d ∈ D since D ⊂ C by assumption.) Then we know that the sequence  1  (m) aj (d)  j ξC,i(c ) is bounded by ± maxi=j,...,e e| | +  . That is, as a limit a0 (m) point of the sequence of ξC,i(c ),

(m)  yd := lim sup ξC,i(c ) m∈N

 1  aj (d)  j exists and is bounded by ± maxj=1,...,e e| | +  . By definition, the a0 point (d, yd) is in the closure of the graph of ξC,i.

87 Figure 22: The ξC,i extend continuously to some ξD,i0 .

(1) (d) The family (P1,...,Pr) contains all nonzero derivatives Pk ,...,Pk of Pk, and by assumption, the signs of these are all constant on the graph AC,i, the graph (j) of ξC,i. That is, for all c ∈ C we have sign(Pk (c, ξC,i(c))) = j ∈ {−1, 0, 1} for j = 1, . . . , e, and of course sign(Pk(c, ξC,i(c))) = 0. We denote the (e + 1)- e+1 tuple  := (0, 1, . . . , e) ∈ {−1, 0, 1} . As the point (d, yd) is in the closure (j) of AC,i, the sign of each Pk (d, yd) must be in j for each j = 1, . . . , e, and sign(Pk(d, yd)) = 0. Now, Thom’s lemma asserts that the set of points satisfying these sign conditions is either empty or a point, (noting that sign(Pk(d, yd)) = 0 prevents the case of an interval). Particularly, there is at most one yd such that these sign conditions hold, meaning that ξC,i extends continuously to D, and that this extension coincides with some ξD,i0 , (one of the functions describing the roots of Pk over D). We introduce some more notation before giving another result with a similar setup to that above. For a family (P1,...,Pr) ⊂ R[X1,...,Xn] and some  = r (1, . . . , r) ∈ {−1, 0, 1} , we define

C := {(x, xn) ∈ C × R | signPi(x, xn) = i for i = 1, . . . , r} C := {(x, xn) ∈ C × R | sign(Pi(x, xn)) ∈ i for i = 1, . . . , r}.

Lemma 3.4.6. Let (P1,...,Pr) ⊂ R[X1,...,Xn] be a finite family that is closed under derivation with respect to Xn, such that the leading coefficients (treating the P1,...,Pr as polynomials in Xn) are all constant, and let C and D be n−1 connected, semialgebraic, PROJ(P1,...,Pr)-invariant subsets of R such that r D ⊂ C. Then for every  := (1, . . . , r) ∈ {−1, 0, 1} , the set C is either empty, one of the AC,i, or one of the BC,i. Furthermore, if C 6= ∅, then C ∩ (C × R) = C and C ∩ (D × R) = D, and the set D is either one of the graphs AD,j, or BD,j ∩(D ×R), the closure of some BD,j in the cylinder D ×R.

88 Proof. By construction, the sign of the P1,...,Pr are constant on each of the graphs and bands, AC,i and BC,i, over C, meaning that all points in such a set must satisfy some boolean combination of sign conditions on the family P1,...,Pr. That is, each AC,i (or BC,i) is contained in some C, where  = r (1, . . . , r) ∈ {−1, 0, 1} . For any point c ∈ C, Thom’s lemma asserts that the subset of {c} × R satisfying sign(Pj(c, Xn)) = j for j = 1, . . . , r is either empty, a point, or an open interval. Of course, the sets C are also (P1,...,Pr)- invariant subsets of C ×R by definition. Hence a nonempty C is precisely either one of the graphs AC,i, or bands BC,i describing the roots of the P1,...,Pr and the regions of constant sign between them, where, if any of the j = 0, then C is necessarily a graph, and if all of the j 6= 0, then C is necessarily a band. Furthermore, if C is a graph, AC,i, then the set C is also AC,i, since the sign of at least one of the P1,...,Pr must be 0 on this graph. If C is a band, BC,i, then the set C is the closure of BC,i in C × R. Indeed, for any point c ∈ C, if the subset of {c} × R satisfying sign(Pj(c, xn)) = j for all j = 1, . . . , r is an open interval, then the closure of this interval is the subset of {c} × R satisfying sign(Pj(c, xn)) ∈ j for all j = 1, . . . , r is the closure of this interval. Since this applies to all c ∈ C, and C is assumed to be a PROJ(P1,...,Pr)-invariant subset n−1 of R , we have C = AC,i ∪ BC,i ∪ AC,i+1, (where AC,0 and AC,lC are empty by definition). That is, if C = BC,i, then C = BC,i ∩ (C × R) = C ∩ (C × R). Similarly, the set D is either empty, one of the AD,i0 , or BD,i0 ∩ (D × R), the closure of one of the bands in D × R. We now consider the set C ∩ (D × R). For any point (x, xn) ∈ C, the sign of each Pj must satisfy sign(Pj(x, xn)) ∈ j. Since D ⊂ C, we have

C ∩ (D × R) ⊂ D. To see this more clearly one can simply recall the definition

D = {(x, xn) ∈ D × R | sign(Pj(x, xn)) ∈ j for all j = 1, . . . , r}.

By Lemma 3.4.5, each ξC,i : C → R describing the real roots of P1,...,Pr over C extends continuously to some ξD,i0 over D ⊂ C, and hence AC,i ∩ (D × R) is one of the AD,i0 cutting the cylinder D ×R. Furthermore, since BC,i is bounded by consecutive graphs AC,i and Ac,i+1, we know that the set BC,i ∩(D×R) must be bounded by graphs AD,i0 and AD,(i+1)0 , where AD,(i+1)0 is either AD,i0+1 or AD,i0 itself. That is, BC,i ∩ (D × R) is either a graph AD,i0 or the closure of some band BD,i0 in the cylinder D × R. We have shown that, for nonempty C, the set C ∩ (D × R) is either a graph or the closure of a band in D × R. By Thom’s lemma, the same is true for D. In the case that D = AD,i0 it is clear from the above arguments that C ∩ (D × R) is also equal to AD,i0 . We check that these sets are equal in the case that D is the closure of a band BD,i0 in D × R. That is, D = AD,i0 ∪ BD,i0 ∪ AD,i0+1. Thom’s lemma asserts that all sign(Pj(d, y)) = j 6= 0 for all point (d, y) ∈ BD,i0 . In fact, sign(Pj(d, y)) = j 6= 0 on every sufficiently small neighborhood U of (d, y) ∈ BD,i, since the Pj are continuous. Since D ⊂ C, all such neighborhoods U must intersect C, implying that sign(Pj) = j 6= 0 for all j = 1, . . . , r on

89 0 some nonempty subset U ⊂ C. Therefore (d, y) ∈ C, and hence BD,i0 ⊂ C. Furthermore, the AD,i0 and AD,i0+1 bounding BD,i0 are also contained in C since any neighborhood about any point AD,i0 (or AD,i0+1) must have nonempty intersection with AC,i(or AD,i+1), by Lemma 3.4.5. We have shown the reverse inclusion D ⊂ C ∩ (D × R) for each case, thus concluding the proof. The results of Lemma 3.4.5 and 3.4.6 allow us to relate the position of cells of a c.a.d. from adjacent cylinders. Assuming that we are able to meet the conditions of both Lemmas, the improvements to our c.a.d. construction are sufficient to allow us to recover the topology of an arbitrary semialgebraic subset upon reconstruction as a union of cells. We obtain the following result as a consequence of the results so far in this subsection, before showing how we can ensure that the required conditions may be met. We use the setup used in Lemma 3.4.4. For a family of polynomials (Pi,j) ⊂ R[X1,...,Xn] where 1 ≤ i ≤ n and 1 ≤ j ≤ ri, and a family (i,j) of signs i,j ∈ {−1, 0, 1}, we defined the set Ck = {x ∈ k | sign(P (x)) =  for all i = 1, . . . , k, j = 1, . . . , r } (i,j ) R i,j i,j i for any 1 ≤ k ≤ n. Here, we also define Ck := {x ∈ k | sign(P (x)) ∈  for all i = 1, . . . , k, j = 1, . . . r }. (i,j ) R i,j i,j i

Theorem 3.4.7. Consider a finite family of nonzero polynomials (Pi,j) where 1 ≤ i ≤ n and 1 ≤ j ≤ ri. Assume that, for each i ∈ {1, . . . , n} fixed, the family

(Pi,1,...,Pi,ri ) ⊂ R[X1,...Xn] is closed under derivation with respect to Xi, and that the leading coefficients (treating Pi,1,...,Pi,ri as polynomials in Xi) are all constant, and assume that, for each i ≤ n−1, PROJ(Pi+1,1,...,Pi+1,ri+1 ) is contained in (P ,...,P ). Then the nonempty sets Ck are the cells of i,1 i,ri (i,j ) a c.a.d. of n, and the closure of a nonempty Ck is Ck , which is a union R (i,j ) (i,j ) of cells. Proof. By Lemma 3.4.4, the nonempty Ck constitute a c.a.d. of n adapted (i,j ) R n−1 to (Pi, j), where the (Pn,1,...,Pn,r )-invariant subsets of the cylinders C × n (i,j ) R are precisely the nonempty Cn . (i,j ) We proceed with the proof of the closure property by induction on n. In the case of n = 1 the statement of the theorem is simply Thom’s lemma. Now, assume that n > 1 and that the statement holds for n − 1. Then the nonempty k−1 sets C for k = 1, . . . , n − 1 are PROJ(Pk,1,...,Pk,r )-invariant subsets of (i,j ) k k−1. Since a polynomial P is constant with respect to X , the set Cn is R n−1,j n (i,j ) necessarily a subset of the cylinder Cn−1 × , and, by the inductive assumption, (i,j ) R n−1 n−1 n−1 the closure of C is C . If C = D1 ∪ ... ∪ Dq, where the D1,...,Dq (i,j ) (i,j ) (i,j ) n−1 are cells of a c.a.d. of R adapted to PROJ(Pn,1,...,Pn,rn ), then the closure of Cn is the union of intersections (i,j )

n [ n  C = C ∩ (Dλ × ) . (i,j ) (i,j ) R λ∈{1,...,q}

90 For a cell D ⊂ Cn−1 , let us define the set (i,j )

D(i,j ) := {x ∈ D × R | sign(Pi,j(x)) ∈ i,j for i = 1, . . . , n, j = 1, . . . , ri}.

By Lemma 3.4.6, D(i,j ) = C(i,j ) ∩ (D × R). Since this holds for all such D, we have [ Cn = D (i,j ) (i,j ) λ∈{1,...,q}  n = x ∈ R | sign(Pi,j(x)) ∈ i,j for i = 1, . . . , n − 1, j = 1, . . . , ri,

and sign(Pn,j(x)) ∈ n,j for j = 1, . . . , rn

= C(i,j ).

The fact that C(i,j ) is a union of cells follows immediately from Lemma 3.4.6, as each set D(i,j ) is either a graph or the closure of a band in D × R. The above theorem is a generalization of Thom’s lemma, in that the closure properties shown to occur in Theorem 3.4.7, (particularly when observing the construction of cells in new cylinders), are the same as in the 1-dimensional case. Now that we have an answer to the problems pertaining to our previous c.a.d. construction, we revisit Example 26 to implement the improvements made by Theorem 3.4.7. Example 28. We consider the family consisting of a single polynomial P = XYZ − X3 − Y 2, which we used in Example 26, and take the variables in the order Z,X,Y to avoid vanishing coefficients, for now. We wish to complete this family to include all nonzero derivatives with respect to Y , before applying the PROJ operation. Let us denote P and its derivatives by

3 2 P3,1 = XYZ − X − Y ,P3,2 = −2Y + XZ,P3,3 = −2, and compute PROJ(P3,1,P3,2,P3,3). Beginning with principal subresultant co- efficients:

3 2 2 PSRC0(P3,1,P3,2) = 4X − X Z , which we will denote by P2,1. Noting that P3,2 = ∂P3,1/∂Y , and P3,3 is constant, we have PROJ(P3,1,P3,2,P3,3) = (P2,1). We complete this family to include all nonzero derivatives of P2,1 with respect to X, giving

3 2 2 2 2 2 P2,1 = 4X − X Z ,P2,2 = 12X − 2XZ ,P2,3 = 24X − 2Z ,P2,4 = 24.

We compute PROJ(P2,1,P2,2,P2,3,P2,4), all of which have constant leading co-

91 efficients.  4 −Z2 0 0 0  0 4 −Z2 0 0  2  PSRC0(P2,1,P2,2) = det  0 0 12 −2Z 0    0 12 −2Z2 0 0 12 −2Z2 0 0 0 = 0,  4 −Z2 0  2 PSRC1(P2,1,P2,2) = det  0 12 −2Z  12 −2Z2 0 = 8Z4, 12 −2Z2 0  2 PSRC0(P2,2,P2,3) = det  0 24 −2Z  24 −2Z2 0 = 48Z4,  4 −Z2 0 0   0 0 24 −2Z2 PSRC0(P2,1,P2,3) = det    0 24 −2Z2 0  24 −2Z2 0 0 = −64Z6,

4 4 6 so that PROJ(P2,1,P2,2,P2,3,P2,4) = (P1,1,P1,2,P1,3) = (4Z , 48Z , −64Z ). Completing this family to include all derivatives yields only pure powers of Z, and hence the only root of this family is at 0, partitioning R into C1 = {C1,1,C1,2,C1,3} = {(−∞, 0), {0}, (0, ∞)}. Each of the cells C1,1,C1,2,C1,3 are 2 PROJ(P2,1,P2,2,P2,3,P2,4)-invariant subsets of R. We reconstruct R as a union of (P2,1,P2,2,P2,3,P2,4)-invariant cells by finding the roots of P2,1,P2,2,P2,3,P2,4 3 2 2 over each cell in C1. The polynomial P2,1 = 4X − z X has distinct roots at 2 X = 0 and X = z /4 over the cells C1,1 and C1,3, and a single root at X = 0 2 2 2 over C1,2. The polynomial P2,2 = 12X − 2z X has distinct roots at 0 and z /6 2 over C1,1 and C1,3, with a single root at 0 over C1,2, and P2,3 = 24X − 2z 2 has a root at z /12 over C1,1 and C1,3,and a root at 0 over C1,2. These roots 2 partition R into 21 cells - 9 in both C1,1 × R and C1,3 × R, and 3 in C1,2 × R. They are illustrated in Figure 23 below.

92 2 Figure 23: Partition of R into (P2,1,P2,2,P2,3,P2,4)-invariant cells.

3 Finally, we reconstruct R as a union of (P3,1,P3,2,P3,3)-invariant cells by finding the roots of P3,1,P3,2,P3,3 over each cells in√ C2. The polynomial P3,1 = xzY − x3 − Y 2 has 2 distinct real roots at (xz ± x2z2 − 4x3)/2 over all cells in C2 except for C1,2,C1,8,C1,9,C2,2,C2,3,C3,2,C3,8,C3,9, where there is a single real root over C1,2,C1,8,C2,2,C3,2,C3,8, and no real roots over C1,9,C2,3,C3,9. The polynomial P3,2 = xz − 2Y is linear in Y , and therefore has a single real root at xz/2 over all cells in C2. The polynomial P3,3 is constant. As the nonzero derivatives have been included in the family, there is at least 1 function ξ : C → R describing the real roots of P3,1 and P3,2 over every cell in C2. This is in contrast to Example 26 in which the c.a.d. is adapted to P3,1 by itself, which has no real roots over C1,9, C2,3, and C3,9. The cells C1,6 and C3,6 arise as the derivative of P2,1 with respect to X vanishes here - these cells did not occur 3 2 in Example 26. Treating P3,1 = XYZ − X − Y as a polynomial in Y and 2 2 3 solving P3,1 = 0, we find that the discriminant X Z − 4X (as a polynomial 2 in X) has a turning point when X = Z /6. Indeed, P2,1 = PSRC0(P3,1,P3,2) is the resultant of P3,1. The cells C1,9,C1,7,C2,3,C3,7,C3,9 do not provide extra information on higher-order cell arrangement. Included below is a graph of slices of the surfaces P3,1 = 0 and P3,2 = 0 over each of the cells C1,C2,C3 in C1, demonstrating the improvements to cell arrangement in the presence of derivatives. We pay particular attention to implications on the relative arrangement of cell over C1,9, C2,3, and C3,9.

93 Figure 24: Slices of P3,1 = 0 (red) and P3,2 = 0 (blue) in the x-y plane for z > 0, z = 0, and z < 0. The dashed vertical lines indicate x = 0, x = z2/6, and x = z2/4 (blue, black, and red, from left to right)

The cells of C3 are now arranged so that the intersection of the closure of any cell with neighboring cylinders is either a graph or the closure of a band in the respective cylinder. Indeed, the closure of the cell

3 2 C3,9,1 := {(z, x, y) ∈ R | z > 0 and x > z /4 and y < xz/2} intersects the cylinder C3,8 × R nonemptily, and this intersection is the union of cells

3 2 C3,8,1 := {(z, x, y) ∈ R | z > 0 and x > z /4 and y < xz/2} 3 2 C3,8,2 := {(z, x, y) ∈ R | z > 0 and x > z /4 and y = xz/2}, a band and a graph, respectively, as stated by Theorem 3.4.7. In the case of Example 26, the only cell over C3,9 is the entire cylinder C3,9 × R. The intersection of the closure of this cell with the cylinder C3,8 × R is the entire cylinder, however C3,8 ×R is a union of 3 cells, C3,8,1, C3,8,2, and the band C3,8,3 defined by y > xz/2. That is, 2 bands and a graph. Furthermore, the graph C3,8,2 had no corresponding graph in C3,9 × R, whereas the function describing the root of P3,2 over C3,9 extends continuously to C3,8 × R, as stated by Lemma 3.4.5. We are also able to describe each cell by a boolean combination of sign conditions on the polynomials used. For instance, the cells

p 2 2 3 C3,3,1 := {(z, x, y) ∈ C3,3 × R | y < (xz − x z − 4x )/2} p 2 2 3 C3,3,7 := {(z, x, y) ∈ C3,3 × R | y > (xz + x z − 4x )/2} can be described by the conditions

00 “z > 0,P2,2 < 0,P2,3 < 0,P3,1 < 0,P3,2 > 0 00 and “z > 0,P2,2 < 0,P2,3 < 0,P3,1 < 0,P3,2 < 0 respectively, where z is (up to a nonzero constant multiple) one of the derivatives of P1,1. However, using the c.a.d. from Example 26, we are unable to distinguish

94 all cells from one another in the same way. For instance, the cells ‘above’ and ‘below’ the surface P3,1 = 0 satisfy the same condition of P3,1 < 0, and there are no other polynomials in the family that are not constant with respect to Y .

We now show that an arbitrary finite family of polynomials can be made to satisfy the conditions of Theorem 3.4.7, and in turn, that every semialgebraic set can be represented as a disjoint union of cells, each semialgebraically diffeo- morphic to an open hypercube, such that the closure of a cell is a union with cells of a strictly smaller dimension. We use a linear automorphism of Rn to ensure that leading coefficients are constant, and complete the family to include nonzero derivatives.

Proposition 3.4.8. For an arbitrary finite family of polynomials P1,...,Pr n in R[X1,...,Xn], there is a (linear) automorphism µ of R and a finite fam- ily of polynomials (Pi,j) satisfying the conditions of Theorem 3.4.7 such that Pn,j(X) = Pj(µ(X)) for all j = 1, . . . , r. Proof. We claim that there exists a change of variables

ν(X1,...,Xn) := (X1 + c1Xn,...,Xn−1 + cn−1Xn,Xn) such that the P1(ν(X)),...,Pr(ν(X)), (as polynomials in Xn), all have constant n leading coefficients. Denote X = (X1,...,Xn) ∈ R . Then for any Pi ∈ (P1,...,Pr) write Pi(X) = Hi(X) + Li(X), where Hi is the highest degree d1 dn−1 dn homogeneous part of Pi. A term of the form CX1 ...Xn−1 Xn , where d1 + ... + dn = d and C is some real constant, is transformed by ν to

d1 dn−1 dn d1 dn−1 dn ν(CX1 ...Xn−1 Xn ) = C(X1 + c1Xn) ... (Xn−1 + cn−1Xn) Xn , whose term of the highest power of Xn is

0 d d1 dn−1 d1+...dn−1+dn C Xn := C(c1 . . . cn−1 )Xn ,

0 0 where C is constant with respect to X1,...,Xn. However, we can treat C as a polynomial in c1, . . . , cn−1. This shows that, if H is homogeneous with degree d, then Hi(ν(X)) is a sum of monomials of degree ≤ d, where the degree d 0 monomials are pure powers of Xn, with coefficients C . Therefore, the leading 0 coefficient of Hi(ν(X)) as a polynomial in Xn is a sum of these C , say Ki. If this Ki is nonzero, then Hi(ν(X)) is a polynomial of degree d with respect to Xn, and has a constant leading coefficient with respect to X1,...,Xn−1. We require that the leading coefficients Ki of Hi(ν(X)) corresponding to each Pi ∈ (P1,...,Pr) to be nonzero. Finding an automorphism such that the Ki n−1 are nonzero equates to finding a point (c1, . . . , cn−1) in R which is not a root of any of the Ki. Since each Ki is a finite degree nonzero polynomial in c1, . . . , cn−1, the sets on which Ki = 0 are of smaller dimension than n, and so we can always find such a point. Therefore, we can always find such a lin- n ear automorphism ν of R such that P1(ν(X)),...,Pr(ν(X)) all have constant

95 leading coefficients with respect to Xn. Finally, completing the family by includ- ing all nonzero derivatives of the P1(ν(X)),...,Pr(ν(X)), we obtain a family

P1(ν(X)),...,Pr(ν(X)),Pr+1(X),...,Prn (X) of polynomials in R[X1,...,Xn] which is closed under derivation with respect to Xn, and whose leading coeffi- cients are all constant with respect to X1,...,Xn−1. We define this to be family

Pn,1(X),...,Pn,rn (X), (listed in the same order). 0 Continuing, we compute PROJ(Pn,1(X),...,Pn,rn (X)) =: (Q1,...,Qr ), 0 which are polynomials in R[X1,...,Xn−1]. Denote X := (X1,...,Xn−1). We proceed by induction, assuming that there is a linear automorphism µ0 n−1 of R and a finite family (Qi,j), i = 1, . . . , n − 1, j = 1, . . . , ri, of polyno- mials in R[X1,...,Xn−1] satisfying the conditions of Theorem 3.4.7, such that 0 0 0 0 0 n n Qn−1,j(X ) = Qj(µ (X )) for j = 1, . . . , r . Let us denote (µ × Id): R → R 0 0 0 0 the product mapping such that (µ ×Id)(X ,Xn) = (µ (X ),Xn), acting trivially n on Xn. Then we can define our automorphism µ of R by

n n µ :R −→ R X 7−→ (µ0 × Id) ◦ ν(X).

0 0 Setting Pn,j(X) = Pj(µ(X)) for 1 ≤ j ≤ r and Pn,j(X) = Pj(µ (X ),Xn) for r < j ≤ rn, we have a family (Pi,j), 1 ≤ i ≤ n, 1 ≤ j ≤ ri, of polynomials satisfying the conditions of Theorem 3.4.7, and the statement of the proposition. To complete the induction, we check the single variable case. Every poly- nomial in R[X1] has constant leading coefficient. Therefore, given a family of polynomials (P1,...,Pr) in R[X1], we need only add all nonzero derivatives to the family to obtain the family (P1,j), 1 ≤ j ≤ r1 as required. The following example is an explicit application of Proposition 3.4.8. Example 29. Take P (X,Y,Z) = (X2 + Y 2)Z with the variables in the order X,Y,Z. Clearly lc(P ) = X2 + Y 2 is not constant. We make a linear change of variables ν(X,Y,Z) = (X + Z,Y + Z,Z), as in the above proposition. Then we have

P (ν(X,Y,Z)) = ((X + Z)2 + (Y + Z)2)Z = 2Z3 + 2(X + Y )Z2 + (X2 + Y 2)Z whose leading coefficient is 2. We denote P1(X,Y,Z) := P (ν(X,Y,Z)). Com- pleting this to a family that includes all nonzero derivatives with respect to Z, we obtain

3 2 2 2 P1(X,Y,Z) = 2Z + 2(X + Y )Z + (X + Y )Z, 2 2 2 P2(X,Y,Z) = 6Z + 4(X + Y )Z + X + Y ,

P3(X,Y,Z) = 12Z + 4(X + Y ),

P4(X,Y,Z) = 12.

96 We then compute PROJ(P1,P2,P3,P4). Since the leading coefficients are all constant with respect to X and Y , we need only compute the principal subre- sultant coefficients.

6 5 2 4 3 3 PSRC0(P1,P2) = −8Y + 16XY − 24X Y + 32X Y − 24X4Y 2 + 16X5Y − 8X6, 2 2 PSRC1(P1,P2) = −8Y + 32XY − X , 3 2 2 3 PSRC0(P1,P3) = −320Y + 192XY + 192X Y − 320X . 6 Therefore PROJ(P1,P2,P3,P4) = (Q1,Q2,Q3), where we denote Q1 = −8Y + 5 2 4 3 3 4 2 5 6 3 16XY − 24X Y + 32X Y − 24X Y + 16X Y − 8X , Q2 = −320Y + 2 2 3 2 2 192XY + 192X Y − 320X , and Q3 = −8Y + 32XY − X , and we complete this list to include all nonzero derivatives. We obtain

6 5 2 4 3 3 4 2 5 6 Q1(X,Y ) = −8Y + 16XY − 24X Y + 32X Y − 24X Y + 16X Y − 8X , 3 2 2 3 Q2(X,Y ) = −320Y + 192XY + 192X Y − 320X , 2 2 Q3(X,Y ) = −8Y + 32XY − X , 5 4 2 3 3 2 4 5 Q4(X,Y ) = −48Y + 80XY − 96X Y + 96X Y − 48X Y + 16X , 4 3 2 2 3 4 Q5(X,Y ) = −240Y + 320XY − 288X Y + 192X Y − 48X , 3 2 2 3 Q6(X,Y ) = −960Y + 960XY − 576X Y + 192X , 2 2 Q7(X,Y ) = −2880Y + 1920XY − 576X ,

Q8(X,Y ) = −5760Y + 1920X,

Q9(X,Y ) = −5760, 2 2 Q10(X,Y ) = −960Y + 384XY + 192X ,

Q11(X,Y ) = −1920Y + 384X,

Q12(X,Y ) = −1920,

Q13(X,Y ) = −16Y + 32X,

Q14(X,Y ) = −16.

We have a family of polynomials (Q1,...,Q14) containing PROJ(P1,P2,P3,P4), and which is closed under derivation with respect to Y . Treating them as poly- nomials in Y with parameter X, all leading coefficients are constant, and so a change of coordinates is not needed. Now, PROJ(Q1,...,Q14) is a family of polynomials in R[X], whose leading coefficients are therefore constant. Complet- ing PROJ(Q1,...,Q14) to include all derivatives yields a family of polynomials, say R1,...,Rr1 . Finally, using the notation of Proposition 3.4.8, we set

P3,1 = P1,P3,2 := P2,P3,3 := P3,P3,4 := P4,

P2,j := Qj for all j = 1,..., 14,

P1,j := Rj for all j = 1, . . . , r1.

(We did not explicitly compute the family R1,...,Rr1 as it is expected to be a large family, and does not contribute to the purpose of illustrating the ap-

97 plication of Proposition 3.4.8.) Hence we have a family of polynomials (Pi,j) in R[X,Y,Z] satisfying the conditions of Theorem 3.4.7. Finally, we define our linear automorphism µ := ν, noting that only a single change of variables was needed, so that P3,j(X,Y,Z) = Pj(ν(X,Y,Z)) for all j = 1,..., 4, as required by Proposition 3.4.8.

Finally, we have shown that any semialgebraic subset S of Rn can be de- composed as a finite union of cells Ci, each of which is semialgebraically dif- feomorphic to an open hypercube (0, 1)di , and the closure of a cell is a union of other cells of smaller dimension. Such a decomposition of S as a union of cells is called a stratification, where the strata are the Ci of dimension di. Our constructions also allow for the cylindrical algebraic decomposition to be refined in such a way that any semialgebraic subset of S can also be represented as a union of cells. We state this more precisely and give a proof as a consequence of previous results.

n Theorem 3.4.9. For any semialgebraic set S ⊂ R , with S1,...,Sq finitely many semialgebraic subsets of S, there is a decomposition of S into finitely many p disjoint cells, S = ∪i=1Ci, such that each subset Sj is a union of some cells Ci, di each Ci is semialgebraically diffeomorphic to an open hypercube, (0, 1) , and the closure of Ci in S is a union of Ci with some Cj where dj := dim(Cj) < di for each j 6= i.

Proof. We consider the list of polynomials used to define S and S1,...,Sq by boolean combinations of sign conditions, say P1,...,Pr ∈ R[X1,...,Xn]. We apply Proposition 3.4.8 to obtain a family of polynomials in R[X1,...,Xn] satis- fying the conditions of Theorem 3.4.7. Therefore the sets S, S1,...,Sq are com- prised of the (disjoint) sets Ck of Theorem 3.4.7. The closure of a nonempty (i,j ) n n n C is C , which is a union of cells C 0 . (i,j ) (i,j ) (i,j ) n We prove that the dimension d0 of C 0 is smaller than the dimension d (i,j ) of Cn by induction on n. For n = 1 we apply Thom’s lemma. If C1 is a (i,j ) (i,j ) point then C1 = C1 = C1 . If C1 is an interval, say (a, b) ⊂ R, then (i,j ) (i,j ) (i,j ) (i,j ) C1 = C1 = [a, b] = (a, b) ∪ {a} ∪ {b}. Hence the statement is true for (i,j ) (i,j ) n = 1. Now, let n > 1 and assume that the statement is true for n − 1. That is, we assume that a cell D in the closure of C has dimension dim(D) < dim(C). We consider a nonempty cell Cn ⊂ C × . By Lemma 3.4.6, Cn is either (i,j ) R (i,j ) n n a graph AC,k or a band BC,k in C × , with C ∩ (C × ) = C and R (i,j ) R (i,j ) n n C ∩(D× ) = D . We check both cases. The dimension of AC,k ∩(C × ) (i,j ) R (i,j ) R is equal to dim(C), and the dimension of AC,k ∩(D×R) is equal to dim D. By the inductive assumption, dim(D) < dim(C), and so the statement of the theorem holds in the case of a graph. The dimension of BC,k ∩ (C × R) = AC,k ∪ BC,k ∪ AC,k+1 is equal to dim(C) + 1, as BC,k is semialgebraically diffeomorphic to C × (0, 1). The set BC,k ∩ (D × R) is either a graph or the closure of a band in D × R, whose dimension is either dim(D) or dim(D) + 1, respectively. By the

98 inductive assumption we have dim(D) + 1 < dim(C) + 1, hence the induction is complete. This concludes the proof.

To clarify the proof of Theorem 3.4.9, the cells in Ci (other than Ci itself) can either be graphs in the same cylinder, or graph/bands in cylinders of smaller dimension. Proposition 3.4.10. Consider some nonempty, bounded set Ck ⊂ n as- (i,j ) R sociated to a family of polynomials (Pi,j) satisfying the conditions of Theorem 3.4.7. The semialgebraic diffeomorphism (0, 1)d → Ck induced by the c.a.d., (i,j ) (as in Proposition 3.1.1), extends to a surjective mapping [0, 1]d → Ck . (i,j ) Proof. We prove this by induction on d. A nonempty bounded cell C of a c.a.d. of R is either a point {a}, or an open interval (a, b), and C is either {a} or [a, b], respectively. In the case that C is a point, the statement of the proposition is trivial, so we assume that C = (a, b). Let δ : (0, 1) → (a, b) be the diffeomorphic mapping of the 1-dimensional hypercube, (0, 1), to the cell C = (a, b), assuming that δ(t1) < δ(t2) for all t1, t2 ∈ (0, 1) with t1 < t2. The extension of δ to the surjective mapping [0, 1] → [a, b] is obvious - we define δ(0) = a and δ(1) = b. We now assume that the proposition holds for all hypercubes of dimension less n than d. Let C ⊂ C×R be a nonempty, bounded cell of a c.a.d. of R adapted to (Pi,j), where C is also a cell of this c.a.d. such that there exists a diffeomorphism u0 : (0, 1)d−1 → C. By the inductive assumption, u0 extends to a continuous d−1 surjective mapping [0, 1] → C. The set C is either a graph AC,k or a band BC,k of the cylinder C ×R. Let D be another cell of the c.a.d. such that D ⊂ C. If C = AC,k, then C had dimension dim(C) = dim(C) = d − 1, and we are d−1 0 done. (The diffeomorphism (0, 1) → C is simply ξC,k ◦ u , which is known to extend continuously to D, by Lemma 3.4.5.) If C = BC,k then

C ∩ (C × R) = AC,k ∪ BC,k ∪ AC,k+1,

C ∩ (D × R) = AD,k0 ∪ BD,k0 ∪ AD,(k+1)0 , where AD,(k+1)0 is either AD,k0 (in which case BD,k0 is not present) or AD,k0+1 (in which case BD,k0 is bounded between AD,k0 and AD,k0+1). For each point c ∈ C we take the linear mapping δd,k : (0, 1) → (ξC,k(c), ξC,k+1(c)), which extends to a linear mapping [0, 1] → [ξC,k(c), ξC,k+1(c)], with δd,k(0) = ξC,k(c) and δd,k(1) = ξC,k+1(c), and similarly for each point d ∈ D we take the lin- ear mapping δd,k0 : (0, 1) → (ξD,k0 (d), ξD,(k+1)0 (d)) with extension [0, 1] → [ξD,k0 (d), ξD,(k+1)0 (d)] such that δd,k0 (0) = ξD,k0 (d) and δd,k0 (1) = ξD,(k+1)0 (d). The extensions are clearly surjective. Note that this remains consistent inde- pendently of the nature of the set C ∩ (D × R) - that is, whether the AD,k0 and AD,(k+1)0 are distinct or not. Note also that C ∩ (D × R) is equal to some band BD,k0 only if C ∩ (C × R) is a band. We take our diffeomorphism to be

d u : (0, 1) −→ C 0 x 7−→ (u × δd,k)(x),

99 0 0 where (u × δd,k)(t1, . . . , td) = (u (t1, . . . , tn−1), δd,k(tn)), whose extension to a d continuous surjective mapping [0, 1] → C is as detailed above for each neigh- boring cell of C. We present another consequence of the cylindrical decomposition, for which we introduce some notation. Given a list of polynomials (Pi,j), 1 ≤ i ≤ n, r 1 ≤ j ≤ ri, in R[X1,...,Xn], and a family  = (i,j) of signs i,j ∈ {−1, 0, 1} , n we denote by C the subset of R satisfying sign(Pi,j) = i,j for each Pi,j in the family. (Using the notation of Theorem 3.4.7, this is the set Cn .) Now, we (i,j ) define

n C := {x ∈ R | sign(Pi,j) = i,j for i = 1, . . . , n, j = 1, . . . , ri, i,j 6= 0}, which is a semialgebraic subset of Rn. That is, we remove all equalities from the set of sign conditions defining C, relaxing them to sign-independence. A nonempty C is necessarily a subset of C, since C is defined with only a subset of the conditions defining C.

Theorem 3.4.11. Let S ⊂ Rn and U ⊂ S semialgebraic and open in S. Then the set U can be expressed as a union of finitely many sets of the form

{x ∈ S | P1(x) > 0 and ... and Pr(x)}, where P1,...,Pr ∈ R[X1,...,Xn]. Proof. We apply Proposition 3.4.8 to the collection of polynomials used to define the sets S and U, yielding a family of polynomials (Pi,j) in R[X1,...,Xn] which satisfy the conditions of Theorem 3.4.7. Then, both S and U are a finite union of cells C (written as Cn in Theorem 3.4.7). The set C is the union of C  (i,j )   with all other sets of the form

n C0 := {x ∈ R | sign(Pi,j) = i,j for i = 1, . . . , n, j = 1, . . . , ri, i,j 6= 0, 0 and sign(Pi,j) = i,j for i,j = 0},

0 where i,j ∈ {−1, 0, 1}. The set S ∩ C is clearly an open subset of S, and is only defined using strict inequalities. We aim to prove that [  C ∩ S = U,

C⊂U thereby proving that U can be written as a finite union of sets of the form required. For each cell C ⊂ U, we know that C ⊂ (C ∩ S). Therefore [ [ U = C ⊂ (C ∩ S),

C⊂U C⊂U proving the forward inclusion. Now, for each cell C ⊂ U, we consider the sets 0 0 C0 . For each i,j = 0, i,j ∈ {−1, 0, 1}, and so i,j can either be {−1, 0}, {0, 1},

100 0 or 0. That is, i,j ∈ i,j, and therefore C ⊂ C0 = C0 (noting that C0 = C0 by Theorem 3.4.7). This guarantees that C0 has a nonempty intersection with U. Furthermore, since U is assumed to be open in S, (and U is a union of C), C0 must also intersect U nonemptily. (Indeed, we observe that, if for some nonempty C ⊂ U there exists a C0 distinct from C, at least one of the i,j must be equal to 0, meaning that if C0 ∩U = ∅ then we also require C0 ∩S = ∅. Hence, by the contrapositive, if C0 ⊂ S then C0 ∩ U 6= ∅.) As U is a union of cells (and the C0 are also cells, by definition), the set C0 must be contained in U. That is, for each C ⊂ U, the set C is comprised of C0 , where each C0 ⊂ U. Therefore [ [ (C0 ) = (C ∩ S) ⊂ U,

C⊂U C⊂U proving the reverse inclusion, thus concluding the proof.

3.5 Dimension of semialgebraic sets In this subsection we conclude our study of cylindrical algebraic decomposition with the following results on the dimension of semialgebraic sets. We define the semialgebraic dimension of a semialgebraic set via c.a.d., and the local dimension of a semialgebraic set at a point. This allows us to deduce some straightforward consequences regarding the dimension of a semialgebraic set and its closure, as well as the dimension of projections and semialgebraic images of semialgebraic sets. We end with the comparison of the algebraic dimension and semialgebraic dimension of algebraic sets. n p Let S ⊂ R a semialgebraic set, and let S = ∪i=1Ci a union of some cells n Ci of a c.a.d. of R , with each Ci semialgebraically diffeomorphic to an open hypercube (0, 1)di . The dimension of a semialgebraic set S ⊂ Rn is defined as

dim(S) = max di. i=1,...,p

Proposition 3.5.1. The dimension of a semialgebraic set S ⊂ Rn is indepen- dent of the decomposition of Rn of which S is a union of cells. Proof. If we have two such decompositions of a semialgebraic set, say S = p q ∪i=1Ci and S = ∪j=1Dj, we can apply Theorem 3.4.9, treating all of the Ci and Dj as semialgebraic subset of S. That is, we can refine either decomposition to also be adapted to the other. For instance, we can refine the decomposition p S = ∪i=1Ci to be adapted to the collection of semialgebraic subsets D1,...,Dq l to yield a decomposition, say S = ∪k=1Sk, such that each of the Ci and Dj are a disjoint union of cells Sk. We consider one of the Ci as a union of cells Sk, say Ci = ∪k∈αSk for some subset α ⊂ {1, . . . , l}. Clearly the dimension of each Sk, k ∈ α, is at most the dimension of Ci. Conversely, treating Ci as a semialgebraic subset n of R , the dimension of Ci is, by definition, the maximum dimension of the Sk, k ∈ α. Consider a cell Sk of maximal dimension in Ci. By Theorem

101 3.4.9, Ci\Sk contains Ci\Sk and cells of strictly smaller dimension than Ci\Sk, which are therefore of strictly smaller dimension than Sk itself. This means that Ci\Sk ∩ Sk = ∅, Therefore Sk is open in Ci, and hence dim(Sk) = dim(Ci). p Thus, assuming Ci is of maximal dimension in the decomposition S = ∪i=1Ci, the Sk of maximal dimension in Ci must also be of maximal dimension in S. Applying the same argument to the cells Dj, we conclude that the semialgebraic dimension of S is the same according to both decompositions. We observe some behaviors of the dimension of semialgebraic sets under closures and mappings.

Proposition 3.5.2. For a semialgebraic set S ⊂ Rn, dim(S) = dim(S) and dim(S\S) < dim(S).

p Proof. By Theorem 3.4.9, we can write S as a union of cells, S = ∪i=1Ci, such that the closure of some Ci is the union of Ci with some Cj of smaller dimension. The closure of S is the union of all Ci, and therefore dim(S) = dim(S). Furthermore, S\S contains none of the cells of maximal dimension in S, and therefore is a union of cells of strictly smaller dimension. That is, dim(S\S) < dim(S). Proposition 3.5.3. The dimension of a semialgebraic set is invariant under semialgebraic homeomorphism.

n p Proof. If semialgebraic set S ⊂ R has the decomposition S = ∪i=1Ci as in Theorem 3.4.9, each stratum Ci is semialgebraically homeomorphic to an open hypercube (0, 1)di via some g. Then, for h : Rn → Rn a semialgebraic home- di  omorphism, the set h(S) is a union of h(Ci) = h ◦ g (0, 1) , where h ◦ g is a semialgebraic homeomorphism, (as a composition of continuous, bijective, semialgebraic functions, c.f. Proposition 2.4.4).

Lemma 3.5.4. Let S be a semialgebraic subset of Rn+m, and take π : Rn+m → Rn the projection onto the first n coordinates. Then dim(π(S)) ≤ dim(S), and dim(π(S)) = dim(S) if π|S is injective. Proof. First consider the case when m = 1. The set S is a union of cells of a c.a.d. of Rn+1. Particularly, S is a union of graphs and bands defined by the continuous semialgebraic function ξC,i : C → R over cells C ∈ Cn of n the c.a.d. partitioning R . If S consists of any bands BC,i over a cell C of maximal dimension in Cn, then dim(S) = dim(BC,i) = dim(C) + 1 and hence dim(π(S)) = dim(C) = dim(S)−1. If S does not consist of any such bands, but instead only consists of graphs AC,i over the cells C of maximal dimension in Cn, then dim(S) = dim(AC,i) = dim(C) = dim(π(S)). Indeed, if π|S is injective, then S is necessarily a union of graphs, and so dim(π(S)) = dim(S). We now proceed by induction on m, assuming that the statement holds for m − 1, where m > 1. That is, denoting π0 : Rn+m−1 → Rn the projection onto the first n coordinates, we assume that dim(π0(S)) ≤ dim(S), and that 0 0 n+m n dim(π (S)) = dim(S) if π |S is injective. If we then denote π : R → R ,

102 n+m n+m−1 (the projection onto the first n coordinates), and πn+m : R → R , (the projection onto the first n + m − 1 coordinates), we can write π as the 0 composition π ◦ πn+m. Since n is arbitrary, we can treat πn+m in the same way as we treated π in the m = 1 case, replacing n with n + m. The statement of the lemma follows immediately.

Theorem 3.5.5. Let S be a semialgebraic subset of Rn, and let f : S → Rm be a semialgebraic mapping. Then dim(f(S)) ≤ dim(S), and dim(f(S)) = dim(S) if f is injective.

Proof. Let Γ ⊂ Rn+m be the graph of f, which is a semialgebraic set. Denote by n+m n m n+m πn : R → R the projection onto the first n coordinates, and π : R → Rm the projection onto the last m coordinates. Then we can express S and its image via f as

S = πn(Γ), f(S) = πm(Γ).

Since Γ is a graph, πn must be injective on Γ. Then, by Lemma 3.5.4, dim(S) = m dim(πn(Γ)) = dim(Γ), and dim(f(S)) = dim(π (Γ)) ≤ dim(Γ). That is, dim(f(S)) ≤ dim(S). Furthermore, if f is injective, then πm must be injective, meaning that dim(πm(Γ)) = dim(Γ) = dim(S). Therefore, dim(f(S)) = dim(S) if f is injective. We now introduce the notion of local dimension of a semialgebraic set. For a semialgebraic set S ⊂ Rn, and a point x ∈ S, the dimension of S at x, denoted dimx(S), is defined as the non-negative integer d such that there exists a neighborhood V of x, and for any semialgebraic neighborhood U ⊂ V of x in Rn, the dimension of U ∩ S as a semialgebraic set is d. In the following proposition we show that such a d exists.

Proposition 3.5.6. For a semialgebraic set S ⊂ Rn, and a point x ∈ S, there exists a non-negative integer d such that dimx(S) = d.

p n Proof. We assume S = ∪i=1Ci a union of cells of a c.a.d. of R adapted to S. Take V ⊂ Rn a neighborhood of x small enough that the only cells C with nonempty intersection V ∩ C are those such that x ∈ C. Let {Ci}i∈α denote the collection of cells whose closures contain x, where α is some subset of {1, . . . , p}. Since x is in the closure of all such Ci, any neighborhood U of x must also intersect these Ci nonemptily. The intersection U ∩ Ci is an open semialgebraic subset of Ci, and therefore dim(U ∩ Ci) = dim(Ci). Particularly, if dim(Ci) = d := maxi∈α dim(Ci) then by definition we have dim(U ∩ Ci) = dimx(S) = d.

Proposition 3.5.7. For a semialgebraic set S ⊂ Rn, and a point x ∈ S, maxx∈S(dimx(S)) = dim(S), and {x ∈ S | dimx(S) = dim(S)} is a closed semialgebraic subset of S.

103 p n Proof. Let S = ∪i=1Ci a union of cells of a c.a.d. of R . Clearly dimx(S) ≤ dim(S) for any x ∈ S, since dimx(S) is bounded by the dimension of the cells Ci for which x ∈ Ci. Take a cell C of maximal dimension dmax among the Ci contained in S, and take a point x ∈ C. Since C is of maximal dimension, it must be open in S (by similar arguments to those in the proof of Proposition 3.5.1). Then we find an open neighborhood V ⊂ Rn of x so that V ∩ C 6= ∅ and V does not intersect any other cell in S. Therefore, any open neighborhood U ⊂ V of x, U ∩ S is open in S, and so dim(U ∩ C) = dim(C) = dmax. That is, dimx(S) = dmax, showing that there always exists a point x ∈ S at which dimx(S) = dim(S), and hence that dim(S) = maxx∈S dimx(S). Now, the set {x ∈ S | dimx(S) = dim(S)} is a union of cells Ci of maximal dimension, dmax, and their closures. Indeed, if dimx(S) = dmax = dim(S), then x must be contained in the closure of a cell Ci of maximal dimension, (otherwise one can take an open neighborhood V of x small enough that V ∩ Ci = ∅), and dimx(S) = dim(Ci) for every x ∈ Ci. Therefore {x ∈ S | dimx(S) = dim(S)} is a union of finitely closed semialgebraic sets, (as the closure of a semialgebraic set is semialgebraic), and is therefore itself a closed semialgebraic subset of S. We introduce several definitions and notations relating to algebraic sets in or- der to give the final result on dimension via cylindrical algebraic decomposition, as we compare semialgebraic dimension to algebraic dimension. The definitions and the basic results that follow can be found in [7] or [8]. The outline for the proof we give of Theorem 3.5.15 is found in [1], page 58. Let F be a family of polynomials in R[X1, . . . , xn]. Then we denote by Z(F ) the common zeroset of all polynomials P in F . That is

n Z(F ) := {x ∈ R | P (x) = 0 for all P ∈ F }. This is the intersection of sets

\ n Z(F ) = {x ∈ R | P (x) = 0}, P ∈F and therefore the operation of taking common zerosets is inclusion-reversing. That is, if F1 ⊂ F2 then Z(F2) ⊂ Z(F1). An algebraic set, V ⊂ Rn is defined as a subset of Rn that is the common zeroset of some collection of polynomials in R[X1,...,Xn] - a set of the form Z(F ). We also define an ideal I of R[X1,...,Xn] as a family of polynomials in R[X1,...,Xn] such that 0 ∈ I, f + g ∈ I for all f, g ∈ I, and fh ∈ I for all f ∈ I, h ∈ R[X1,...,Xn] arbitrary. It is known (the Hilbert Basis Theorem, [7]) that every ideal I of R[X1,...,Xn] is finitely generated. That is, I is the ideal generated by some finite family (g1, . . . , gs) ⊂ R[X1,...,Xn], which is expressed as s X I = hg1, . . . , gsi := { higi | hi ∈ R[X1,...,Xn]}. i=1

Claim 3.5.8. If I is the ideal generated by F ⊂ R[X1,...,Xn], then Z(I) = Z(F ).

104 Proof. Since F ⊂ I, it is clear that

\ n Z(I) = {x ∈ R | P (x) = 0} P ∈I \ n ⊂ {x ∈ R | P (x) = 0} P ∈F = Z(F ).

Conversely, we know that for any x ∈ Z(F ), P (x) = 0 for all P ∈ F , and so h(x)P (x) = 0 for all P ∈ F , h ∈ R[X1,...,Xn] arbitrary. Therefore X hi(x)Pi(x) = 0 for any hi ∈ R[X1,...,Xn], P ∈F meaning that x ∈ Z(I). Hence Z(F ) ⊂ Z(I).

The algebraic subset defined by an ideal of R[X1,...,Xn] is independent of the base of the ideal. This is proven in the following claim.

Claim 3.5.9. If hf1, . . . , fti = hg1, . . . , gsi, for some (g1, . . . , gs), (f1, . . . , ft) ⊂ R[X1,...,Xn], then Z(F ) = Z(G).

Proof. Let us denote G = (g1, . . . , gs) and F = (f1, . . . , ft). For any f ∈ hF i and any x ∈ Z(F ), f(x) = 0. Similarly, for any g ∈ hGi, g(x) = 0 as well, since g ∈ hGi = hF i. It is then clear that g1(x) = ... = gs(x) = 0 for any x ∈ Z(F ). That is, Z(F ) ⊂ Z(G). The same arguments hold when interchanging F and G, hence the reverse inclusion holds.

Since every ideal in R[X1,...,Xn] is finitely generated, given any finite F ⊂ R[X1,...,Xn], there exists a finite family, say (f1, . . . , ft) ⊂ R[X1,...,Xn], such that hF i = hf1, . . . , fti. In particular, we have Z(F ) = Z(hf1, . . . , fti), by Claim 3.5.9. Furthermore, for a finite family of (real valued) polynomials, 2 2 f1 = ... = ft = 0 if and only if f1 + ... + ft = 0. Therefore we can replace the 2 2 family (f1, . . . , ft) with a single polynomial, f1 +...+ft . This means that every algebraic subset of Rn can be written as the zeroset of a single polynomial. For an arbitrary subset U ⊂ Rn, we define

I(U) := {P ∈ R[X1,...,Xn] | P (x) = 0 for all x ∈ U}.

It is clear that I(U) is an ideal of R[X1,...,Xn]. This operation is also inclusion- reversing. That is, if V1 ⊂ V2, then I(V2) ⊂ I(V1), since all polynomials that are identically zero on V2 are clearly also zero on V1, but not necessarily the other way around.

n  Claim 3.5.10. A set V ⊂ R is algebraic if and only if V = Z I(V ) . Proof. From the definition of I(V ), it is obvious that V ⊂ ZI(V ) for any V ⊂ Rn. If V is algebraic then there exists a finite family of polynomials

105 (g1, . . . , gs) ⊂ R[X1,...,Xn] such that V = Z(g1, . . . , gs). Each of the g1, . . . , gs vanish on V , and are therefore contained in I(V ). Since V is the intersection  of the zerosets of g1, . . . , gs we have Z I(V ) ⊂ V , proving the forward impli- cation. For the reverse implication, we assume that V = ZI(V ). Note that I(V ) is an ideal of R[X1,...,Xn]. That is, V is the zeroset of some ideal I in R[X1,...,Xn]. Since every ideal in R[X1,...,Xn] is finitely generated, we can assume I = hg1, . . . , gsi for some g1, . . . , gs ∈ R[X1,...,Xn]. By Claim 3.5.8, V = Z(I) = Z(g1, . . . , gs), and hence V is algebraic.

Z For V ⊂ Rn, the set V := Z(I(V )) is called the ‘Zariski closure’ of V . Z Indeed, sets of the form V are the closed sets of the ‘Zariski topology’ on Rn. Z Claim 3.5.11. For a subset V ⊂ Rn, the Zariski closure, V , is the smallest algebraic subset of Rn containing V .

Proof. The ideal I(V ) contains all polynomials in R[X1,...,Xn] that are zero on V , and Z(I(V )) is the intersection of the zerosets of all P ∈ I(V ). Since I(V ) is finitely generated, Z(I(V )) is a union of finitely many algebraic sets, and so it is indeed an algebraic subset of Rn containing V . Assume that there is an algebraic set V 0 ⊂ Rn such that V ⊂ V 0 ⊂ Z(I(V )). Then there exists a polynomial 0 Q ∈ R[X1,...,Xn] whose zeroset is V ⊃ V , and therefore Q ∈ I(V ). Since {Q} ⊂ I(V ), and the operation of taking common zerosets is inclusion-reversing, we have

Z V = Z(I(V )) ⊂ Z(Q) = V 0,

Z and hence V 0 = V . Z Z Z Lemma 3.5.12. For subsets A, B ⊂ Rn, A ∪ B = A ∪ B . Proof. The Zariski closure of a set is

Z A = Z(I(A)) \ = Z(P ). P ∈I(A)

Since A ⊂ A ∪ B, we have I(A ∪ B) ⊂ I(A), and hence

Z Z A = Z(I(A)) ⊂ Z(I(A ∪ B)) = A ∪ B .

Z Z Similarly, B ⊂ A ∪ B . Therefore

Z Z A = Z(I(A)) ⊂ Z(I(A ∪ B)) = A ∪ B

Z Z Z Z Conversely, it is clear that A ∪ B ⊂ A ∪ B , since A ⊂ A and B ⊂ B . Z Z Furthermore, A ∪ B is algebraic (as a finite union of algebraic sets) and the

106 Zariski closure of A ∪ B is, by definition, the smallest algebraic set containing A ∪ B. Hence

Z Z Z A ∪ B ⊂ A ∪ B .

n For a subset V ⊂ R , we denote by P(V ) the quotient ring R[X1,...,Xn]/I(V ), which we can write as  P(V ) = [P ] ∈ R[X1,...,Xn] | P = AQ + R,Q ∈ I(V ), A, R ∈ R[X1,...,Xn] , the set of equivalence classes of polynomials modulo the ideal I(V ). So, if R is identically zero on V , we have P = AQ, and hence P ∈ I(V ). An ‘irreducible’ algebraic set V ⊂ Rn is an algebraic set which cannot be written as a union of algebraic sets V = A ∪ B, with A, B ( V . Claim 3.5.13. If V ⊂ Rn is an irreducible algebraic set, then P(V ) is an integral domain.

Proof. Assume that P(V ) is not an integral domain. Then there are P1,P2 ∈ P(V ) such that P1 and P2 are not identically zero on V but the product P1P2 is. Therefore, Z(P1P2) = Z(P1) ∪ Z(P2), where Z(P1), Z(P2) ( V . Hence V is not irreducible. Taking the contrapositive, we obtain the statement of the claim.

An algebraic set V ⊂ Rn has a unique decomposition into finitely many irreducible algebraic subsets, say V = V1 ∪ ... ∪ Vq.

Claim 3.5.14. Every algebraic set V ⊂ Rn has a unique decomposition into finitely many irreducible algebraic subsets.

Proof. If V = U1 ∪...∪Uq and V = V1 ∪...∪Vq are two different decompositions of V into irreducible algebraic subsets, then each intersection Ui ∩ Vj is also an algebraic subset of V . Hence if there exist Ui 6= Vj such that Ui ∩ Vj 6= ∅, at least one of the sets Ui or Vj are not irreducible, which is a contradiction.

If V ⊂ Rn is an irreducible algebraic set, (so that P(V ) is an integral do- main), we denote by K(V ) the field of fractions of P(V ). That is, K(V ) is the field of rational fractions on V . We now aim to define the algebraic dimension of an algebraic subset of Rn. An ideal I of the ring P(V ) is said to be ‘prime’ if, for f, g ∈ R[X1,...,Xn], fg ∈ I =⇒ either f ∈ I or g ∈ I.

An algebraic set V ⊂ Rn is irreducible if and only if I(V ) is a prime ideal of R[X1,...,Xn]. (The proof of this fact can be found in [7].) We define the algebraic dimension of an algebraic set V ⊂ Rn to be the Krull dimension of the ring P(V ). That is, the largest height of a chain of prime ideals I0 ( I1 (

107 ... ( Id ( P(V ) in P(V ). If V = V1 ∪ ... ∪ Vq a union of algebraic sets, then dim(V ) = maxi=1,...,q dim(Vi). (Note that, by the ascending chain condition, the dimension of an algebraic subset of Rn is at most n - we refer to [7] again for this fact.) Example 30. Let V = {(x, y, z) ∈ R3 | x2 + y2 + z2 − 1 = 0}, the unit sphere 3 2 2 2 in R centered at the origin, and denote PV (X,Y,Z) = X + Y + Z − 1. By observation, the dimension of V is 2. We will use this fact to illustrate the above definition of the algebraic dimension of V as the Krull dimension of P(V ). We take the chain of prime ideals I0 ( I1 ( I2 ( I3 in P(V ), where

I0 = h[0]i,

I1 = h[X]i,

I2 = h[X], [Y ]i,

I3 = h[X], [Y ], [Z]i.

The ideal I0 consists of all equivalence classes [P ] of polynomials in P(V ) of the form P = PV Q + (0)R, where Q, R ∈ R[X,Y,Z]. These are all identically zero on V . Similarly, I1 is the set of all equivalence classes of polynomials of the form P = PV Q + (X)R, and I2 is the set of all equivalence classes of polynomials of the form P = PV Q + (X)R and PV Q + (Y )R. Finally, for I3, we can set R1 = X, R2 = Y , R3 = Z, and take

2 P1(X,Y,Z) = PV Q + (X)R1 = PV Q1 + X , 2 P2(X,Y,Z) = PV Q + (Y )R2 = PV Q2 + Y , 2 P3(X,Y,Z) = PV Q + (Z)R3 = PV Q3 + Z , so that

 2 2 2 P(V ) 3 P1 + P2 + P3 = PV (Q1 + Q2 + Q3)X + Y + Z , which is equal to 1 on V . Therefore any polynomial in P(V ) is also in I3. That is, I3 = P(V ), and so we have achieved a prime ideal chain I0 ( I1 ( I2 ( I3 = P(V ). In fact, our choice of X, Y , Z as the generating polynomials ensures that this chain the longest possible. Hence the Krull dimension of P(V ) (and the algebraic dimension of V ) is 2.

Theorem 3.5.15. Let S be a semialgebraic subset of Rn. The dimension of S as a semialgebraic set is equal to the algebraic dimension of the Zariski closure Z S of S.

Proof. We give a sketch of the proof. Assume that S = ∪i=1,...,pCi is a stratifi- cation of S according to Theorem 3.4.9, so that the semialgebraic dimension of S is the dimension of the cell, say Ctop, of maximal dimension in S. By Lemma Z p Z Z 3.5.12, S = ∪i=1Ci , and the algebraic dimension of S is the maximum al- Z gebraic dimension of the Ci . Hence it is enough to prove the statement of the

108 theorem for a single cell of the c.a.d.. This is done by induction on n, treating the cases of Ctop equal to a graph and band separately. For n = 1, a cell Ctop in a nonempty semialgebraic set S ⊂ R is either a point or an open interval. The algebraic dimension of a point is 0, and the Zariski closure of an open interval is R, with algebraic dimension 1, both of which agree with the semialgebraic dimension. Now, let n > 1 and assume the statement of the theorem holds for 0 n − 1. A cell C ∈ Cn is either a graph or a band in the cylinder C × R over 0 some cell C ∈ Cn−1 of the c.a.d.. Hence the projection of C onto the first n − 1 coordinates is π(C) = C0. We assume that C0 is semialgebraically diffeomorphic to the open hypercube (0, 1)d. If C is the graph of a semialgebraic function, ξ : C0 → R, then C is also semialgebraically diffeomorphic to (0, 1)d. There 0 0 0 exists a polynomial, say P ∈ R[X1,...,Xn] such that, for all c ∈ C , P (x ,Xn) 0 0 is not identically zero as a polynomial in Xn, and P (c , ξ(c )) = 0. We take the Z Zariski closure C0 , which is an algebraic subset of Rn−1, and let

0Z C = V1 ∪ ... ∪ Vq be its decomposition into irreducible components. Then, since C ⊂ Z(P ),

Z   C ⊂ Z(P ) ∩ (V1 × R) ∪ ... ∪ (Vq × R) , where each Z(P ) ∩ (Vi × R) is a proper subset of Vi × R. We have the following inequality regarding algebraic dimensions:  dim (Z(P ) ∩ (Vi × R) < dim(Vi × R) = dim(Vi) + 1,

Z 0Z from which we deduce that dim(C ) ≤ dim(C ). Indeed, each Vi × R is irre- ducible, dim(Vi × R) = dim(Vi) + 1, and each Z ∩ (Vi × R) is a proper algebraic Z subset of Vi × R. To obtain the reverse inequality, we consider C and its ir- Z reducible components, say C = W1 ∪ ... ∪ Wp. The projection π induces an injective homomorphism from each P(π(Wi)) to P(Wi), and therefore a field homomorphism from each K(π(Wi)) to K(Wi). Equating the Krull dimension of P(Wi) with the transcendence degree of K(Wi) over R, one obtains the in- equality dim(π(Wi)) ≤ dim(Wi), algebraic dimension, from which we deduce Z Z Z Z that dim(C0 ) ≤ dim(C ). Hence dim(C0 ) = dim(C ) = d. Now, if C is a band in C0 × R, (so that C is semialgebraically diffeomorphic Z Z to (0, 1)d+1), then C ⊃ (C0 × R), and hence the algebraic dimension of C is Z equal to or greater than the algebraic dimension of C0×R. Therefore dim(C ) = Z dim(C0 ) + 1 = d + 1.

109 3.6 Triangulation of semialgebraic sets Previously, we developed the method of cylindrical algebraic decomposition, showing how to decompose Rn into semialgebraic cells, where each cell is semi- algebraically diffeomorphic to an open hypercube. We also saw that this decom- position can be adapted so that any semialgebraic subset S of Rn is a union of these cells, and that the decomposition can be further refined so that any finite collection of semialgebraic subsets of S can also be represented as a union of these disjoint cells. In this subsection we examine the method of triangulating compact semialgebraic sets. The main result is Theorem 3.6.1, stating that, for any semialgebraic subset S ⊂ Rn, we can construct triangulations of Rn such that S is semialgebraically homeomorphic to a union of open simplices of this triangulation, and that S is therefore semialgebraically homeomorphic to a simplicial complex in Rn. We use results on cylindrical decomposition of semi- algebraic sets from Subsection 3.4 to prove Theorem 3.6.1, (the triangulation theorem), which is used in Section 4 to prove Hardt’s triviality of semialgebraic sets, (Theorem 4.1.1). We introduce the notions necessary to state and prove the triangulation theorem. A simplex is the convex hull of a collection of affine-independent n points. More precisely, for points v0, . . . vd ∈ R not contained within the same d−1-dimensional affine subspace, we define the d-simplex with vertices v0, . . . , vd to be the set

d d n X X [v0, . . . , vd] := {x ∈ R | ∃λ0, . . . , λd ∈ [0, 1] with λi = 1 and x = vi}. i=0 i=0

A face of [v0, . . . , vd] is any simplex [u0, . . . , ud0 ] where the set of vertices {u0, . . . , ud0 } is a subset of {v0, . . . , vd}.A proper face of a d-simplex is a face that is a d − 1- simplex in its own right. We define the open simplex with vertices v0, . . . , vd to be

d d n X X (v0, . . . , vd) := {x ∈ R | ∃λ0, . . . , λd ∈ (0, 1] with λi = 1 and x = vi}, i=0 i=0 where that the barycentric parameters, λi, are required to be nonzero in the open simplex.

110 Figure 25: Examples of low-dimensional simplices and open simplices.

Note that in the case of an open d-simplex with d ≥ 1, the λi are also implicitly prohibited from taking the value 1. We define both the 0-simplex [v0] and the open 0-simplex (v0) to be the point {v0}. If σ is the simplex [v0, . . . , vd], let us ◦ n denote by σ the open simplex (v0, . . . , vd). Any simplex or open simplex in R is clearly a semialgebraic subset of Rn. n A finite simplicial complex K in R is a collection of simplices {σ1, . . . , σl}, n σi ⊂ R , such that every face of a simplex σi ∈ K is also in K, and the nonempty intersection σi ∩ σj of any two σi, σj ∈ K is a face of both σi and σj. That is, any point v in the intersection σi ∩σj that is a vertex of σi must also be a vertex of σj.

Figure 26: Example of a simplicial complex (right), and false example of a simplicial complex (left).

111 In the left diagram of Figure 26, the points v3 and v4 are not vertices of the simplex [v0, v1, v2], and therefore the intersection [v0, v1, v2]∩[v3, v4, v5] = [v3, v4] of is not a face of [v0, v1, v2]. However, in the right diagram, [v0, v1, v2] is divided into 3 distinct simplices such that [v3, v4, v5] only intersects [v0, v3, v4], and this intersection, [v3, v4], is indeed a face of both [v0, v3, v4] and [v0, v1, v2]. We give an example of a finite simplicial 3-complex, (a simplicial complex containing at least one 3-simplex, and not containing any d-simplices for d > 3).

Figure 27: A finite simplicial 3-complex.

n Sl For a simplicial complex, K = {σ1, . . . , σl} in R , we define |K| := i=1 σi. Then |K| is a semialgebraic subset of Rn, as it is a finite union of simplices, each of which are semialgebraic. If, for a set S ⊂ Rn there exists a simplicial complex K such that S = |K|, the set S is called a polyhedron, and K is a simplicial decomposition of S. Pd 1 For a simplex σ = [v0, . . . , vd], we denote by σb := i=0 d+1 vi the barycenter of σ. That is, the arithmetic mean of the position of all vertices of σ. If K is a finite simplicial complex, we denote by K0 the ‘barycentric subdivision’ of K, which we define to be the (finite) simplicial complex comprised of all possible simplices [σ ,..., σ ] such that the σ , . . . , σ are simplices in K, and each bk0 bkd k0 kd σkj is a proper face of σkj +1.

112 Figure 28: Barycentric subdivisions of a 0-simplex, 1-simplex, and 2-simplex.

Theorem 3.6.1. Let S ⊂ Rn be a compact semialgebraic set with semialgebraic n subsets S1,...,Sq ⊂ S. There exists a finite simplicial complex K in R and a semialgebraic homeomorphism h : |K| → S such that each Si is the image of a union of open simplices in K by h. Proof. We prove this by induction on n and is constructive in nature. The essence of the induction step is in using cylindrical algebraic decomposition to obtain the cells of Cn−1 and Cn, which possess desirable properties in relation to the sets S, S1,...,Sq, then breaking down the cells within each cylinder C × R, C ∈ Cn−1, using projections and barycentric subdivisions. This enables us to construct semialgebraic homeomorphisms between new, higher-order simplices, and the (further decomposed) cells within C ×R. We then check that the newly constructed simplices and homeomorphisms can be glued together to satisfy the statement of the theorem. Let n = 1. The semialgebraic subset S ⊂ R, and the S1,...,Sq are unions of points and open intervals, all of which are open simplices. Hence we can take |K| = S, and set h : K → S to be the identity map. Now let n > 1, and assume that the statement of the theorem holds for n − 1. By Proposition 3.4.8 the sets S and S1,...,Sq are, up to a linear automorphism, unions of n cells of a c.a.d. of R adapted to a family of polynomials (Pi,j) satisfying the conditions of Theorem 3.4.7 and Theorem 3.4.9. Therefore the S, S1,...,Sq are, up to a linear automorphism, unions of semialgebraic subsets Cn of n, (i,j ) R each defined by a boolean combination of sign conditions on the polynomials n n n in (Pi,j), such that C = C is the union of C with cells of strictly (i,j ) (i,j ) (i,j ) smaller dimension. The c.a.d. also produces a finite partition of Rn−1 into  n−1 PROJ (Pi,j) -invariant cells, C , with the same closure properties, and over (i,j ) each of these cells C there are finitely many continuous semialgebraic functions

ξC,1 < . . . < ξC,lC : C → R describing the roots of the Pn,1,...,Pn,rn (as polynomials in Xn). Explicitly, the sets S, S1,...,Sq are all unions of graphs AC,k and bands BC,k defined by these semialgebraic functions. We denote by πn the projection

n n−1 πn : R −→ R (x1, . . . , xn−1, xn) 7−→ (x1, . . . , xn−1)

113 onto the first n − 1 coordinates. We know that the set πn(S) is semialgebraic (Tarski-Seidenberg 2nd form, 2.2.1), and compact (as the continuous image of a compact set). By the cylindrical arrangement of cells of the c.a.d., the sets πn(S), and the πn(S1), . . . , πn(Sq) ⊂ πn(S) are unions of cells in Cn−1. By the inductive assumption, there exists a finite simplicial complex L in Rn−1 and a semialgebraic homeomorphism g : |L| → πn(S) such that each cell C ⊂ πn(S), C ∈ Cn−1, is a union of images of open simplices in L by g.

Figure 29: A cell C ∈ Cn−1 as a union of images of open simplices of L by the    homeomorphism g. Particularly, C = g (v0, v1, v2) ∪g (v0, v2, v3) ∪g (v0, v2) .

For simplicity, we can further subdivide the cells C ∈ Cn−1 so that they are each the image of a single open simplex of L.

Figure 30: Cells C1,C2,C3 ∈ Cn−1 as the images of open simplices of L by the homeomorphism g.

n We can now construct a finite simplicial complex Lτ in τ × R ⊂ R over each simplex τ ∈ L, along with a semialgebraic homeomorphism

◦  hτ : |Lτ | −→ S ∩ g(τ) × R .

◦ Take a simplex, say τ := [u0, . . . , ud] in L, and let g(τ) = C is a cell in Cn−1. (We maintain the assumption that each cell in Cn−1 is the image of an open

114 simplex in L by g.) Since S is assumed to be compact, each nonempty cylinder C × R must contain at least one of the graphs of the functions ξ :→ R of the c.a.d.. Take one of these functions, say ξC,k : C → R, whose graph AC,k is contained in S. We first treat the case of one of these graphs contained in S. By Lemma 3.4.5, ξC,k extends continuously to all cells D ⊂ C. Denote this extension of ξC,k by ξC,k. We can write ξC,k is the continuous extension of ξC,k ◦ on C = g(τ) = g(τ). Then, for each vertex ui of τ, we define  vi := ui, ξ(g(ui)) ∈ τ × R,

n and denote σξC,k := [v0, . . . , vd], which is a simplex in R .

n−1 Figure 31: A simplex τ and its image C = g(τ) in R . The value of ξC,k at each vertex of τ defines the n-th coordinate value of the vertices in σ.

We define the homeomorphism hτ : σξC,k → AC,k by  hτ (v) = g(u), ξC,k(g(u)) , where u = λ0u0 + ... + λdud is a point in τ, and

AC,k 3 v =λ0v0 + ... + λdvd   =λ0 g(u0), ξC,k(g(u0)) + ... + λd g(ud), ξC,k(g(ud)) .

115 Figure 32: The simplex σξC,k over τ, and the image of a point v ∈ σξC,k by hτ ◦ in the closure of the cylinder g(τ) × R.

We now consider the case when there is more than 1 function of the c.a.d. over the cell C. If there are 2 consecutive functions ξC,k and ξC,k0 : C → R, we 0 0 define the simplices σξC,k = [v0, . . . , vd] and σξC,k0 [v0, . . . , vd] as above. Notice that it is possible for σξC,k and σξC,k0 to have the same vertices, making them identical. (For instance, the projection of the unit circle about the origin onto the first coordinate is the interval [−1, 1]. If we set τ = [−1, 1], and let ξ1 and ξ2 be the functions describing the bottom and top halves, respectively, then the 2 simplices σξ1 and σξ2 are both the line segment [−1, 1]×{0}, a 1-simplex in R .) To avoid this possibility, we take the barycentric subdivision L0 of L. Indeed, if 0 τi ⊂ τ is a simplex of L , (and g(τi) ⊂ g(τ) = C), then τb is a vertex of τi, and since the functions ξC,k and ξC,k0 are strictly ordered over each cell C ∈ Cn−1, they must take different values at τb. Therefore, replacing L with its barycentric 0 subdivision L , we can assume that the simplices σξC,k and σξC,k0 have at least one different vertex from one another, making them distinct simplices. Note that this also prevents the formation of identical simplices for distinct functions ◦ ξD,k and ξD,k0 over cells D ⊂ C. This is due to the fact that D = g(φ) where φ is some face of τ, and therefore the barycenter b(φ) is a vertex of the simplices of the barycentric subdivision of φ.

116 3 Figure 33: Cells of a c.a.d. adapted to a sphere in R . Although ξD,1 and ξD,2 coincide at g(u0) and g(u1), they take distinct values at g(τb).

We now treat the case of a band BC,k contained in S. We maintain the ◦ notation of C = g(τ) and C = g(τ) for some cell C ∈ Cn−1 of the c.a.d.. Since S is assumed to be compact, a band BC,k ⊂ S is bounded from above and below by some AC,k and AC,k+1 in C ×R. We construct the simplices σξC,k and

σξC,k+1 over τ, as above, and consider the region between them. Denote by B this polyhedron - the subset of τ × R bounded from above and below by σξC,k and σξC,k+1 , respectively.

Figure 34: The polygon, B, and its image hτ (B) as the closure of a band BC,k in C × R.

We show that B is indeed a polyhedron by giving a simplicial decomposition. If

117 0 0 B is bounded between the simplices [v0, . . . , vd] and [v0, . . . , vd], then we have

[ 0 0 B = [v0, . . . , vi, vi, . . . , vd]. 0 0≤i≤d, vi6=vi

0 Each of these simplices are distinct since vertices vi 6= vi differ only in their last 0 coordinate, and the last coordinate of each vi is greater than or equal to the last coordinate of each corresponding vi. For successive, distinct simplices in the above union, their intersection is clearly a face of both simplices. Each of 0 0 the [v0, . . . , vi, vi, . . . , vd] (and all faces) belong to Lτ .

Figure 35: Simplices of the simplicial decomposition of the polyhedron B.

0 0 0 For points v = [v0, . . . , vd] ∈ σξC,k and v = [v0, . . . , vd] ∈ σξC,k+1 , we define 0 hτ to be the function sending the vertical segment [v, v ] ⊂ B linearly onto the   segment [ g(u), ξC,k(g(u)) , g(u), ξC,k+1(g(u)) ], where u = λ0v0 + . . . λdud is a point in τ, and   v = λ0 g(u0), ξC,k(g(u0)) + ... + λd g(ud), ξC,k(g(ud)) 0   v = λ0 g(u0), ξC,k+1(g(u0)) + ... + λd g(ud), ξC,k+1(g(ud)) .

0 We can ensure that the vertices vi and vi are distinct for at least one of the pairs of corresponding vertices since we can replace L with L0. Therefore, we 0 0 know that v and v are distinct if and only if hτ (v) and hτ (v ) are distinct. The construction of Lτ is now complete. It remains to show that the Lτ and hτ can be glued together to form h and K as required. If σ is a simplex of a simplicial complex, K, then the faces of σ are also assumed to be in K. Therefore, it is enough to check that the hτ and Lτ glue together for simplex and its faces. Let σξD,k0 ∈ Lφ such that σξD,k0 intersects |Lτ | nonemptily, and hφ(σξD,k0 ) is the closure of the graphs of one of ◦ the ξD,k0 : g(φ) → R. By Lemma 3.4.5, ξD,k0 is the extension by continuity of some function ξC,k : C → R of the c.a.d. to D. That is, ξD,k0 coincides with

118 ◦

ξC,k on g(φ), σξC,k is a simplex of Lτ , and therefore σξD,k0 is a simplex of Lτ , and in fact, σξC0,k0 is a face of σξC,k . It is then clear that hτ and hφ agree on |Lτ | ∩ |Lφ|. Finally, we check that the simplicial decomposition of B induces the same decomposition of B ∩ (φ × R). If we take an ordering of the entire set of vertices u0, . . . , ud in L, say u0 < . . . < ud, then the decomposition

[ 0 0 B = [v0, . . . , vi, vi, . . . , vd], i=1,...,d

0 0 where each vi and vi are in the cylinder {ui}×R and vi < vi, is as required. We give a simple example of triangulation of a semialgebraic set as in the above theorem. Example 31. We take S = {(x, y, z) ∈ 3 | 1 − x2 − y2 − z2 ≥ 0 and 0 ≤ √ R z ≤ 3 }, and set P (X,Y,Z) = 1 − X2 − Y 2 − Z2, P (X,Y,Z) = Z, and 2 √3,1 3,2 3 P3,3(X,Y,Z) = Z − 3/2. We produce a c.a.d. of R adapted to P3,1,P3,2,P3,3. Completing the family according to the conditions of Theorem 3.4.7, we set P3,4(X,Y,Z) = −2Z, and P3,5(X,Y,Z) = −2, P3,6(X,Y,Z) − 1 constants. We 2 2 2 2 2 compute PROJ(P3,1,P3,2,P3,3,P3,4) = (X + Y − 1,X + Y − 1/4, 4(X + Y 2 −1)). Completing this family to include all nonzero derivatives, we then have 2 2 2 2 2 2 P2,1 = 4(X + Y − 1), P2,2 = X + Y − 1, P2,3 = X + Y − 1/4, P2,4 = 2Y , P2,5 = 8Y , and some constants. The real roots of family PROJ(P2,1,...,P2,5) are −1, −1/2, 0, 1/2, 1.

C1 = {C1,...,C11} 1 1 1 = (−∞, −1), {−1}, (−1, − ), {− }, (− , 0){0}, 2 2 2 1 1 1 (0, ), { }, ( , 1), {1}, (1, ∞) . 2 2 2

The projection of S onto the first coordinate as a union of cells of C1 is

π1(S) =[−1, 1]

=C2 ∪ ... ∪ C10 1 1 1 1 1 1 ={−1} ∪ (−1, − ) ∪ {− } ∪ (− , 0) ∪ {0} ∪ (0, ) ∪ { } ∪ ( , 1) ∪ {1}, 2 2 2 2 2 2 noting that each of these cells√ is also an open simplex in R. The polynomials 2 P2,1 and P2,2 have roots at ± 1 − x in the cylinder over C3,...,C9, and a p 2 single root at 0 over C2 and C10. The polynomial P2,3 has roots ± 1/4 − x in the cylinders over C5, C6, and C7, and a single root at 0 over C4 and C8. The polynomials P2,4 and P2,5 both have a single root at 0 over every cell. We obtain the following decomposition of π(S) as a union of cells in C2, accompanied by a triangulation g : |L| → π(S).

119 Figure 36: π(S) is a union of cells in C2 of the c.a.d., and is semialgebraically homeomorphic to |L| via g. Each cell in π(S) is semialgebraically homeomorphic ◦ to an open simplex τ via gτ , where τ ∈ L. The (dashed) diagonal line segments in π(S) are due to the simplicial decomposition of the polyhedra in L.

We construct Lτ in τ × R for some simplex τ of maximal dimension in L. Let τ = [u2, u6, u7], so that g(τ) = C5,7, the closure of the inner disk in the 2nd  quadrant of the plane. The set C5,7 × R ∩ S is shown below, along with the corresponding polyhedron B in τ × R. The total ordering u0 < . . . < u12 of the vertices of L induces the ordering u2 < u6 < u7 of the vertices of τ. This ordering induces the decomposition of the polyhedron B outlined in the proof of Theorem 3.6.1,

B = σ1 ∪ σ2 ∪ σ3 0 0 0 0 0 0 = [v2, v2, v6, v7] ∪ [v2, v6, v6, v7] ∪ [v2, v6, v7, v7].

Figure 37: The image hτ (B) = (C5,7 × R) ∩ S, and the simplicial decomposition of the polyhedron B.

120 Consider a face, say [u2, u6] of τ. The polyhedron in [u2, u6] × R has the decom- position

0 0 0 [v2, v2, v6] ∪ [v2, v6, v6].

0 Now consider the face [v2, v2, v6] of σ1, which is a simplex in the cylinder over [u2, u6]. Indeed, we see that the ordering of the vertices of L induces the same simplicial decomposition of the faces of B as it does the polygons in the cylinders over the faces of τ. With the Triangulation Theorem, we are now able to state and prove the Curve Selection Lemma.

Lemma 3.6.2. Let S be a semialgebraic subset of Rn, and x ∈ S\S. Then there exists a continuous semialgebraic mapping γ : [0, 1] → Rn such that γ(0) = x and γ((0, 1]) ⊂ S.

Proof. For the purpose of this proof, we can assume S compact, since we can replace S with B(x, 1) ∩ S, which is bounded and semialgebraic. By Theorem 3.6.1, there exists a finite simplicial complex K in Rn, and a semialgebraic homeomorphism h : |K| → S such that S is a union of the images of finitely many open simplices of K via h. In fact, treating the point {x} as a semialgebraic subset of S, we can assume that {x} is also the image of an open simplex of K via h. Since h is a homeomorphism, {x} must be the image of point, say {v} ⊂ |K|, and since {v} is an open simplex of K, it must be a vertex of K. Furthermore, since h(v) ∈/ S but h(v) ∈ S, v must be a vertex of some simplex σ ∈ K such ◦ that h(σ) ⊂ S. Therefore we can take a function, say α : [0, 1] → |K|, which maps the interval [0, 1] affinely onto the segment [v, σ] such that α(0) = v and ◦ b α((0, 1]) = (v, σb] ⊂ σ. Then, setting γ := h ◦ α : [0, 1] → S, we have γ(0) = h(v) and γ((0, 1]) = h((v, σb]) ⊂ S, satisfying the statement of the lemma. We call a semialgebraic set S semialgebraically arcwise connected if for any 2 points x, y ∈ S there exists a continuous semialgebraic mapping γ : [0, 1] → S such that γ(0) = x and γ(1) = y. Corollary 3.6.3. Every connected semialgebraic set is semialgebraically arcwise connected.

Proof. Let S ⊂ Rn be a connected semialgebraic set. We can assume that the closure ofS is compact by taking the semialgebraic homeomorphic transforma- tion x 7→ x/(||x|| + 1). Then, there exists a simplicial complex K in Rn and a semialgebraic homeomorphism h : |K| → S such that every semialgebraic subset of S is the image by h of a union of open simplices σ ∈ K. Now, since S is con- n nected, for any two cells Ci,Ci0 ⊂ S (as in the c.a.d. of R adapted to S) there 0 must exist a finite sequence Ck1 ,...,Ckl such that Ci = Ck1 , Ci = Ckl , and for any j = 1, . . . , l − 1, either Ckj ⊂ Ckj+1 or Ckj+1 ⊂ Ckj . Since these cells are semialgebraic subsets of S, we can ask that the triangulation of S be such that S and the cells of the c.a.d. comprising S are the unions of images by h of open

121 simplices of K. Furthermore, for any two points x, y ∈ S, we can ask that the c.a.d. is such that the singletons {x} and {y}, (as semialgebraic subsets of S), are both a union of cells. Then, for the sequence {x} = Ck1 ,...,Ckl = {y} of cells in S, we consider the collection of open simplices which map onto each of these cells via h. In fact, we can ask that the c.a.d. has been refined so that each cell is the image by h of a single open simplex in K. Hence we obtain a sequence of open simplices, σk1 , . . . , σkl , such that h(σk1 ) = {x} and h(σkl ) = {y}, and for each j = 1, . . . , l − 1, either σkj is a face of σkj+1 , or σkj+1 is a face of σkj . We construct a path in |K| by taking the linear segments

L := [σ , σ ] j bkj bkj+1 joining the barycenters of successive open simplices in the sequence, and we j j+1 define the functions δj mapping the segment [ l , l ] linearly onto Lj such that δ j  = σ and δ j+1  = σ for each j = 1, . . . , l − 1. Then, denoting j l bkj j l bkj+1 l−1 L := ∪j=1Lj, and with the (continuous semialgebraic) mapping L → S

n j j+1 δ(t) := δj(t) for t ∈ [ l , l ] , we set γ := h ◦ δ : [0, 1] → S, a continuous semialgebraic mapping such that g(0) = x and g(1) = y, as required.

122 4 Hardt’s trivialization and consequences

In this section we present Hardt’s theorem on the semialgebraic triviality of continuous semialgebraic mappings, giving an illustrative proof and exploring several consequences such as the local conic structure of semialgebraic sets, and bounds on the number of connected components and the number of topological types of semialgebraic sets. Hardt’s theorem states that the target space of any continuous semialgebraic mapping f of any semialgebraic set S can be parti- tioned into finitely many components C1,...,Cp such that f is semialgebraically trivial over each Ci. That is, for points c1, c2 ∈ Ci, the fibers over c1 and c2 are semialgebraically homeomorphic to one another. Moreover, much of the power of Hardt’s triviality lies in the ability to refine the components Ci so that the trivializations are compatible with any finite collection of semialgebraic subset of S. We also give a proof of the semialgebraic version of Sard’s theorem on the dimension of sets of critical values of semialgebraic C∞ mappings between semialgebraic smooth manifolds. In [1], both Hardt’s theorem and Sard’s theo- rem are given without proof. Here, we aim to give a functioning understanding of why they are true, using results primarily from [9].

4.1 Semialgebraic triviality of semialgebraic sets We begin our introduction of semialgebraic trivialization with the familiar set- ting of a cylindrical algebraic decomposition of Rn adapted to some semialge- braic subset. We know that a semialgebraic subset S ⊂ Rn defined by polyno- mials P1,...,Pr can be represented as a disjoint union of strata of a c.a.d. of n R adapted to P1,...,Pr. Particularly, the cells comprising S are graphs and bands in the cylinders Ci × R, where Ci are cells in Cn−1 of the adapted c.a.d.. n−1 Then, letting πn : S → R denote the projection onto the first n − 1 coordi- nates, the fibers of πn restricted to S essentially look the same for each point n−1 ci ∈ Ci. To make this more precise, we call R the target space of πn, and for −1 each ci ∈ Ci, we call πn (ci) the fiber of πn over ci. Now, since S is a union of graphs and bands over each Ci, the intersection S ∩(Ci ×R) is semialgebraically homeomorphic to a product Ci × Fi, where Fi is a semialgebraic subset of R. Indeed, we can choose Fi to be the fiber of πn over some ci ∈ Ci, and it is clear −1 that S ∩ (Ci × R) is semialgebraically homeomorphic to Ci × πn (ci). Hence, πn is said to be semialgebraically trivial over each Ci. We extend the notion of semialgebraic triviality to all continuous semialgebraic mappings. Let S ⊂ Rn be a semialgebraic set, and f : S → Rk a continuous semialge- braic mapping. Then f is said to be semialgebraically trivial over a semialge- braic subset C ⊂ Rk is there exists a set F and a semialgebraic homeomorphism −1 h : f (C) → C × F such that π ◦ h = f|f −1(C), where π denotes the projection C × F → C. Such an h is called a semialgebraic trivialization of f over C. The following diagram is a common accompaniment to this definition.

123 Figure 38: The function h is a semialgebraic trivialization of f over C.

For a simple working example, consider S = {(x, y) ∈ R2 | x2 + y2 − 1 = 0} and let f : S → R be the projection onto the first coordinate. The family C1 of the c.a.d. of Rn adapted to x2 + y2 − 1 contains the sets {−1},(−1, 1), and {1}. Collectively, these intervals make up the closed interval [−1, 1], which is the image of S by f. The cylinders {−1} × R and {1} × R intersect S in only a single point. However, the intersection of the cylinder (−1, 1) × R with S is the disjoint union of the top half with the bottom half of the circle over the open interval (−1, 1). Taking the point c = 0 in C = (−1, 1), the fiber of f over c is f −1(0) = {(0, −1), (0, 1)}. Then, S ∩ (C × R) = f −1(C) is semialgebraically homeomorphic to C × F = (−1, 1) × ({−1} ∪ {1}).

Figure 39: The subset of S that maps onto the open interval (−1, 1) via f is the two disjoint halves (top and bottom) of the circle. We take F := f −1(0) = {−1} ∪ {1}. It is clear that f|f −1(0) = π ◦ h.

Extending our definition, we consider a semialgebraic subset S0 ⊂ S. Then h is said to be compatible with S0 if there exists a semialgebraic subset F 0 ⊂ F such that hS0 ∩f −1(C) = C ×F 0. We are now able to state Hardt’s Theorem.

Theorem 4.1.1 (Hardt’s semialgebraic triviality). Let S ⊂ Rn be a semialge- k braic set with finitely many semialgebraic subsets S1,...,Sq, and f : S → R

124 a continuous semialgebraic mapping. There is a finite semialgebraic partition k C1,...,Cm of R such that f is semialgebraically trivial over each of the Ci, −1 and each trivialization hi : f (Ci) → Ci × Fi is compatible with the subsets Sj. Proof. We set up the proof for an induction on the lexicographically ordered pairs (m, n). First, we replace the set S with Γf , the graph of f over S. m+n m Then the function f :Γf → T is equivalent to the projection R → R of Γf onto the first m coordinates. We assume the theorem to be true for all pairs (m0, n0) < (m, n) using the lexicographical ordering. According to The- orem 3.4.9, both S and T partitioned by finitely many semialgebraic subsets S1,...,Sq and T1,...,Tr respectively, which are defined by equations and in- equalities of finitely many polynomials f1(X,Y ), . . . , fs(X,Y ) ∈ R[X,Y ], where X := (X1,...,Xm) and Y := (Y1,...,Yn). Using the constructions in Proposi- tion 3.4.8 we can assume that the fi(X,Y ) are of the form

di di−1 0 0 fi(X,Y ) = Yn g0,i(X) + Yn g1,i(X,Y ) + ... + gdi,i(X,Y ),

0 0 where we denote Y = (Y1,...,Yn−1), gj,i ∈ R[X,Y ], and di is the degree of the highest-order homogeneous term in fi. We want these leading terms g0,i to be nonzero. The set

s A := {x ∈ T | Πi=1g0,i(x) = 0} has dimension < m, as it is the zeroset of finitely many polynomials. We can construct a c.a.d. of Rm so that A is a union of cells of this c.a.d., and where each ki cells is the image of an open hypercube (0, 1) , ki < m, by some semialgebraic homeomorphisms φ, (the Nash mappings from the c.a.d. construction). We consider the pullback of A by φ:

∗ k n φ (A) = {(x, u) ∈ (0, 1) × R | φ(x, u) ∈ A}.

Here we can apply the inductive assumption, since φ∗(A) ⊂ Rk ⊂ Rm, and we only need apply the projection onto k of the coordinates. That is, we need only the projection Rk+n → Rk, and (k, n) < (m, n) via the lexicographical order for k < m. Hence, we can study the restriction of f to S\(A × Rn). Consider the change of variables

(X1,...,Xm,Y1,...,Yn) 7→ (X1,...,Xm,Y1,...,Yn−1,Z),

s n where Z := YnΠi=1g0,i(X). This induces a homeomorphism from S\(A × R ) onto a bounded semialgebraic subset, say S0 ⊂ Rm+n. (Indeed, the inverse s is defined by Z 7→ Z/Πi=1g0,i(X), where the denominator is known to be n 0 n nonzero on S\(A × R ).) Denote by Sj the images of the Sj\(A × R ) ⊂ S 0 0 via this homeomorphism. Letting Y = (Y1,...,Yn−1), the set S and the sub- 0 sets Sj are described by boolean combinations of finitely many equations and 0 inequalities of polynomials, say h1, . . . , ht0 ∈ R[X,Y ,Z] monic with respect to Z, and ht0+1, . . . , ht independent of Z. We then consider the projection

125 m+n m+n−1 0 πm+n : R → R onto the first m + n − 1 coordinates, (X,Y ). Con- m+n structing a c.a.d. of R adapted to h1, . . . , ht, (as in Theorem 3.4.7), we m+n−1 obtain a finite semialgebraic partition Cn+m−1 of R , and finitely many continuous semialgebraic functions ξC,i : C → R over each cell C ∈ Cm+n−1 0 describing the roots of the h1, . . . , ht over each of these cells. Hence, πm+n S is a union of such cells C. Furthermore, applying Theorem 3.6.1, there exists a finite simplicial complex L in Rm+n−1, and a semialgebraic homeomorphism 0 0 Ψ: |L| → πm+n S , such that every C ⊂ πm+n S is the union of images of open simplices of L by Ψ. Now, take a simplex τ of L such that Ψ(τ) ⊂ C, and a 0 function of the c.a.d., ξC,i : C → R, whose graph is contained in S or in the limit of a cell contained in S0. The semialgebraic homeomorphism Ψ : τ → Rm+n−1 induces a homeomorphism from τ onto its image, Ψ(τ), and the bounded, con- 0 tinuous, semialgebraic functions ξC,i describe the roots of f(X,Y ,Z) over each ◦ Ψ(τ), and they extend continuously to the closure, Ψ(τ), by Lemma 3.4.5. Note that S0 is the union of graphs of these extensions, and bands of the cylinders Ψ(τ) bounded by these graphs. Now, using the full power of the triangula- m+n tion theorem, there exists a finite simplicial complex, K = (σi)i=1,...,p in R such that the projection of each simplex in K, πm+n(σ), is a simplex of the barycentric subdivision L0 of L, and there exists a semialgebraic homeomor- 0 0 0 phism Φ : |K| → S such that πm+n ◦ Φ = Ψ ◦ πm+n|K, and the sets S and Sj ◦ are the unions of images of open simplices, Φ(σ), σ ∈ K.

Figure 40: The semialgebraic homeomorphisms Φ and Ψ map |K| to S0, and |L| 0 to πm+n(S ), respectively. The diagram is commutative.

126 It is enough to prove the theorem for the projection of S0 onto Rm, with S0 and 0 the Sj as semialgebraic subsets. Hence, we apply the inductive assumption to ◦ 0 m+n−1 m πm+n(S) with subsets Ψ(τ), and the projection mapping π : R → R . m We obtain a finite partition of R into semialgebraic sets, say T1,...,Tl, and 0 ◦ semialgebraic trivializations ρi of π over each Ti compatible with the Ψ(τ).

Figure 41: By the inductive assumption, we have a semialgebraic trivialization 0 0 ◦ of π over πm+n(S ) compatible with the subsets Ψ(τ).

(1) Fix k ∈ {1, . . . , l} and let x be a point in Tk. We may assume that Gk = 0−1 (1) (1) (1) 0  (1) 0 (1) 0 π (x )∩πm+n(S), and ρk x , (x , y ) = (x , y ) for each (x , y ) ∈ Gk. Then, denoting p : Rm+n → Rm the projection onto the first m coordinates, set −1 (1) 0 Fk :=p (x ) ∩ S −1 (1) 0 Fk,j :=p (x ) ∩ Sj. −1 0 It only remains to construct Θk : Tk × Fk → p (Tk) ∩ S .

0 Figure 42: We aim to construct Θk : Tk × Fk → S .

127 (1) 0 (1) 0 For x ∈ Tk and (x , y ) ∈ Gk, the point (x , y ) belongs to one of the ◦ images of the open simplices τ ∈ L by the triangulation Ψ, say Ψ(τ 1), and (1) 0  ◦ ◦ ρk x, (x , y ) ∈ Ψ(τ 1). The Φ and Ψ are such that the Φ(σ) induce home- −1 (1) 0 0 −1 (1) 0  0 omorphic subdivisions of πm+n(x , y ) ∩ S and πm+n ρk x, (x , y ) ∩ S . The function Θk maps each nonempty segment

 −1 (1) 0  {x} × πm+n(x , y ) ∩ Φ(σ) affinely onto the corresponding segment

−1  (1) 0  πm+n ρk x, (x , y ) ∩ Φ(σ).

Figure 43: The function Θk maps each nonempty segment over x ∈ Tk (red, dashed) to the corresponding segment in S0.

−1 0 Then Θk(Tk × Fk,j) = p (Tk) ∩ Sj, as required. We are assured that Θk has the required properties since K is constructed in such a way that the projection ◦ is obviously trivial over each of the open simplices σ ∈ K.

We give some immediate applications of Hardt’s theorem concerning the dimension of the fibers of the semialgebraic mapping f : S → Rk. Corollary 4.1.2. Let S ⊂ Rn be a semialgebraic subset, and f : S → Rk a continuous semialgebraic mapping. For a fixed integer d ∈ N, the set

(d) k −1 S := {c ∈ R | dim(f (c)) = d} is a semialgebraic subset of Rk with dimension dim(S(d)) ≤ dim(S) − d.

Proof. Assume (using Hardt’s theorem, 4.1.1) that we have a partition C1,...,Cm k of R such that f is semialgebraically trivial over each Ci. We can set Fi =

128 −1 −1 f (ci) for some ci in each Ci, so that each f (Ci) is semialgebraically home- omorphic to Ci × Fi. For any point c ∈ Ci, we have

−1 −1 −1 dim(f (c)) = dim(f (ci)) = dim(Fi) = dim(f (Ci)) − dim(Ci),

−1 and clearly dim(f (Ci)) ≤ dim(S). Therefore we obtain

−1 dim(f (c)) ≤ dim(S) − dim(Ci).

(d) constant for all c ∈ Ci. This means that the set S is a union of some of the Ci, say

(d) [ S = Ci, for some M ⊂ {1, . . . , m}. i∈M

We know that S is semialgebraically homeomorphic to a (finite) union of Ci × Fi via h, and therefore dim(S) = maxi∈{1,...,m}(dim(Ci × Fi)). Then, since dim(Ci × Fi) = dim(Ci) + dim(Fi) we have

(d) [ dim(S ) = dim( Ci) i∈M  = max dim(Ci × Fi) − d i∈M  ≤ max dim(Ci × Fi) − d i∈{1,...,m} = dim(S) − d.

n+1 n Corollary 4.1.3. Let πn : R → R denote the projection onto the first n coordinates, and let S ⊂ Rn+1 be a semialgebraic subset of dimension n. There exists an integer N ≥ 0 such that

n −1 SN := {x ∈ R | S ∩ πn (x) has exactly N elements} is a semialgebraic set of dimension n, and

n −1 TN := {x ∈ R | S ∩ πn (x) has > N elements} is either a semialgebraic set of dimension < n or empty.

Proof. We can construct a c.a.d. of Rn+1 adapted to the polynomials describing p S. Let S = ∪i=1Ci a finite union of cells Ci of this c.a.d.. Since dim(S) = n, the maximum dimension of these Ci must be maxi∈{1,...,p}(dim(Ci)) = n. The 0 Ci of maximal dimension in S must be either graphs AC0,k over cells C ∈ Cn 0 of the c.a.d. of dimension n, or bands BD0,k over cells D ∈ Cn of dimension n − 1. (Note that we only need to consider the cells Ci that are graphs of the semialgebraic ξ : C0 → R over C0 with dimension n, since the bands contained 0 in S only occur over cells C ∈ Cn of dimension < n. The fiber of πn over points

129 in such cells contain infinitely many points (as they contain open line segments), and therefore the points in these cells are contain in TN .) Due to the cylindrical arrangement, SN is clearly a union of cells of the c.a.d., and is therefore semialgebraic. Furthermore, TN is the complement of S1 ∪...∪SN in S. Therefore TN is also semialgebraic. By Hardt’s semialgebraic n m triviality, there exists a finite semialgebraic partition R = ∪i=1Di such that πn is trivial over each Di, and such that the trivialization is compatible with n+1 −1 the cells Ci ⊂ S of the c.a.d. of R . Letting Fi := πn (di) for some di in each Di, we have dim(Di × Fi) = dim(Di) + dim(Fi). So, if Fi = {a1, . . . , aNi } a union of finitely many points (and thereby having dimension 0), then by Corollary 4.1.2 we have dim(Di) ≤ dim(S) = n. Conversely, if dim(Fi) ≥ 1, n then dim(Di) ≤ n − 1. Since there are finitely many Di partitioning R , (and the same number of corresponding Fi), we simply take the finite Fi with the largest Ni, and define N to be the maximum among these Ni. The union of the corresponding Di is precisely the set SN . Since N is chosen to be maximal among the finite Fi, any Fi containing more than N points must contain infinitely many points, and hence the cor- n responding Di ⊂ R must have dimension < n in order for dim(Di × Fi) = dim(Di) + dim(Fi) ≤ n. This means that TN is a union of such Di, and there- fore dim(TN ) ≤ n − 1 < n. Of course, if no such Di exist then TN is empty. From Hardt’s semialgebraic triviality we are also able to show that semialge- braic sets have a local conic structure. We first introduce the relevant notation. As usual, we denote by B(x, ) the closed ball with radius  > 0 centered at x ∈ Rn, and S(x, ) the sphere of radius  centered at the point x. For a set A ⊂ Rn and a point v ∈ Rn, we denote by v ∗A the cone with base A and vertex v. Precisely, this is the set

n v ∗ A = {x ∈ R | ∃λ ∈ [0, 1], ∃a ∈ A such that x = λv + (1 − λ)a}.

For a point v ∈ S, we consider the intersections S(v, ) ∩ S and B(v, ) ∩ S, and in particular, the cone formed by v ∗ S(v, ) ∩ S.

Theorem 4.1.4. Let S ⊂ Rn semialgebraic, and take a non-isolated point v ∈ S. For sufficiently small  > 0 there exists a semialgebraic homeomorphism

h : B(v, ) ∩ S → v ∗ S(v, ) ∩ S such that ||h(x) − v|| = ||x − v|| and h|S(v,)∩S is the identity map. Proof. Consider the continuous semialgebraic mapping

d : S −→ R x 7−→ ||x − v||.

Applying Hardt’s theorem 4.1.1, we know that there exists a finite partition of R into semialgebraic subsets, C1,...,Cp, such that d is semialgebraically trivial over each Ci. Since there are finitely many Ci, each of which are semialgebraic

130 subset of R, one of the Ci must contain an interval of the form (0, r), for some r > 0. Then, we choose  ∈ (0, r), hence the mapping d is semialgebraically trivial over the interval (0, ). It is clear that d−1() = S(v, )∩S, since S(v, )∩S is precisely the set of points in S at a distance of  away from v.

Figure 44: The intersection of the semialgebraic set S ⊂ Rn with S(v, ) about a non-isolated point v ∈ S.

Setting this S(v, ) ∩ S as the fiber Fi of d over (0, r), there is a semialgebraic homeomorphism g : d−1 ((0, r)) −→ (0, r) × (S(v, ) ∩ S)

x 7−→ (||x − v||, g1(x)), such that the restriction of g1 to S(v, ), ∩S is the identity.

Figure 45: Hardt’s semialgebraic triviality gives us the semialgebraic homeo- morphism g : (0, r) → S(v, ) ∩ S, where the fiber is chosen to be d−1().

131 We are now able to construct the semialgebraic homeomorphism h as required by the theorem. Define

h : B(v, ) ∩ S −→ v ∗ (S(v, ) ∩ S)  ||x − v|| ||x − v|| x 7−→ 1 − v + g (x),   1 where we set h(v) = v. The function h is clearly semialgebraic, continuous, and bijective, since g1 must also be continuous and semialgebraic. Let us denote h(x) = y. Then

 ||x − v|| ||x − v|| ||y − v|| = || 1 − v + g (x) − v||.   1

||x−v|| Recall that g1(x) ∈ S(v, )∩S, and for convenience let us denote λ := 1−  . Then the above equation becomes

||y − v|| = ||λv − v + (1 − λ)g1(x)||

= ||(1 − λ)(g1(x) − v)||

= (1 − λ)||g1(x) − v|| = (1 − λ) = ||x − v||, yielding the first required property of h. We now check that h is a homeomor- phism. Using the above notation, h(x) = (λv + (1 − λ)g1(x)) and  g(x) = ||x − v||, g1(x)

= ((1 − λ), g1(x))   ||y−v||  y − 1 −  = ||y − v||, ,  ||y−v||   where y, v, and  are known. Therefore

−1  −1  −1  g ||x − v||, g1(x) = g (1 − λ), g1(x) = h λv + (1 − λ)g1(x) , and h−1(v) = v. Remark: We assume v to be a non-isolated point of S since, for sufficiently small  > 0 and isolated points v0 ∈ S, the intersections B(v0, )∩S and S(v0, )∩ S contain only the point v0.

Proposition 4.1.5. Let S ⊂ Rn be semialgebraic. There exists an r > 0 and a continuous semialgebraic mapping

n  h1 : S ∩ R \B(0, r) −→ (S ∩ S(0, r))

132 such that h1(x) = x for all c ∈ S ∩ S(0, r), and n  h : S ∩ (R \B(0, r)) −→ S ∩ S(0, r) × [r, +∞)  x 7−→ h1(x), ||x|| is a homeomorphism. Proof. We denote by d the norm function, d : S −→ [0, ∞) x 7−→ ||x||, which is a continuous semialgebraic mapping. Applying Hardt’s theorem 4.1.1 m to the mapping d, we obtain a finite semialgebraic partition [0, ∞) = ∪i=1Ci such that d is trivialized over each Ci ⊂ [0, ∞) by some semialgebraic homeo- morphism, say gi. Furthermore, each of these gi is compatible with any semi- −1 algebraic subset of S. We have d (Ci) ∩ S semialgebraically homeomorphic to −1 Ci × Fi via gi. Observe also that d (c) ∩ S = S(0, c) ∩ S, for any c ∈ (0, ∞). Since there are finitely many semialgebraic Ci partitioning [0, ∞), one of the Ci, 0 0 say C1, must contain an interval of the form (r , ∞) such that (r − , ∞) * C1 for any  > 0. Then, we set r + 1, and ask that the gi be compatible with the n  n  sets S ∩S(0, r), S ∩ R \B(0, r) , and S ∩ R \B(0, r) , as semialgebraic subset of S. We consider the semialgebraic homeomorphism −1 g1 : d (C1) ∩ S −→ C1 × F1 −1 given by Hardt’s triviality, assuming that F1 = d (r) ∩ S, and hence that −1 g1(x) = (r, x) for all x ∈ S ∩ S(0, r). Restricting d to [r, ∞), it is clear that n   S ∩ R \B(0, r) is semialgebraically homeomorphic to [r, ∞)× S ∩S(0, r) via n  g1. (We will essentially be defining h as the restriction of g1 to S ∩ R \B(0, r) by restricting d−1 to [r, ∞).) To briefly recap the above construction, we take the component of the target space which contains an interval of the form [r0, ∞), and consider the semial- gebraic trivialization of the set that is mapped to this component. We take some r ∈ [r0, ∞) to obtain the representative fiber over this component, say −1 F1 := d (r) ∩ S. Finally, we restrict our mapping to the preimage of the n  smaller interval, [r, ∞), so that S ∩ R \B(0, r) maps cleanly onto [r, ∞) × F1. n  −1 Denote by h1 : S ∩ R \B(0, r) → d (r) ∩ S the composition of the projection that forgets the first coordinate, with the mapping g1 restricted to n  n S ∩ R \B(0, r) . That is, h1 = π1 ◦ g1|S∩(R \B(0,r)), where π1 :[r, ∞) × S ∩ S(0, r) → S ∩ S(0, r), so that n  −1 h1 = π1 ◦ g1 : S ∩ R \B(0, r) −→ d (r) ∩ S x 7−→ g1(x) Finally, set n   h : S ∩ R \B(0, r) −→ [r, ∞) × S ∩ S(0, r)  x 7−→ ||x||, h1(x) ,

133 −1 −1 whose inverse is simply given by h (||x||, h1(x)) = g1 (||x||, h1(x)). Using the construction in Proposition 4.1.5 above, we can construct a semi- algebraic deformation retraction on S, say H : S × [0, 1] → S. That is, a semialgebraic mapping H such that H(x, 0) = x and H(x, 1) ∈ S ∩ B(0, r) for all x ∈ S, and H(x, t) = x for all t ∈ [0, 1] when x ∈ S ∩ B(0, r). Indeed, using n   the homeomorphism h : S ∩ R \B(0, r) → [r, ∞)× S ∩S(0, r) in Proposition 4.1.5, we define H by ( h−1h (x), tr + (1 − t)||x|| ∀x ∈ S ∩ n\B(0, r) H(x, t) := 1 R x ∀x ∈ S ∩ B(0, r), which has the properties required to be a deformation retraction on S. We consider the number of semialgebraic topological types of algebraic sub- sets of Rn, in light of Hardt’s semialgebraic triviality. Particularly, we consider the number of topological types of algebraic subsets of Rn defined by polyno- mials of bounded degree, d. Subsets U, V ⊂ Rn are said so have the same semialgebraic topological type of inclusion if there exists a semialgebraic home- omorphism h : Rn → Rn such that h(U) = V . For instance, any two spheres in Rn are semialgebraically homeomorphic to one another, and therefore have the same semialgebraic topological type. However, a sphere and the union of 2 distinct spheres in Rn have distinct (semialgebraic) topological types. We first prove the following lemma for convenience. d + n Lemma 4.1.6. A polynomial of degree d in n variables has coeffi- n cients. Proof. We proceed by induction on n. Checking the n = 1 case: a polynomial d P ∈ R[X] of degree at most d is of the form P (X) = a0X + ... + ad, which d + 1 has = d + 1 coefficients as required. Now let n ≥ 1, and assume 1 d + n that a polynomial of degree d in n variables has coefficients. Then, n including the new variable, say Y , gives rise to more possible monomial terms in a polynomial P ∈ R[X1,...,Xn,Y ] of degree at most d. More precisely, for d + n each of the monomial terms in a polynomial of degree d in n variables, n a number of new terms (corresponding to new multi-degrees) arise based on the total degree of this monomial term. For a monomial term Xα of degree k ≤ d in n variables, including the variable Y then allows for the formation of d − k new possible terms, XαY,...,XαY d−k. Hence, by counting the number of monomial terms of each degree in a polynomial of degree d in n variables, we are able to deduce the number of new coefficients generated by including the (n+1)-th variable, Y . By the inductive assumption, a polynomial in n variables k + n k + n − 1 of degree d has − terms of degree k, where 1 ≤ k ≤ n, n n

134 (and a single term of degree 0), and the inclusion of the variable Y gives rise to d − k new terms for each of these. Hence, we count d − k + 1 coefficients in a polynomial of degree d in n + 1 variables for each monomial term of degree k ≤ d in a polynomial of degree d in n variables. We compute d + n d + n − 1 d + n − 1 d + n − 2 1 − + 2 − + ... n n n n i + n i + n − 1 n + 1 n + (d − i + 1) − + ... + d + (d + 1) n n n n d + n d + n − 1 i + n n + 1 n = + + ... + + ... + + n n n n n d+n X j = n j=n d + n + 1 = , n + 1 where the last equality is the “hockey stick identity”, which is proven by induc- tion, as follows. For the r = 1 case, we have r X i r r + 1 = = 1 = . r r r + 1 i=r Now assume that k X i k + 1 = r r + 1 i=r for k ∈ N, k ≥ r. Then k+1 k X i X i k + 1 k + 1 k + 1 = + = + r r r r + 1 r i=r i=r (k + 1!) (k + 1)! = + (r + 1)!(k − r)! r!(k − r + 1)! k − r + 1 (k + 1)!  r + 1  (k + 1)! = + k − r + 1 (r + 1!)(k − r)! k − r + 1 r!(r + 1)(k − r)! k − r + 1 (k + 1)!  r + 1  (k + 1)! = + k − r + 1 (r + 1)!(k − r)! k − r + 1 (r + 1)!(k − r)! (k − r + 1) + (r + 1)  (k + 1)!  = k − r + 1 (r + 1)!(k − r)! (k + 2)(k + 1)! (k + 2)! k + 2 = = = . (r + 1)!(k − r)!(k − r + 1) (r + 1)!(k − r + 1)! r + 1 d + n + 1 That is, a polynomial of degree d in n+1 variables has coefficients, n + 1 hence concluding the proof.

135 Theorem 4.1.7. For arbitrary positive integers d and n, there exists a positive integer p equal to the total number of semialgebraic topological types of algebraic subsets of Rn which are defined by polynomial equations of degree at most d. Proof. Let V ⊂ Rn be an algebraic subset defined by polynomial equations P1 = 0,...,Pq = 0, where P1,...,Pq have degree ≤ d. We can rewrite this 2 2 system of equations as P1 + ... + Pq = 0, so that V is defined by a single polynomial equation of degree at most 2d. By Lemma 4.1.6, a polynomial 2d + n of degree 2d in n variables has N := coefficients. We identify the n coefficients of such a polynomial with RN via lexicographical ordering on its monomial terms. For a point c ∈ RN , we denote the corresponding polynomial N n by Pc ∈ R[X1,...,Xn]. Now, consider the subset of R × R defined by

N n A := {(c, x), ∈ R × R | Pc(x) = 0}.

Since Pc(X) is a polynomial in c and X1,...,Xn, the set A is in fact algebraic. Therefore, we can apply Hardt’s triviality 4.1.1 to π : RN × Rn → RN , the projection onto the first N coordinates. Hence we obtain a finite semialgebraic N n partition R = C1 ∪ ... ∪ Cr, and semialgebraic trivializations hi : Ci × R → n Ci × R of π over each Ci, compatible with A as a semialgebraic subset of n R . We now choose a ‘representative point’ in each of the Ci. Particularly, we choose a ci ∈ Ci such that Pci is a sum of squares of polynomials for each Ci where this is possible. Let {C1,...,Cp} ⊂ C1,...,Cr denote the collection of sets in which there exists such a ci. Then, define V1,...,Vp as the zerosets n of the polynomials Pc1 ,...,Pcp , respectively. Now, any algebraic set W ⊂ R defined by polynomials of degree at most d is of the form W = Z(Pc) for some point c in one of the Ci, i ∈ {1 . . . , p}. Indeed, if c ∈ Ci, then the −1 −1 fibers A ∩ π (c) and A ∩ π (ci) are semialgebraically homeomorphic via the trivialization hi, remembering that each of the trivializations are compatible with A as a semialgebraic subset of RN × Rn. These fibers are precisely W and N n N n Vi, respectively. Hence the semialgebraic homeomorphism R ×R → R ×R given by Hardt’s triviality induces a semialgebraic homeomorphism h : Rn → Rn such that h(W ) = Vi. Since the choice of d, n, and W is arbitrary, the theorem is proven. Remark: In Theorem 4.1.7, we consider the topological types of the zerosets of polynomial mappings Rn → R, of which there are finitely many. The same does not always hold for general polynomial mappings Rn → Rm of fixed degree. The reader may refer to texts such as [10], [11], [12], [13] for further investigation.

4.2 Semialgebraic Sard’s theorem and upper bounds on connected components We now consider an important result concerning the dimension of the set of critical values of smooth semialgebraic mappings - the semialgebraic version of Sard’s theorem. We will make use of this theorem in the following subsection

136 in which we examine bounds on the number of connected component of semi- algebraic sets. In order to prove Sard’s theorem, we invoke several results from [9] (), as well as the Triangulation theorem, 3.6.1, and Hardt’s semialgebraic triviality, 4.1.1.

Lemma 4.2.1. Let M ⊂ Rn be a Nash submanifold of dimension d. There p exists a finite semialgebraic open cover M = ∪i=1Mi such that, for each Mi there are 1 ≤ j1 < . . . < jd ≤ n such that the restriction of the projection

π :(x1, . . . , xn) 7→ (xj1 , . . . , xjd ) to Mi is a Nash diffeomorphism onto its image. Proof. The full proof can be found in[9], Corollary 9.3.10, page 227.

Theorem 4.2.2 (Constant Rank Theorem). Let A ⊂ Rn be an open semialge- m braic set, and f : A → R a Nash mapping such that the rank of dxf is equal to d at every x ∈ A. For any point a ∈ A, there exist Nash diffeomorphisms

u :(−1, 1)n → U, v :(−1, 1)m → V, where U ⊂ A is an open semialgebraic neighborhood of a, and V ⊂ Rm is an open semialgebraic neighborhood of f(a) containing the image f(U), such that v ◦ f|U ◦ u is the projection mapping (x1, . . . , xn) 7→ (x1, . . . , xd, 0,..., 0). Proof. The proof relies on the inverse function theorem for semialgebraic C∞ functions.

Lemma 4.2.3. Let g : (0, 1)d → Rm be a Nash mapping such that the rank of d d dxg is < m for all x ∈ (0, 1) . Then the dimension of the image g (0, 1) is also < m. Proof. The full proof can be found in [9], Lemma 9.6.3, page 235, and relies on several results, including the Constant Rank Theorem 4.2.2 above, (both of these are also found in [9]).

Theorem 4.2.4 (Semialgebraic Sard’s Theorem). Let M ⊂ Rm and N ⊂ Rn be smooth semialgebraic submanifolds, and let f : M → N be a semialgebraic C∞ mapping. Then the set of critical values of f is a semialgebraic subset of N, and has dimension < dim(N). Proof. By Lemma 4.2.1, we may assume that M and N are open semialgebraic subsets of Rm and Rn, respectively, since we can restrict our study to the image of any of the open semialgebraic sets in the finite covers by the corresponding projections. Let S ⊂ M be the set of critical points of f. Since the partial derivatives of f are Nash functions, S is indeed a semialgebraic subset. Using Theorem 3.4.9, we know that S is the finite union of semialgebraic sets Si ⊂ di S, where each Si is the image of an open hypercube (0, 1) by some Nash di diffeomorphism φi : (0, 1) → M. The composite function f ◦ φi is defined on ∗ φi (S), the pullback of S by φi. By definition of S, dxf is not surjective at all x ∈ S, and hence dx(f ◦ φi) is not surjective, meaning that is has rank less than

137 dim(N) = n. Then, by Lemma 4.2.3, the dimension of the image of f ◦ φi is  also less than m. That is, dim (f ◦ φi)(S) < n, as required.

Example 32. Let M = R2 and N = [0, ∞)×R the closed positive x half-plane, and consider the C∞ semialgebraic mapping f : M → N defined by f :(x, y) 7−→ x2, (y − x)3. We compute df, the differential of f:  2x 0  df = , (x,y) −3(y − x)2 3(y − x)2 whose determinant is 6x(y − x)2, which is equal to zero when either x = 0 or y = x. This defines the set of critical points of f to be the set {(x, y) ⊂ N | x = 0 or y = x} with dimension equal to 1. Indeed, 1 < dim(N) = 2, as expected due to Theorem 4.2.4. We offer another proof of Theorem 4.2.4 in the case when f : M → R for some smooth hypersurface M ⊂ Rm. Proposition 4.2.5. Let U ⊂ Rm be an open semialgebraic subset, let f and Q both be polynomials Rn → R, and assume that for every x ∈ M := U ∩ Q−1(0), grad(Q(x)) 6= 0 so that M is a smooth hypersurface. Then the set of critical values of f|M is finite.

Proof. Since M is a smooth hypersurface, it identifies locally with Rn−1. We consider Γf|M , the graph of f over M. Since Q(x) = 0 for every x ∈ M, and grad(Q(x)) is assumed to be nonzero on M, grad(Q(x)) must be orthog- n onal to M. We equate grad(f|M ) with π(grad(f)), where π : R → TxM is the orthogonal projection. Hence x is a critical point of f|M if and only if  grad(f|M (x)) = π grad(f(x)) = 0. That is, x is a critical point of f|M if and n−1 only if grad(f(x)) is perpendicular to TxM ' R . Let us denote by Z the set of critical point of f|M . Then we have

Z ={x ∈ M | x is a critical point of f|M } n {x ∈ R | ∃λ ∈ R, grad(f(x)) = λgrad(Q(x))} ∩ M.

That is, the set of critical points of f|M is the set of points at which grad(f) is colinear to grad(Q). It is clear that Z is semialgebraic, by the second equality in the above expression. Now, we take δ :(−1, 1) → Z a smooth curve such that δ(0) = x for some x ∈ Z, and let us denote W := f(M). Since δ is smooth, δ0 exists and is continuous everywhere on (−1, 1). We consider the tangent mapping 0 0 dfx : TxM → Tf(x)W , where we take dfx(δ (0)) = (f ◦δ )(0), by definition. For a 0 point x ∈ Z, grad(f(x)) is perpendicular to TxM ⊃ TxZ, hence dfx(δ (0)) = 0, which is equivalent to (f ◦ δ0)(0) = 0, implying that f is constant along any smooth path in Z. Therefore, f is constant on each connected component of Z. Since Z is semialgebraic, it has finitely many connected components, hence f has finitely many critical values.

138 We aim to produce some explicit bounds on the possible number of compo- nents of semialgebraic sets described by a certain number of polynomial equa- tions and inequalities. Following Theorem 4.1.7, we can consider the topological types V1,...,Vp of algebraic sets defined by polynomial equations of degree at most d in n variables, taking N, the number of connected components in the Vi with the largest number of connected components, as a bound. We aim to give a hard theoretical bound for this number. Our approach first involves reducing to the setting of a compact, smooth hypersurface by a method given in [1]. From there, we show how to form a bound on the possible number of connected components based on the number of polynomial equations and inequalities (as well as their degrees) defining a semialgebraic set. n Consider an algebraic subset V ⊂ R defined by polynomial equations P1 = ... = Pq = 0 of degree at most d, and that V has at least two connected 2 2 components. Denoting P := P1 + ... + Pq , we replace this system of equations with the single polynomial equation P = 0. Then, we choose R > 0 large enough that B(R), the ball of radius R centered at the origin, intersects every connected component of V . It is obvious that the number of connected components of B(R) ∩ V is equal to or greater than the number connected components of V itself. Denote by F the (finite) collection of connected components of B(R)∩V . For each C ∈ F, define  KC := {x ∈ B(R) | dist(x, C) = dist x, (B(R) ∩ V )\C }, the set of points in B(R) of equal distance from C and every other connected component of B(R) ∩ V . Denote K := ∪C∈F C, which is a closed semialgebraic subset of B(R), and is disjoint from V itself.

Figure 46: The ball B(R) intersects every connected component of an alge- braic subset V ⊂ Rn. The set K is equidistant from the two closest connected components in B(R) ∩ V .

139 It is obvious that any continuous path from one connected component C1 ∈ F to another, C2 ∈ F, must intersect KC1 (and KC2 ). Therefore, each connected component of B(R)\K must contain at most one of the C ∈ F. Now, for some  > 0, we define the function

2 2 Q(x) = P (x) +  ||x|| − R , which is a degree 2d polynomial in x. Furthermore, since K is a compact semialgebraic subset of Rn, the function P (X)/R2 attains some minimum, say 0 on K. Note that 0 > 0 since P is a sum of squares, and P 6= 0 on K, hence P > 0 on K. Thus, assuming that 0 <  < 0, the zeroset W of Q must 2 2 n be contained in B(R). Indeed,  ||x|| − R > 0 for all x ∈ R \B(R). Now, consider a connected component D of B(R)\K containing one of the C ∈ F. The polynomial Q is non-positive on C, and is positive on K. Therefore W, the zeroset of Q, must have a nonempty intersection with D.

Figure 47: The zeroset of Q1 (left), Q2 (middle), and Q3 (right), for positive values 1 < 2 < .

Hence W has at least as many connected components as B(R)∩V , and therefore at least as many as V . We now choose  > 0 so that W is a smooth hypersurface. That is, we wish to choose  such that, for every x ∈ W, the partial derivatives ∂Q/∂xi do not all vanish simultaneously. Let

n X := {(x, ) ∈ R × R |  > 0 and Q(x) = 0}.

2 2 Now, since Q(x) = P (x) + (||x|| − R ), we have ∂Q  (x) = ||x||2 − R2, ∂ which vanishes only if ||x|| = R. Furthermore, for any x ∈ X

∂Q  (x) = ||x||2 − R2 = 0 =⇒ Q (x) = P (x) = 0. ∂ 

140 That is, ∂Q/∂(x) = 0 if and only if ||x|| = R and P (x) = 0. Since P is a sum of squares, the partials ∂P/∂xi, i = 1, . . . , n, vanish whenever P = 0. Therefore

∂Q ∂P (x, ) = + 2xi = 2xi, ∂xi ∂xi and for ||x|| = R > 0, at least one of the xi must be nonzero. (In fact, only one of the xi can be zero at a time for ||x|| = R.) Hence at least one of the ∂Q/∂xi(x, ) must be nonzero for all (x, ) ∈ X. That is, X is a smooth hypersurface in Rn+1. Now, consider the function

f : X −→ R (x, ) 7−→ .

By semialgebraic Sard’s theorem, 4.2.4, f has finitely many critical values. Therefore we can easily choose 0 <  < 0 such that  is not a critical value of f. That is, we can choose  > 0 such that grad(Q) = (∂Q/∂x1, . . . , ∂Q/∂xn) is never equal to (0,..., 0) on W, and the corresponding W is therefore a compact smooth hypersurface. With the above arguments, we are able to give a bound for the maximum number of connected components of an algebraic set by essentially counting the critical points of Q. This result (Lemma 4.2.8) requires the use of Bezout’s theorem on the number of nondegenerate solutions of a particular system of equations, and Proposition 4.2.7 to ensure the conditions of Bezout’s theorem can be met. We assume the following setup. Let Q ∈ R[X1,...,Xn] be a polynomial of degree 2d, and assume that W := Q−1(0) is a compact smooth hypersurface, (that is, grad(Q) does not vanish on W ). We define the following system of equations:

Q(x) = 0   ∂Q = 0  ∂x1 (S) .  .   ∂Q = 0. ∂xn−1

A solution x of (S) is said to be ‘nondegenerate’ if the jacobian determinant

 ∂Q ... ∂Q  ∂x1 ∂xn      ∂Q ... ∂Q   ∂x1∂x1 ∂x1∂xn    det    . .   . .. .   . . .      ∂Q ... ∂Q ∂xn−1∂x1 ∂xn−1∂xn is nonzero.

141 Lemma 4.2.6 (Bezout’s Theorem). Let P1,...,Pn be homogeneous polynomials in n variables over an algebraically closed field of degree d1, . . . , dn, respectively. Then the number of nondegenerate solutions of P1 = ... = Pn = 0 is at most d1 . . . dn. For a proof of Bezout’s theorem refer to [9], page 281.

Proposition 4.2.7. We can choose the coordinates of Rn such that all real solutions of (S) are nondegenerate. Proof. We first define the map grad(Q) φ := : W → Sn−1 ||grad(Q)|| from W , the zeroset of Q, to the n − 1-dimensional unit sphere in Rn. Applying Sard’s theorem 4.2.4 to the map φ, we know that the set of critical values of φ has dimension less than n − 1. Therefore, we can find a set of antipodal n−1 points in S , say a1 and a2 = −a1, which are not critical values of φ. Hence, after an appropriate rotation, r, of the coordinate axes, we can assume that the points a1 = (0,..., 0, 1) and a2 = (0,..., 0, −1) are not critical values of φ. Note that the (real) solutions c of the system (S) are precisely the points in W that are mapped to (0,..., 0, ±1) by φ, (or, in fact, they are the c ∈ W such that φ(c) = r−1(0,..., 0, ±1).) Indeed, Q(x) = 0 for all x ∈ W by definition, and grad(Q) is perpendicular to the tangent plane TxW at points x for which xn reaches a stationary value. Therefore the tangent hyperplanes TcW and n−1 T(φ(c)S are both the xn = 0 plane, meaning that ∂Q∂xi(c) = 0 for all i = 1, . . . , n − 1. We consider the matrix of dφ at such a point c ∈ W : 1  ∂2Q  dφc = (c) . ||grad(Q(c))|| ∂xi∂xj i,j=1,...,n−1 Furthermore, the jacobian determinant of (S) at c ∈ W is

 ∂Q (c) ... ∂Q (c)  ∂x1 ∂xn      ∂Q (c) ... ∂Q (c)   ∂x1∂x1 ∂x1∂xn    det   ,  . .   . .. .   . . .      ∂Q (c) ... ∂Q (c) ∂xn−1∂x1 ∂xn−1∂xn where the only nonzero entry in the top row is ∂Q/∂xn(c). Therefore ∂Q  ∂2Q  det[J(S)] = (c) × det ∂xn ∂xi∂xj i,j=1,...,n−1 ∂Q = (c) × ||grad(Q(c))|| det(dφc), ∂xn

142  ∂2Q  n−1 where det ∂x ∂x 6= 0 since TcW and Tφ(c)S have the same i j i,j=1,...,n−1  dimension. We know that ∂Q/∂xn(c) 6= 0, hence det[J (S) ] 6= 0, meaning that the solutions of (S) are nondegenerate. Lemma 4.2.8. The compact smooth hypersurface W = Q−1(0) as defined above has at most d(2d − 1)n−1 connected components. Proof. Since W is a compact smooth hypersurface, each connected component of W is a compact smooth hypersurface, the coordinate function xn attains a maximum and minimum on each connected component of W . The points x ∈ W where xn reaches a minimum or maximum are the points at which the tangent plane TxW is parallel to the xn = 0 plane, and since grad(Q) is perpendicular to the tangent plane TxW , these are also the points at which the ∂Q/∂xi vanish for all i = 1, . . . , n−1. Note that the stationary points of xn are therefore solutions of the system (S). By Proposition 4.2.7, we can choose the coordinates of Rn so that all solutions of (S) are nondegenerate. Then, by Bezout’s theorem (Lemma 4.2.6), the number of nondegenerate solutions of (S) is equal to or less than 2d(2d−1)n−1, the product of the degrees of the equations in (S). However, each connected component of W contains at least one maximum and minimum of the coordinate function xn. Therefore the number of nondegenerate solutions of (S) is at least twice the number of connected components of W . That is, the number of connected components of W is at most d(2d − 1)n−1. Combining the previous results of this section, we conclude our study on the number of connected components of algebraic subsets with the following theorem. Theorem 4.2.9. The number of connected components of an algebraic subset n V ⊂ R defined by polynomial equations P1 = ... = Pq = 0 of degree at most d is equal to or less than d(2d − 1)n−1.

2 2 Proof. Set P := P1 +...+Pq , and take Q(x) as constructed above, and choose  > 0 such that W, the zeroset of Q, is a compact smooth hypersurface in n R . By construction, W has at least as many connected components as V . The polynomial Q has degree 2d. By Lemma 4.2.8, the number of connected n−1 components of W is at most d(2d − 1) . Therefore the number of connected components of V is at most d(2d − 1)n−1.

Example 33. Let P = X2 − Y 2 − 1, and denote by

2 2 2 V := {(x, y) ∈ R | x − y − 1 = 0}, the zeroset of P . The open ball B(2) of radius 2 centered at the origin in R2 intersects both connected components of V . Let

2 p 2 2 2 C1 :={(x, y) ∈ R | x = − y + 1 and x + y ≤ 4} 2 p 2 2 2 C2 :={(x, y) ∈ R |x = + y + 1 and x + y ≤ 4}

143 denote the two connected components of V ∩ B(2), and define Ki as the set of points equidistant from Ci and (V ∩ B(2))\Ci, for i = 1, 2. Note that K1 = K2 since V only has two connected components. Define K = K1 ∪ K2. Note also that each connected component of B(2)\K contains one of the C1, C2.

Figure 48: The closed ball B(2) intersects both connected components of V nonemptily. The set K (red, dashed) is the set of points in B(2) that are equidistant from each connected component of V ∩ B(2).

We aim to produce a compact smooth hypersurface in R2 from the (compact) set V ∩ B(2). We define

2 2 2  Q(X,Y ) = P (X,Y ) +  X + Y − 4 , where  > 0. We will choose some  small enough that the zeroset of Q does not intersect K. Particularly, we need to choose  smaller than the minimum value attained by P/4 on the set K. In this case we can check using simple calculations that P/4 attains a minimum of 0 := 1/4 on the set K. Hence we choose some 0 <  < 1/4.

144 Figure 49: The zeroset of Q for values  = 0.05, 0.2, 0.25. Note that the zeroset of Q is contained in B(2) for all  > 0, and is simply V when  = 0.

We also have to check that the zeroset of Q is in fact smooth. We know by Sard’s theorem that there exist only finitely many values of  in the interval (0, 0) for which the zeroset of Q is not smooth. We compute

2 2 2 2  grad(Q(x, y)) = 2x(2x − 2y − 2 + ), −2y(2x − 2y − 2 − ) .

The system of equations

2x(2x2 − 2y2 − 2 + ) = 0 −2y(2x2 − 2y2 − 2 − ) = 0 has no solutions on the zeroset of Q, and hence the zeroset of Q has no critical points, for any 0 <  < 1/4. Now, set  = 0.2 and denote by W the zeroset 2 of Q. Then W is a compact smooth hypersurface in R defined by a single polynomial equation, (x2 − y2 − 1)2 + 0.2(x2 + y2 − 4) = 0, of degree 4. Lemma 4.2.8 asserts that W has at most 4(2 × 4 − 1)2−1 = 28 connected components, and Theorem 4.2.9 similarly asserts that V has no more than 28 connected components. While this is clearly a drastic overestimate of the actual number of connected components these sets have, we will illustrate how these bounds are established. We find the minima and maxima of the coordinate function y on W . The system of equations

Q(x, y) = (x2 − y2 − 1)2 + 0.2(x2 + y2 − 4) = 0 ∂Q (x, y) = 2x(2x2 − 2y2 − 1.8) = 0 ∂x must have at least 4 distinct real solutions, since each connected component of W is a compact smooth hypersurface, and therefore the coordinate function y must attain a maximum and a minimum√ on√ each. Indeed, there are exactly 4 solutions, occurring at (x, y) = (± 2.425, ± 1.525). Hence the upper bound on the number of connected components of V obtained from this step alone is 2.

145 Figure 50: The maxima and minima a1, a2, a3, a4 of the coordinate function y on W . The right panel shows the point a1 very close to the intersection of V with the boundary of B(2).

We are now able to extend this method to treat the more general case of semialgebraic sets, making several adjustments to account for the presence of inequalities as well as equalities in our systems of equations. In the case of an algebraic set, the number of equations didn’t affect the bound on the number of connected components, since we can replace a finite set of equations with a single equation at the expense of doubling the total degree. However, the bound that we will produce on the number of connected components of a semialgebraic set does require the number of equations and inequalities. The reason for this will become apparent in the proof of the following proposition. Proposition 4.2.10. Let (S) be a system of q polynomial equations and inequal- ities in k variables of degree at most d, for some d ≥ 2. The (semialgebraic) set of solutions of (S) in Rn has at most d(2d − 1)k+q−1 connected components. Proof. We first make replacements for inequalities in (S) in such a way that the number of connected components is not decreased. For a polynomial inequality P > 0 of the system (S), we choose some  > 0 so small that there is a point x in each connected component C of the set of solutions of (S) for which P (x) > ). Then, replacing the strict inequality P > 0 with P −  ≥ 0 in (S) does not decrease the number of connected components of the set of solutions of (S). Now, for each polynomial inequality of the form P ≥ 0 in (S), we introduce a new variable Y . Replacing the inequality P ≥ 0 with the equation P − T 2 = 0 does not decrease the number of connected components of the set of solutions of (S). (Indeed, if there exists a nonempty U that is a connected component of the set of solutions of (S) such that P 6= 0 everywhere on U, then this replacement necessarily increases the number of connected components. If there does exist a connected component U and a point u ∈ U such that P (u) = 0, then the replacement may increase or maintain the number connected components.)

146 Thus, replacing all strict inequalities with relaxed inequalities, then replacing all relaxed inequalities with equations as above, we obtain a system of polynomial equations. Furthermore, since there are at most q inequalities to begin with, and we introduce a new variable for each inequality we replace with an equation, we have a system of q equations in at most k + q variables, all of degree at most d. Finally, by Theorem 4.2.9, the set of solutions of (S) has at most d(2d−1)k+q−1 connected components. The bound in the above proposition is quite crude since, for each replacement of an inequality with an equation we are potentially increasing the number of connected components. Example 34. Let P = X2 − 1 and consider the system (S) containing the single inequality P > 0. We aim to replace the strict inequality with a relaxed inequality. The subset of R satisfying (S) is A := (−∞, −1) ∪ (+1, +∞). We wish to find some  > 0 such that P −  ≥ 0 has at least one solution on each of the connected components of A. In fact, since P is unbounded as x → ±∞, we can choose any  > 0. So, we set  = 1, giving the inequality X2 − 2 ≥ 0. Now, we replace this relaxed inequality with the equality X2 − 2 − T 2 = 0, where T is a new independent variable ranging over R.

Figure 51: The set of points (x, t) ∈ R2 on which x2 − 2 − t2 = 0.

We can now compare the various bounds imposed on the number of connected components of the set of solutions to this system according to previous re- sults. Theorem 4.2.9 asserts that the number of connected components cannot be greater than 28. (Note that the final equation, X2 − 2 − T 2 is essentially the same equation as in Example 33, and indeed we have the same bound, as expected.) Proposition 4.2.10 asserts that the original system of a single poly- nomial inequality of degree 2 in a single variable has at most d(2d−1)k+q−1 = 6 connected components. By observation, the set of solutions of both the original and final system have 2 connected components.

147 4.3 Algebraic computation trees and lower bounds on con- nected components In this subsection we conclude our study of bounds on the number of connected components of semialgebraic sets. We introduce the notion of algebraic compu- tation trees - an algorithmic computation model for deciding whether a partic- ular input (an n-tuple of real values) satisfies a system of polynomial equations and inequalities. That is, a model for deciding whether or not a point x ∈ Rn is contained in a particular semialgebraic set defined by boolean combinations of sign conditions on a family of polynomials in R[X1,...,Xn]. We then relate the notion of an algorithm’s cost to the possible number of connected components of the semialgebraic set, by means of an upper bound on the number of connected components based on the algorithm cost, followed by a lower bound on the cost of an algorithm deciding whether a point lies within a semialgebraic set based on the number of connected components of the set. An algebraic computation tree is a tree with one root, many leaves, and whose intermediate vertices have two types - one that gives an instruction, and one that tests a condition. More specifically, for an input x := (x1, . . . , xn) ∈ n R , a vertex v containing an instruction defines a new variable, say Tv, in terms of a single binary operation, +, −, or ×, on the entries x1, . . . , xn, real constants, and the previously defined variables Tu, where the vertex u is some ancestor of v. Intuitively, these vertices are used to build more complicated algebraic expressions in terms of the input, (x1, . . . , xn), one operation at a time. Vertices containing an instruction have a single son and a single father. A vertex containing a test decides whether a statement of the form a = 0, a ≥ 0, or a > 0 is true or false, where a is either one of the x1, . . . , xn or Tv for some ancestor v of the vertex containing the test. Such vertices have a single father, and they have two possible outputs depending on the result of the test, (“yes” or “no”), and hence they each have two sons. By convention, we will draw the branch corresponding to “yes” on the left, and the branch corresponding to “no” on the right. Intuitively, these vertices test the sign conditions on the expressions Tv that have been built up. Finally, the leaves contain a boolean constant (“true” or “false”), indicating whether or not the input satisfies the required polynomial sign conditions - that is, whether or not the point (x1, . . . , xn) is contained in the semialgebraic set. Leaves also have a single father vertex. Example 35. Let P = X2 − Y 2 − 1, Q = XY , and consider the semialgebraic subset of R2 defined by S := {(x, y) ∈ R2 | P (x, y) = 0 and Q(x, y) ≥ 0}. Consider the following algebraic computation tree.

148 Figure 52: Algebraic computation tree deciding where a point (x, y) ∈ S. If the input is a point in S, the algorithm will arrive at either vertex u or v.

The cost of an algorithm in this model is the length of the longest possible n path taken by an input (x1, . . . , xn) ∈ R from the root to a leaf. For instance, the cost of the algorithm in Example 35 above is 9, taking the root vertex Tv1 as the first vertex, and either v or w as the last vertex. We give an upper bound for the number of connected components of a semialgebraic set in relation to the cost of an algorithm deciding whether or not a point is in this set.

Theorem 4.3.1. Let S be a semialgebraic subset of Rn, and consider an algo- rithm deciding whether x ∈ S. If the cost of the algorithm is c, then S has at most 22n+5c connected components. Proof. We construct a system of polynomial equations and inequalities in terms of the xi and Tv, and apply Proposition 4.2.10. Take a leaf v of the com- n putation tree that is labeled “true”, and denote by Wv ⊂ R the set of in- puts for which the algorithm arrives at v. Assume Wv is nonempty, and consider all vertices that are ancestors of v. Each of the ancestors u of v are either an instruction of the form Tu = a ∗ b, where a and b are among

149 0 x1, . . . , xn and the Tu0 for ancestors u of u, or a test a?0, where ? is either =, ≥, or >. For each vertex u containing an instruction we take the equation “Tu = a ∗ b”. For each vertex u containing a test for which the output lead- ing to v is “yes” we take the equation/inequality “au?0”, and for each vertex containing a test for which the output leading to v is “no” we take the nega- tion “¬(au?0)”. We now have a system (Sv) of equations and inequalities in the variables X1,...,Xn, and some Tu and au for ancestors u of v. Let Ns be the number of equations and inequalities in (Sv), and let m denote the num- ber of these that are in terms of some Tu or au, and note that m ≤ Ns ≤ c. n Observe that Wv ⊂ R is the projection of the set of solutions of (Sv) onto n (x1, . . . , xn), the first n coordinates. Indeed, Wv is the set of (x1, . . . , xn) ∈ R for which there are a set of tu and au, say tv1 , . . . , tvk , avk+1 , . . . , avm , such that

(x1, . . . , xn, tv1 , . . . , tvk , . . . , avk+1 , . . . , avm ) is a solution of (Sv). Therefore the number of connected components of Wv is equal to or less than the number of connected components of the set of solutions of (Sv). Now, (Sv) is a sys- tem of m equations in Ns variables. Furthermore, each equation/inequation in (Sv) is polynomial and has degree at most 2 with respect to the variables x1, . . . , xn, tv1 , . . . , tvk , . . . , avk+1 , . . . , avm . Hence, applying Proposition 4.2.10, the number of connected components of the set of solutions of (Sv) is at most n+m+Ns−1 n+2c−1 2×3 . Since m ≤ Ns ≤ c, we can replace this number with 2×3 , and at the cost of tightness of the final bound, we use the fact that 3 < 4 = 22 to replace 2 × 3n+2c−1 with 2 × 22n+4c−2. Now, S is the union of all semialgebraic sets Wv for which the leaf v is labeled “true”. Combining the fact that a vertex can have at most 2 sons, the number of leaves labeled true (and therefore the c number of Wv contained in S) is at most 2 . Therefore the number of connected components of S is at most 2c × 22n+4c−2 = 22n+5c−2 ≤ 22n+5c.

Example 36. We return to the setup in Example 35. By observation the set S has 2 connected components (the portions of the curve in Example 33 which lie in the 1st and 3rd closed quadrants). Applying the bound from Theorem 4.3.1, we find that S cannot have more than 22×2+5×c = 249 connected components. Clearly the bound given in Theorem 4.3.1 is not tight - several factors were introduced to obtain a more succinct form of the final bound. However, this form is desirable for ease of application in the following corollary. We use the ‘big-O’notation, where, for real or complex valued functions f and g, f = O(g) if the magnitude of f(x) approaches some constant multiple of g(x) as x → ∞. Also, if f = O(g), then g = Ω(f). Corollary 4.3.2. The cost of an algebraic computation tree algorithm deciding whether n real numbers are distinct is at least Ω(n log(n)).

n Proof. Let S denote the set of points (x1, . . . , xn) ∈ R such that the xi are all distinct. Then, consider strict orderings xk1 < . . . < xkn of the x1, . . . , xn. It is clear that there are n! such orderings. Hence S has n! connected components. If we denote by c the cost of an algorithm deciding whether a point x ∈ Rn is in S, Theorem 4.3.1 asserts that n! ≤ 22n+5c. Rearranging, we have log(n!) ≤ 2n+5c,

150 and we use n log(n) = O(log(n!)), giving n log(n) ≤ O(2n + 5c). That is, Ω(n log(n)) = c, since 2n + 5c → 5c as c → ∞, which differs from c by a (constant) multiple of 5. We conclude this section with another application, giving a bound on the algorithm cost of the “big hole” problem.

Proposition 4.3.3. For an n-tuple of real numbers, x1, . . . , xn, the algorithm cost of deciding whether there exists a closed interval of length 1 contained in the convex hull of the x1, . . . , xn in R, and containing none of the xi, is at least Ω(n log(n)).

n Proof. Let Pd be the family of all algebraic subsets of R defined by a polynomial equation of degree at most d, and let S ⊂ Rn be an arbitrary semialgebraic subset. Then we denote by Id(S) the smallest number such that the intersection A ∩ S for any A ∈ Pd has at most Id(S) connected components. Let Wn denote the subset of Rn for which there is ‘no big hole’, and for a permutation σ ∈ Sym(n), define

n Wσ(1,...,n) := {(x1, . . . , xn) ∈ R | ∀i ∈ {1, . . . , n}, 0 ≤ xσ(i+1) − xσ(i) ≤ 1}, the subset of Wn for which the coordinates are ordered so that xσ(1) ≤ ... ≤ n xσ(n). Clearly each Wσ is a semialgebraic subset of R . We will show by induction on k that every Wσ(1,...,k) is connected. Let k = 2, so that

2 W(1,2) := {(x1, x2) ∈ R | (x2 − x1) ≤ 1, x1 ≤ x2},

2 and consider the projection π2 : R → R onto the first coordinate. The fiber of π2 over any x1 ∈ R is the segment [(x1, x1), (x1, x1 + 1)], which is clearly connected. By a similar argument, W(2,1) is also connected. Furthermore, both W(1,2) and W(2,1) contain the diagonal x1 = x2, hence the union, W2 = W(1,2) ∪ W(2,1), is a connected semialgebraic set. Now, let k ≥ 2 and assume that W(1,...,k) is a connected semialgebraic set. Consider the set

k+1 W(1,...,k,k+1) := {(x1, . . . , xk+1) ∈ R | (x1, . . . , xk) ∈ W(1,...,k) and xk+1 ≤ xk + 1, xk ≤ xk+1},

k+1 k and denote by πk+1 : R → R the projection onto the first k coordinates. The fiber of πk+1 over each (x1, . . . , xk) ∈ W(1,...,k) = πk+1(W(1,...,k+1)) is the   segment (x1, . . . , xk, xk), (x1, . . . , xk, xk + 1) which is connected, and contains the point (x1, . . . , xk, xk). By the inductive hypothesis on W(1,...,k), the point (x1, . . . , xk, xk) is in the same connected components as the diagonal x1 = ... = xk = xk+1. That is, the fiber of πk+1 over any point in W(1,...,k) contains a point that is in the same connected component of Wk+1 as the diagonal. Hence W(1,...,k+1) is connected. Extending this to sets of the form Wσ(1,...,k+1), k+1 k σ ∈ Sym(k + 1), we replace πk+1 with a projection R → R that forgets the

151 σ(k + 1)-th coordinate. Hence all Wσ(1,...,k+1) are connected semialgebraic sets containing the diagonal in Rk+1, and therefore [ Wk+1 = Wσ(1,...,k+1) σ∈Sym(k+1) is a connected semialgebraic set. This concludes the induction. Now, assume (x1, . . . , xn) ∈ Wn with x1 ≤ ... ≤ xn. By definition of Wn, 2 (xi − xi+1) ≤ 1. Thus for fixed k < n we have

n n−k X 2 X 2 (xk − xj) ≤ j . j=k+1 j=1 Summing over all k < n yields

n−1 n−k  X 2 X X 2 (xi − xj) ≤  j  i 1, which is impossible for points (x1, . . . , xn) ∈ W(1,...,n). P 2 Pn−1 2 Therefore, for a point (x1, . . . , xn) ∈ W(1,...,n), i

n−1 n X 2 X 2 Ln := {(x1, . . . , xn) ∈ R | (xi − xj) = k(n − k) } i

 n Lσ(1,...,n) := (x1, . . . , xn) ∈ R | xσ(i) = xσ(i+1) − 1 for all i ∈ {1, . . . , n − 1} , for each σ ∈ Sym(n). It is obvious that Ln and the Lσ(1,...,n) are algebraic n subset of R . Rewriting the sets Lσ(1,...,n) as n Lσ(1,...,n) = {(x1, . . . , xn) ∈ R | (x − x) = 1} n−1 \ n = {(x1, . . . , xn) ∈ R | (x − x) = 1}, i=1

152 n it is clear that each Lσ(1,...,n) is a line in R . Furthermore, for each permuta- tion σ there are at least two elements i, j ∈ {1, . . . , n} such that xi 6= xσ(i), xj 6= xσ(j). Therefore the sets Lσ(1,...,n) and Lτ(1,...,n) are disjoint for distinct permutations σ, τ ∈ Sym(n). This means that Ln has n! connected components, and therefore I2(Wn) ≥ n!. Applying Theorem 4.3.1, the cost c of an algorithm n 2n+5c deciding whether a point x ∈ R is in Ln is such that n! ≤ 2 . This yields c ≥ Ω(n log(n)). Finally, even though Wn is a connected semialgebraic set (having a single connected component), the description of Wn contains many inequalities, as well as the equalities in the description of Ln. Therefore the cost of an algorithm deciding the big hole problem is at least at great as the cost deciding whether a point is in Ln, which is Ω(n log(n)).

153 5 Implementing the (?) condition

In this section we return to Thom’s lemma 3.4.1, and the modified conditions used in Lemma 3.4.2. While Thom’s lemma requires a family of polynomials P1,...,Ps ∈ R[X] to be closed under derivation, we recall that Lemma 3.4.2 requires a weaker condition:

0 (?) For each i ∈ {1, . . . , s}, the roots of Pi of odd multiplicity are among

the roots of P1,...,Pi−1.

In general, this is a weaker condition than closure under derivation, however it is not as easily implemented. That is, completing a family of polynomials to include all nonzero derivatives is straightforward. However, checking alge- braically whether the (?) condition holds is not as obvious, and completing a family of polynomials to satisfy the condition requires further effort. We give an algorithmic solution to both of these problems.

5.1 Checking the (?) condition The proof of Lemma 3.4.2 is by induction on the number of polynomials in the family, where the inductive assumption is that, for some 1 ≤ i < r, the family (P1,...,Pi) satisfying condition (?) also satisfies the statement of the lemma. The induction step then assumes the same ordering that is naturally required by the (?) condition. We follow this in-built ordering in the following discussion. Given a family of polynomials (P1,...,Ps) ⊂ R[X], and a polynomial Q ∈ R[X], we define Q Rfk(P1,...,Ps)(Q) :=   Y k gcd Q, Pi  i≤s

Y k with k ≤ d, where d is the degree of Q. The term gcd(Q, Pi ) clearly divides i≤s

Q, and therefore Rfk(P1,...,Ps)(Q) is polynomial. If we ask that, for each 0 new polynomial Pi added to the family, all roots of Pi are among the roots 0 of P1,...,Pi−1, (rather than just the roots of Pi of odd multiplicity), then we obtain an algebraic condition.

Proposition 5.1.1. Let (P1,...,Ps) ⊂ R[X] be a finite family of polynomials, and take Q ∈ R[X] with degree d. Then Rfd(P1,...,Ps)(Q) has no real roots if and only if all real roots of Q are among the roots of P1,...,Ps.

d1 dr e1 el Proof. We write Q(X) = (X − a1) ... (X − ar) A1 (X) ...Al (X) so that a1, . . . , ar are the real roots of Q(X) with multiplicities d1, . . . , dr respectively, and the Aj(X) are the irreducible factors of Q which have no real roots. If all real roots a1, . . . , ar of Q are among the roots of P1,...,Ps, then all factors

154 Y (X − a1),..., (X − ar) divide Pi. In fact, the product (X − a1) ... (X − ar) i≤s Y d d Y d divides Pi, and so it is clear that (X − a1) ... (X − ar) divides Pi . If i≤s i≤s d Y d any of the Aj divide any of the Pi, for i ≤ s, then Aj divides Pi . There- i≤s ej Y d ej +1 fore, we know that Aj divides gcd(Q, Pi ), but Aj does not. Then we i≤s

Y d d1 dr can write gcd(Q, Pi ) = (X − a1) ... (X − ar) D(X), where D(X) is i≤s e1 el some divisor of A1 (X) ...Al (X). Therefore, we have Rfd(P1,...,Ps)(Q) = Ae1 (X) ...Ael (X) 1 l , which clearly has no real roots. For the reverse implication, D(X) assume that some (X −ai) does not divide any of the P1,...,Ps. Then (X −ai) Y d Q does not divide gcd(Q, P ). Therefore Rd(P1,...,Ps)(Q) = i f Y d i≤s gcd(Q, Pi ) i≤s di is divisible by (X − ai) , and hence it has at least one real root.

Since Rfd(P1,...,Ps)(Q) is polynomial, we can use Sturm’s method to check whether it has any real roots. We refine this result further. The following construction offers an algebraic condition for checking whether an arbitrary polynomial has real roots of multiplicity ≤ n for any n. Consider a finite family of polynomials (P1,...,Ps), and a polynomial Q = d1 dr e1 el (X − a1) ... (X − ar) A1 (X) ...Al (X) of degree d, where a1, . . . , ar are the real roots of Q, and A1,...,Al are the irreducible factors of Q with no real roots. For convenience we write Rfn(P1,...,Ps)(Q) = Rfn(Q) wherever the omission of the family (P1,...,Ps) from the argument causes no ambiguity. We define

d n+1 e n 2! 2 X  (i) Bn(Q) := Q , i=1 where Q(i) denotes the i-th derivative of Q. We aim to check whether the real roots of Q of multiplicities di ≤ n are among the roots of P1(X),...,Ps(X).

Proposition 5.1.2. For a finite family of polynomials (P1,...,Ps), and a poly- nomial Q, take Rfn(Q) = Rfn(P1,...,Ps)(Q), and take Bn(Q) as defined above. Then Rfn(Q) Bn(Q) has no real roots if and only if all real roots of Q of multiplicity equal to or less than n are among the roots of the polynomials P1,...,Ps.

Proof. Observe that if all real roots ai of Q with multiplicities di ≤ n are among Y n the roots of P1,...,Ps, then Pi is divisible by the product of all such factors i≤s

155 n Y n Y di (X − ai) . Therefore, Pi is clearly divisible by (X − ai) . Hence

i≤s i;di≤n

Q Rfn(Q) =   Y n gcd Q, Pi  i≤s is not divisible by any of the factors (X − ai) whose multiplicity di in Q is not greater than n. That is, Rfn(Q) only has roots ai (with multiplicities, say, gi) whose multiplicities in Q are di > n. Note that gi ≤ di for all i = 1, . . . , r, (in fact, such roots of Q may not be roots of Rfn(Q) at all). We now use the fact (k) that the k-th derivative of Q, denoted Q , does not have roots ai of Q for which di = k. This is true for any factor F of Q with multiplicity k. (We make a careful note that, while such a factor F of Q of multiplicity k is certainly not a factor of Q(k), it is possible that F is still a factor of higher-order derivatives (k+m) Q of Q.) Every real root of Q with multiplicity di ≤ n is therefore not a root of at least one of the derivatives Q(1),...,Q(n) of Q. Taking the sum n X  2 of the squares of these derivatives, Q(i) vanishes if and only if all Q(k), i=1 k = 1, . . . , n, simultaneously vanish. This is not possible for roots of Q with n 2 X  (i) multiplicity di ≤ n, therefore the only roots Q has in common with i=1 Q are those whose multiplicities in Q are greater than n. The multiplicities of n 2 X  (j) each of these roots ai in Q is equal to min 2(di − k) = 2(di − n), k=1,...,n j=1 and so their multiplicities in

d n+1 e n 2! 2 X  (i) Bn(Q) = Q i=1 are equal to n + 1 qi = min 2(di − k)d e ≥ (di − n)(n + 1) ≥ di. k=1,...,n 2

Recall that the multiplicities of such roots ai, as roots of Rfn(Q), are gi ≤ di. One sees that such ai as roots of

Rfn(Q) Rfn(Q) = n+1 B (Q)  d 2 e n Pn (i)2 i=1 Q have multiplicity pi := gi − qi ≤ gi − di ≤ 0, and are therefore not roots of Rfn(Q)/Bn(Q). This proves the “if” part of the statement.

156 We now assume that some real root, ai of Q with multiplicity di ≤ n, is not Y n among the roots of P1,...,Ps. Then (X − ai) is not a divisor of Pi , and so i≤s

Y n di (X − ai) is clearly not a divisor of gcd(Q, Pi ). Therefore, (X − ai) divides i≤s n 2 X  (i) Rfn(Q). We use the fact that Q only has common roots with Q whose i=1 multiplicities are greater than n. Particularly, we see that (X − ai) is not a n 2 X  (i) divisor of Q , and is therefore not a divisor of Bn(Q). Therefore ai is i=1 a root of Rfn(Q) Rfn(Q) = n+1 B (A)  d 2 e n Pn (i)2 i=1 Q of multiplicity di. This concludes the proof. Remark: The term

Rfn(Q) Rfn(Q) = n+1 B (Q)  d 2 e n Pn (i)2 i=1 Q is not polynomial in general, since the roots of Rfn(Q) may have multiplicities equal to or lower than the multiplicities of the same roots in the denominator. This means that we cannot use Sturm’s root counting method to check whether the expression has real roots. However, we can make a small change in order to overcome this problem. We define

Rfn(Q) Rfn(Q) A∗ := = . n  d n+1 e gcd(Rfn(Q),Bn) n 2! 2 X  (i) gcd Rfn(Q), Q  i=1 Or, written in full form:

∗ Rfn(P1,...,Ps)(Q) An(P1,...,Ps)(Q) :=  . gcd Rfn(P1,...,Ps)(Q),Bn(Q) Similarly, we define

∗ Bn Bn := , gcd(Rfn(Q),Bn) and we obtain ∗ An Rfn(Q) ∗ = . Bn Bn

157 ∗ ∗ Clearly An and Bn are both polynomial and have no common factors. Since the d n+1 e n 2! 2 X  (i) roots ai of Q of multiplicity di > n are also roots of Bn = Q i=1 with multiplicity qi ≥ di, it is clear that the ai are also roots of gcd(Rfn(Q),Bn) of multiplicity gi, (the multiplicity of ai in Rfn(Q)). Hence such ai are not roots ∗ ∗ of An, (though they may be roots of Bn). By Proposition 5.1.2 the roots ai of Q of multiplicity di ≤ n are not roots of Rfn(Q) if and only if they are roots of at least one of the polynomials P1,...,Ps, (and the roots of Rfn(Q) are clearly among the roots of Q). As shown in the proof of Proposition 5.1.2, the roots ai of Q of multiplicity di ≤ n are not roots of Bn(Q). We are now able to make the following proposition.

∗ Proposition 5.1.3. Using the above setup, the polynomial An has no roots if and only if all roots ai of Q of multiplicity di ≤ n are among the roots of the ∗ polynomials P1,...,Ps. Furthermore, An has no real roots if and only if all real roots of Q of multiplicity di ≤ n are among the roots of the polynomials P1,...,Ps. Proof. We know that all roots of Q of multiplicity greater than n are also roots ∗ of Bn, and therefore none of these roots are roots of An. We also know that all roots of Q with multiplicity equal to or less than n are not roots of Bn. If all roots of Q of multiplicity di ≤ n are among the roots of P1,...,Ps then ∗ Rfn(Q) contains none of these roots, and so An has no roots. If a root ai of Q of multiplicity di ≤ n is not among the roots of P1,...,Ps then ai is a root of ∗ Rfn(Q), but cannot be a root of Bn. Therefore ai is also a root of An. This ∗ proves the first statement. Since the roots of An are precisely the roots of Q of multiplicity di ≤ n that are not among the roots of P1,...,Ps, the second statement clearly follows. Remark: In the special case that Q has only real, simple roots, Proposition 0 2 5.1.3 can be stated as “Rf1(Q) divides (Q ) if and only if all roots of Q are among the roots of P1,...,Ps”. We want to detect simple roots of our polynomial Q. Particularly, we want to be able to detect simple real roots of a polynomial. Recall from Subsection 1.1 the definition of Gk(Q) for an arbitrary polyno- 0 0 mial Q ∈ R[X]: G1(Q) = gcd(Q, Q ), and Gk(Q) = gcd(Gk−1(Q),Gk−1(Q) ). Recall also the statement of Claim 1.1.4, that the factors, F , of Gk(Q) of mul- tiplicity d are precisely the factors of Q of multiplicity d + k, where k ≤ d. Finally, we also recall the full definition Q Rfn(P1,...,Ps)(Q) =  , Y n gcd Q, Pi  i≤s

158 where we make particular use of the case of n = 1 in relation to Gk(Q). For clarity, we have

Gk(Q) Rf1(P1,...,Ps)(Gk(Q)) =  . Y gcd Gk(Q), Pi i≤s

0 2 Proposition 5.1.4. With the above setup, Rf1(Gk(Q)) divides (Gk(Q) ) if and only if all factors of Q of multiplicity k + 1 are among the factors of P1,...,Ps. Proof. Firstly, the non-simple factors F of Q of multiplicity d ≥ 2 are factors of Q0 with multiplicity d − 1, and are therefore factors of (Q0)2 with multiplicity 2(d − 1) ≥ d. Such factors F of Q are factors of Rf1(Q) with multiplicity at most d (possibly 0, in which case they are not a factor). Hence all non-simple factors F of Q are also factors of (Q0)2 with multiplicity at least as great as 0 2 their multiplicity in Q. That is, the ‘non-simple part’ of Rf1(Q) divides (Q ) . Secondly, the simple factors of Q are not factors of Q0. As in the proof of Proposition 5.1.2, Re1(Q) is not divisible by any of the simple factors F of Q if and only if all such F are among the factors of P1,...,Ps. That is, the ‘simple 0 2 part’ of Re1(Q) divides (Q ) if and only if all simple factors of Q are among the factors of P1,...,Ps. 0 2 Now, replacing Q with Gk(Q), we have that Rf1(Gk(Q)) divides (Gk(Q) ) if and only if all simple factors of Gk(Q) are among the factors of P1,...,Ps. By Claim 1.1.4, the simple factors of Gk(Q) are precisely the factors of Q with 0 2 multiplicity k + 1. Therefore, Rf1(Gk(Q)) divides (Gk(Q) ) if and only if all factors of Q with multiplicity k + 1 are among the factors of P1,...,Ps. We shape the above results further, allowing us to finally relate it to the (?) condition. For a polynomial Q ∈ R[X], and a finite family of polynomials (P1,...,Ps) in R[X], denote 0 2 B1,k := B1(Gk(Q)) = (Gk(Q) ) , and define

∗ ∗ Rf1(Gk(Q)) A1,k := A1(Gk(Q)) =  . gcd Rf1(Gk(Q)),B1(Gk(Q))

Proposition 5.1.5. For a polynomial Q ∈ R[X] and a finite family of polyno- ∗ ∗ mials (P1,...,Ps) in R[X], take B1,k and A1,k as defined above. Then A1,k has no real roots if and only if all real roots of Q with multiplicity k + 1 are among the roots of P1,...,Ps.

0 2 Proof. With Rf1(Gk(Q)) and B1,k = (Gk(Q) ) as above, applying Proposition 5.1.2 to Gk(Q) and setting n = 1 we know that

Rf1(Gk(Q)) Rf1(Gk(Q)) = 0 2 B1,k (Gk(Q) )

159 has no real roots if and only if all simple real roots of Gk(Q) are among the Rf1(Gk(Q)) roots of P1,...,Ps. Noting that is not necessarily polynomial, we use B1,k ∗ Rf1(Gk(Q)) Proposition 5.1.3, giving us that A1,k = has no real roots if gcd(Rf1(Gk(Q)),B1,k) and only if all real simple roots of Gk(Q) are among the roots of P1,...,Ps. ∗ Finally, applying Claim 1.1.4, we see that A1,k has no real roots if and only if all real roots of Q of multiplicity k + 1 are among the roots of P1,...,Ps.

∗ Since A1,k is polynomial we can use Sturm’s root counting method (or oth- ers) to determine whether the roots of a polynomial Q of an arbitrary chosen multiplicity are among the roots of a finite family of polynomials, P1,...,Ps. Therefore, we can determine whether a polynomial Q satisfies the condition that 0 the odd roots of Q are among the roots of P1,...,Ps. That is, we have obtained an algebraic method to check the (?) condition. Formulating this explicitly, we have the following result.

Proposition 5.1.6. For a finite family of polynomials (P1,...,Ps) ⊂ R[X], ∗ and a polynomial Q ∈ R[X] of degree d, define A1,k(Q) as above. Then the real 0 roots of Q of odd multiplicity are among the roots of P1,...,Ps if and only if ∗ 0 A1,k(Q ) has no real roots for all even k ≤ d − 1. The conditions for Lemma 3.4.2 can be rewritten in accordance to the above proposition. Given a finite family of polynomials (P1,...,Ps) ⊂ R[X], and for an arbitrary polynomial Q ∈ R[X], recall the full definition of Q Rfk(P1,...,Pi)(Q) :=  , Y k gcd Q, Pi  j≤i paying attention in particular to the argument P1,...,Pi. Lemma 5.1.7 (Modified Thom’s lemma). Consider a finite family of nonzero polynomials (P1,...,Ps) ⊂ R[X] such that for each i ∈ {1, . . . , s}, the term ∗ 0 0 A1,k(P1,...,Pi−1)(Pi ) has no real roots for all even k ≤ deg(Pi ). Then for any  ∈ {−1, 0, +1}s, the set

A = {x ∈ R | sign(Pi(x)) = i for all i = 1, . . . , s} is either empty, a point, or an open interval, and the set

A = {x ∈ R | sign(Pi(x)) ∈ i for all i = 1, . . . , s} is either empty, a point, or a closed interval (different to a point). Furthermore, if A = [a, b], then the interior of this interval is A = (a, b).

160 5.2 Completing a family to satisfy the (?) condition

Given an arbitrary finite family of nonzero polynomials (P1,...,Ps) ⊂ R[X], the task of completing this family to meet the conditions of Thom’s lemma 3.4.1 is simple - just add all nonzero derivatives of all orders of each member of the family. It is not as obvious how one might complete a family of polynomials to meet the (?) condition. We approach this problem, starting with the equivalent condition given in Proposition 5.1.6. According to Proposition 5.1.6, a family of polynomials (P1,...,Ps) does not ∗ 0 0 satisfy the (?) condition if A1,k(Pi ) has real roots for some even k ≤ deg(Pi ). ∗ 0 Notice that the roots (real or complex) of A1,k(P1,...,Pi−1)(Pi ) are precisely 0 the roots of Pi which are not among the roots of P1,...,Pi−1. Therefore, ∗ 0 0 if A1,k(P1,...,Pi−1)(Pi ) has real roots for some k ≤ deg(Pi ) even, adding ∗ 0 A1,k(Pi ) to the family (P1,...,Ps) (where possible) would be desirable. Indeed, ∗ 0 if we are able to add A1,k(Pi ) to the family (P1,...,Pi−1), then we can im- ∗ 0  mediately include Pi into the family P1,...,Pi−1,A1,k(P1,...,Pi−1)(Pi ) as ∗ 0 well. Of course, we may not be able to add A1,k(Pi ) to the family in compliance ∗ 0 0 with the (?) condition, since the derivative, A1,k(Pi ) , may have real roots of odd multiplicity which are not among the roots of P1,...,Pi−1. Hence, iterat- ∗ 0 0 ing this process, we check whether A1,k(Pi ) has roots of odd multiplicity not among the roots of (P1,...,Ps). We obtain the following algorithm. Real root inclusion algorithm Consider a finite family of polynomials (P1,...,Ps) ⊂ R[X], and take

∗ Rf1(P1,...,Pi−1)(Gk(Q)) A1,k(P1,...,Pi−1)(Q) = gcd(Rf1(P1,...,Pi−1)(Gk(Q)),B1,k(Q)) Y as previously defined. Note that, by convention, we define gcd(Q, Pi) = 1

Pi∈A for an empty family of polynomials A, so that Rf1(A)(Q) = Q. 0. Start with i = 1, k = 0, and let F be an empty family of polynomials. Set S = Pi. ∗ 0 1. For polynomial S, we check whether or not A1,k(F )(S ) has real roots. If ∗ 0 A1,k(F )(S ) has no real roots, then we increase k by 2 and repeat step 1. If ∗ 0 0 A1,k(F )(S ) has no real roots for all even k ≤ deg(S ), then we add S to the family F , increase i by 1, and repeat step 1 for S = Pi. (Here, the S that we ∗ 0 added to the family is Pi itself). If A1,k(F )(S ) has real roots, then we replace ∗ ∗ 0 S with S = A1,k(F )(S ) and proceed to step 2. ∗ ∗ ∗ 0 2. For polynomial S , we check whether or not A1,k(F )((S ) ) has real roots. ∗ 0 If A1,k(F )(S ) has no real roots, then we increase k by 2 and repeat step 2. If ∗ 0 0 A1,k(F )(S ) has no real roots for all even k ≤ deg(S ), then we add S to the family F , and repeat step 1 for S = Pi. Eventually the entire family (P1,...,Ps) is added to the family F , which is guaranteed to satisfy the conditions of Lemma 5.1.7. This algorithm terminates ∗ 0 after finitely many steps - in the worst case, A1,0(Pi ) has real roots, and each

161 ∗ ∗ 0 A1,0((S ) ) thereafter also has real roots, resulting in every derivative of Pi being added to the family F . This is the case of Thom’s lemma. To illustrate the potential effectiveness as well as limitations of the algorithm we include the following examples. Example 37. Consider the polynomials P = X5001 − 2X and Q = X100 − 9. Setting P = P1 and Q = P2, we apply the algorithm. We first check whether we can add P to the empty family F . We compute P 0 = 5001X5001 − 2. Then 0 0 0 Rf1(F )(G0(P )) = Rf1(F )(P ) = P . Using Sturm’s root counting method (or otherwise), we find that

P 0 A∗ (F )(P 0) = = P 0 1,0 gcd(P 0, ((P 0)0)2) has real roots. So we set S = P 0, and check if we can add S to the empty family. We compute S0 = X4999 after dividing by a constant. Then we have ∗ 0 ∗ 0 ∗ 0 A1,0(F )(S ) = 1, . . . , A1,4996(F )(S ) = 1, however A1,4998(F )(S ) = X, which has a real root. We set S = X and check if we can add S to the empty family. 0 ∗ We compute S = 1, and we have A1,0(F )(1) = 1 which has no roots. Therefore we add X to the family F . We now check if we can add P to the family F = (X). We compute ∗ 0 0 0 A1,k(X)(P ) = P which has real roots, so we check if we can add P to F . 0 ∗ ∗ Setting S = P , we compute A1,0(X)(S) = ...,A1,4998(X)(S) = 1 which has no roots. Therefore we add P 0 = 5001X5000 − 2 to the family F . We now check if we can add P to the family F = (X, 5001X5000 − 2). We ∗ 0 0 0 compute A1,k(X,P )(P ) = 1 for all even k ≤ deg(P ) = 5000 (in fact this is the case for all k), which has no roots. Therefore we add P = X5001 − 2X to the family F . We now check if we can add Q = X100 − 9 to the family of polynomials 5000 5001 ∗ 0 0 F = (X, 5001X − 2,X − 2X). We compute A1,k(X,P ,P )(Q ) = 1 for all even k ≤ deg(Q0) = 99, which has no roots. Therefore we add Q to the family F . Finally, we have a family F = (X, 5001X5000 −2,X5001 −2X,X100 −9) which satisfies the conditions of Lemma 5.1.7. This family has only 4 polynomials in it. In comparison, simply taking all derivatives of P and Q would result in a family of 5002 distinct polynomials, even after omitting all derivatives of Q as they differ from some of the derivatives of P by a scalar multiplier. In Example 37, the family of polynomials produced by the algorithm is much smaller than the family obtained by taking the closure of the family under derivation. Hence, applying this to the construction detailed in Proposition 3.4.8 by refining the PROJ operation to use the Algorithm 5.2, one may obtain a more efficient means of cylindrical algebraic decomposition. We now consider an example in which the algorithm produces a family equiv- alent to simply taking all derivatives of the polynomials in the initial family. Example 38. Consider the polynomials P = X2 − 4 and Q = X3 − 12X, as in Example 27. Setting P = P1 and Q = P2 we apply the algorithm. We first

162 check whether we can add P to the family F . We compute P 0 = 2X. Then 0 0 0 Rf1(F )(G0(P )) = Rf1(F )(P ) = P . Using Sturm’s root counting method we find that P 0 2X A∗ (F )(P 0) = = = 2X = P 0 1,0 gcd(P 0, ((P 0)0)2) gcd(2X, 2) has a real root. So we set S = P 0, and check if we can add S to the empty ∗ 0 family. We compute A1,0(F )(S ) = 1, which clearly has no roots. Therefore we add S = P 0 = 2X to the family F . We now check if we can add P to the family F = (2X). We compute ∗ 0 2X A1,0(F )(P ) = gcd(2X,2X) = 1, which clearly has no roots. Therefore we add P = X2 − 4 to the family F . Finally, we check if we can add Q to the family F = (2X,X2 − 4). We 0 2 2 ∗ 0 compute Rf1(F )(Q ) = Rf1(2X,X − 4)(3x − 12) = 1, so that A1,0(F )(Q ) = 1, which clearly has no roots. Therefore we can add Q to the family F . We now have a family F = (2X,X2 − 4,X3 − 12X) which satisfies the conditions of Lemma 5.1.7, which is essentially the collection of all derivatives of P and Q, as would be required by Thom’s lemma. Here, the computation cost of checking whether we can add each polynomial to the family F in the first place means that this method is less efficient in this instance.

Remark: Consider a polynomial P ∈ (P1,...,Ps) which we wish to add to the family F . Any irreducible factors of P of degree 2 (i.e. factors with non-real roots), where detectable, can be divided out of P before applying Algorithm 5.2 in order to reduce complexity. Since such factors don’t contribute any real roots to P , we know that their removal will not change the nature of the sets A on which each of the members of F have constant sign i, as in Lemma 5.1.7.

163 References

[1] Michel Coste. An introduction to semialgebraic geometry. Institut de Recherche Math´ematiquede Rennes, 2002. [2] Marie-Francoise Roy Saugata Basu, Richard Pollack. Algorithms in real algebraic geometry. Springer, 2016. [3] George E. Collins. Quantifier elimination for real closed fields by cylindrical algebraic decomposition. In Lecture notes in computer science, 1975. [4] Scott McCallum. An improved projection operation for cylindrical alge- braic decomposition. In Quantifier elimination and cylindrical algebraic decomposition, 1998.

[5] Scott McCallum; Hoon Hong. On using lazard’s projection in cad construc- tion. Journal of Symbolic Computation, 2016. [6] Christopher W. Brown. Improved projection operation for cylindrical al- gebraic decomposition. Journal of Symbolic Computation, 2001. [7] Davic Cox, John Little; Donal O’Shea. Ideals, varieties, and algorithms. Springer, third edition, 2006. [8] I. R. Shafarevich. Basic algebraic geometry. Springer-Verlag, 1977. [9] Jacek Bochnak, Michel Coste; Marie-Francoise Roy. Real Algebraic Geom- etry. Springer, 1998.

[10] Takuo Fukuda. Types topologiques def polynomes. Publications Math´ematiques de l’Insitut des Hautes Etudes Scientifiques, 46:87–106, 1976. [11] Isao Nakai. On topological types of polynomial mappings. Topology, 1984.

[12] Masato Fujita; Masahiro Shiota. Topological types of pfaffian manifolds. Nagoya Mathematics Journal, 2004. [13] Masahiro Shiota. Equivalence of differentiable mappings and analytic map- pings. Publications Math´ematiquesde l’Institut des Haute Etude Scien- tifiques, 1981.

[14] Dennis Soul´eArnon. Algorithms for the geometry of semialgebraic sets. PhD thesis, University of Wisconsin, Madison (Comuter Sciences), 1981. [15] Riccardo Benedetti; Masahiro Shiota. Finiteness of semialgebraic types of polynomial functions. Mathematische Zeitschrift, 1991.

[16] L. Birbrair; J. J. Nuno-Ballesteros. Topological K and A equivalences of polynomial functions. Journal of Singularities, 6, 2012. Preprint.

164 [17] George E. Collins; Hoon Hong. Partial cylindrical algebraic decomposition for quantifier elimination. Journal of Symbolic Computation, 1989. [18] Michel Coste; Marie Francoise Roy. Thom’s lemma, the coding of real algebraic numbers and the computation of the topology of semialgebraic sets. Journal of Symbolic Computation, 1986.

[19] F. Csaki. A concise proof of sylvester’s theorem. Periodica Polytechnica Electrical Engineering, 1970. [20] Michael Eisermann. The fundamental theorem of algebra made effective: An elementary real-algebraic proof via sturm chains. The American Math- ematical Monthly, 119:715, 2012. [21] F. R. Gantmacher. The theory of matrices, Volume 1. Chelsea Publishing Company, 1959. [22] F. R. Gantmacher. The theory of matrices, Volume 2. Chelsea Publishing Company, 1959.

[23] Heisuke Hironaka. Triangulations of algebraic sets. In Proceedings of Sym- posia in Pure Mathematics, volume volume 29, 1975. [24] Serge Lang. Algebra. Springer New York, 2002. [25] S. Lojasiewicz. Triangulation of semi-analytic sets. Annali Della Scuola Normale Superiori di Pisa, 1964. [26] Scott McCallum. Constructive triangulation of real curves and surfaces. Master’s thesis, The University of Sydney, 1979. [27] Scott McCallum. An improved projection operation for cylindrical algebraic decomposition of three-dimensional space. Journal of Symbolic Computa- tion, 1988. [28] Alkiviadis G. Arkitas; Panagiotis S. Vigklas. Counting the number of real roots in an interval with vincent’s theorem. Bulletin Mathematique de la Soci´et´edes Sciences Mathematiques de Romanie, Nouvelle S´erie, 2010.

[29] Richard Bellman. Introduction to matrix analysis. McGraw Hill Book Company, New York, 1960.

165 A Hermite’s Method

We introduced some notation and definitions in Subsection 1.3. Namely, for d a polynomial P = a0X + ... + ad ∈ R[X] of positive degree d with roots α1, . . . , αd ∈ C, we defined d X k Nk := (αi) , i=0 the k-th power Newton sums of the roots of P . We show how to explicitly compute the Newton sums Nk using the coefficients a0, . . . , ad. Consider the quotient P 0/P , which can be expressed as

0 P 0 (X − α ) ... (X − α ) = 1 d P (X − α1) ... (X − αd) d  X (X − α1) ... (X − αi−1)(X − αi+1) ... (X − αd) = (X − α ) ... (X − α ) i=1 1 d d X 1 = . X − α i=1 i

Each 1 can be rewritten as X−αi 1 1 1 = X − αi X 1 − αi/X ∞ j 1 X αi  = X X j=0 ∞ X  1 j+1 = αj , i X j=0 yielding the identity

d ∞ P 0 X  X  1 j+1 = αj P i X i=1 j=0 ∞ X  1 j+1 = (αj + ... + αj ) 1 d X j=0 ∞ X  1 j+1 = N . j X j=0 Then we have

0 d d−1 ∞ j XP a0dX + a1(d − 1)X + ... + ad−1X X  1  = = N , P a Xd + a Xd−1 + ... + a j X 0 1 d j=0

166 hence

∞   X  1 j a dXd + ... + a X = a Xd + ... + a N . (2) 0 d−1 0 d j X j=0

Equating coefficients from both sides of equation 2 we have

a0d = a0N0 =⇒ N0 = d,

(d − 1)a1 − N0a1 a1 a0N1 + a1N0 = a1(d − 1) =⇒ N1 = = − . a0 a0 Continuing equating coefficients inductively, we find

a0Nj + a1Nj−1 + ... + ajN0 = (d − j)aj

a1Nj−1 + ... + aj−1N1 + jaj =⇒ Nj = − a0 for j ≤ d, (noting that the dth term is not present in XP 0 as in the left hand side of equation 2), and

a0Nj + a1Nj−1 + ... + adNj−d = 0

a1Nj−1 + ... + adNj−d =⇒ Nj = − a0 for j ≥ d. Hence, iterating over j, we are able to calculate the Nj using only the degree and coefficients of the polynomial. We give a brief example for illustration of this process. Example 39. Let P = (X2 + 1)(X − 1) = X3 − X2 + X − 1 with coefficients a0 = 1, a1 = −1, a2 = 1, a3 = −1, and roots α1 = 1, α2 = i, α3 = −i. By the above equations, we have

N0 = d = 3

a1 −1 N1 = − = − = 1 a0 1 a1N1 + 2a2 (−1)(1) + (2)(1) N2 = − = − = −1 a0 1 a1N2 + a2N1 + 3a3 (−1)(−1) + (1)(1) + 3(−1) N3 = − = − = 1 a0 1 a1N3 + a2N2 + a3N1 (−1)(1) + (1)(−1) + (−1)(1) N4 = − = = 3 a0 1 which we can also check manually since we know the roots of P to begin with. We make use of our ability to compute the Newton sums of the roots of P , taking our first step in the direction of Hermite’s root counting method.

167 Consider the matrix comprised of the Newton sums of the roots of P , defined   N0 N1 ...Nd−1  N1 N2 ...Nd  H(P ) :=    . . .. .   . . . .  Nd−1 Nd ...N2d−2 noticing that H(P ) is equal to its transpose, (that is, the (m, n)-th entry is equal to the (n, m)-th entry). Note also that, since P has real coefficients, if α is a root of P then the complex conjugate α is also a root of P , meaning that H(P ) is necessarily a real matrix. Let U := (U1,...,Ud), and let Q(U) be a quadratic form, which is defined as a polynomial in U1,...,Ud where every term has degree 2. We say that Q has matrix M if

Q(U) = U >MU     m1,1 . . . m1,d U1  . .   .  = (U1,...,Ud)  . .   .  . md,1 . . . md,d Ud

If Q(U) has real coefficients and can be decomposed as

p p+q X 2 X 2 Q(U) = (Lj(U)) − (Lj(U)) j=1 j=p+1 where the set of Lj for j = 1, . . . , p + q are linear forms with real coefficients, and are all linearly independent, then the signature of Q is defined as p − q, and the rank of Q is p + q. The rank of a quadratic form is equal to the rank of its associated matrix, (Proposition 1.3.2). Now, let Q be a quadratic form with matrix H(P ). That is,

Q(U) = U >H(P )U.

168 Expanding and rearranging this expression, we have

Q(U) = U >H(P )U 2 = N0U1 + N1U1U2 + ... + Nd−1U1Ud 2 + N1U2U1 + N2U2 + ... + NdU2Ud + ... . . 2 + Nd−1UdU1 + NdUdU2 + ... + N2d−2Ud d  X 0 2 1 d−1 = αi U1 + αi U1U2 + ... + αi + ... i=1  d−1 d 2d−2 2 + αi UdU1 + αi UdU2 + ... + αi Ud

2 2  d−1   d−1  = U1 + α1U2 + ... + αi Ud + ... + U1 + αdU2 + ... + αd Ud = L2 + ... + L2 , α1 αd

d−1 where the Lαi denote the linear forms (U1 + αiU2 + ... + αi Ud). Let V be the Vandermonde matrix of P ,

 1 ... 1   α1 . . . αd    V :=  . .  ,  . .  d−1 d−1 α1 . . . αd and observe that   1 ... 1  d−1 1 α1 . . . α1  α1 . . . αd  >   . . .  V(V ) =  . .  . . .   . .  d−1 d−1 d−1 1 αd . . . αd α1 . . . αd   N0 N1 ...Nd−1  N1 N2 ...Nd  =    . . .. .   . . . .  Nd−1 Nd ...N2d−2 =H(P ).

Then, noting that

$$V^\top U = \begin{pmatrix} 1 & \alpha_1 & \dots & \alpha_1^{d-1} \\ \vdots & \vdots & & \vdots \\ 1 & \alpha_d & \dots & \alpha_d^{d-1} \end{pmatrix}\begin{pmatrix} U_1 \\ \vdots \\ U_d \end{pmatrix} = \begin{pmatrix} U_1 + \alpha_1U_2 + \dots + \alpha_1^{d-1}U_d \\ \vdots \\ U_1 + \alpha_dU_2 + \dots + \alpha_d^{d-1}U_d \end{pmatrix} = \begin{pmatrix} L_{\alpha_1} \\ \vdots \\ L_{\alpha_d} \end{pmatrix},$$

it is clear that the quadratic form $Q(U)$ is indeed the sum of the squares of the $L_{\alpha_i}$. Claim 1.3.1 makes a connection between the signature and rank of $Q$, and the number of real and complex roots of $P$, respectively.
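Before giving the proof, a quick numerical illustration of the claim (a sketch of ours, not part of the thesis, assuming numpy): for the polynomial of Example 39 we build $H(P) = VV^\top$ from numerically computed roots and read off rank and signature from the eigenvalues, recovering the number of distinct roots (3) and of distinct real roots (1).

\begin{verbatim}
import numpy as np

coeffs = [1, -1, 1, -1]                   # P = X^3 - X^2 + X - 1 (Example 39)
alpha = np.roots(coeffs)                  # numerically computed roots 1, i, -i
d = len(alpha)
M = np.vander(alpha, d, increasing=True)  # row i is (1, alpha_i, ..., alpha_i^{d-1})
H = (M.T @ M).real                        # = V V^T = H(P); imaginary parts cancel
eig = np.linalg.eigvalsh(H)               # real eigenvalues of a symmetric matrix
rank = int(np.sum(np.abs(eig) > 1e-9))
signature = int(np.sum(eig > 1e-9) - np.sum(eig < -1e-9))
print(rank, signature)                    # 3 1
\end{verbatim}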

Proof of Claim 1.3.1. First let $P \in \mathbb{R}[X]$ have degree $d$ with roots $\alpha_1, \dots, \alpha_d \in \mathbb{C}$, $k$ of which are distinct, say $\alpha_1, \dots, \alpha_k$. Then the linear forms $L_{\alpha_1}, \dots, L_{\alpha_k}$ are all linearly independent from one another. This is proven by a simple contradiction: if the $L_{\alpha_1}, \dots, L_{\alpha_k}$ were linearly dependent for distinct $\alpha_1, \dots, \alpha_k \in \mathbb{C}$, then the system
$$c_0\begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + \dots + c_{k-1}\begin{pmatrix} \alpha_1^{k-1} \\ \vdots \\ \alpha_k^{k-1} \end{pmatrix} = 0$$
would be satisfied by some $c_0, \dots, c_{k-1} \in \mathbb{C}$, not all zero. However, the $\alpha_1, \dots, \alpha_k$ would then be $k$ distinct roots of a nonzero polynomial of degree at most $k - 1$, which is impossible. Therefore the $L_{\alpha_1}, \dots, L_{\alpha_k}$ must be linearly independent for distinct roots $\alpha_1, \dots, \alpha_k$. For a real root $\alpha$ of $P$, the linear form $L_\alpha$ has real coefficients.

Hence, for each distinct real root $\alpha_i$ of $P$, the linear form $L_{\alpha_i}$ contributes $+1$ to both the rank and the signature of $Q$. However, if $\alpha$ is a non-real root of $P$, then $\bar\alpha$ is another root of $P$. Since $\alpha \neq \bar\alpha$, the forms $L_\alpha$ and $L_{\bar\alpha}$ are linearly independent, but we have

$$\begin{aligned}
L_\alpha^2 + L_{\bar\alpha}^2 &= \left(U_1 + \alpha U_2 + \dots + \alpha^{d-1}U_d\right)^2 + \left(U_1 + \bar\alpha U_2 + \dots + \bar\alpha^{d-1}U_d\right)^2 \\
&= \mathrm{Re}(L_\alpha)^2 + 2i\,\mathrm{Im}(L_\alpha)\mathrm{Re}(L_\alpha) - \mathrm{Im}(L_\alpha)^2 \\
&\quad + \mathrm{Re}(L_\alpha)^2 - 2i\,\mathrm{Im}(L_\alpha)\mathrm{Re}(L_\alpha) - \mathrm{Im}(L_\alpha)^2 \\
&= 2\,\mathrm{Re}(L_\alpha)^2 - 2\,\mathrm{Im}(L_\alpha)^2.
\end{aligned}$$

Note that the $\mathrm{Im}(L_\alpha)^2$ term is necessarily nonzero since $\alpha$ is assumed to be non-real, and the $\mathrm{Re}(L_\alpha)^2$ term is nonzero since the coefficient of $U_1$ is 1. Furthermore, both $2\,\mathrm{Re}(L_\alpha)^2$ and $2\,\mathrm{Im}(L_\alpha)^2$ are squares of linear forms with real coefficients. Hence $L_\alpha^2 + L_{\bar\alpha}^2$ contributes $+1$ to both $p$ and $q$ (the number of positive and negative square terms in $Q$ respectively), and therefore a non-real root (along with its complex conjugate) contributes $+2$ to the rank of $Q$, and contributes $+0$ to the signature of $Q$. This concludes the proof.

We are also able to produce a result similar to Theorem 1.1.3 in the context of Newton sums. That is, for polynomials $P, Q \in \mathbb{R}[X]$ we are able to calculate the number of distinct real roots $c$ of $P$ such that $Q(c) > 0$ minus the number of those such that $Q(c) < 0$, using Newton sums. First we define some new notation. If $\alpha_1, \dots, \alpha_d$ are the complex roots of $P$ (counted with multiplicities), then we define

$$N_k' := \sum_{i=1}^{d} Q(\alpha_i)\,\alpha_i^k,$$
and we denote by $H'(P)$ the matrix defined in the same way as $H(P)$, but with the $N_k'$ replacing the $N_k$. We compute the modified Newton sums $N_k'$ using the unmodified Newton sums $N_k$, as follows. Let $P(X) = a_0X^d + \dots + a_{d-1}X + a_d$ and $Q(X) = b_0X^e + \dots + b_{e-1}X + b_e$, and let $\alpha_1, \dots, \alpha_d \in \mathbb{C}$ be the roots of $P$. Then

$$N_k' = \sum_{i=1}^{d} Q(\alpha_i)\,\alpha_i^k = \sum_{i=1}^{d}\left(b_0\alpha_i^e + b_1\alpha_i^{e-1} + \dots + b_e\right)\alpha_i^k = \sum_{i=1}^{d}\left(b_0\alpha_i^{e+k} + b_1\alpha_i^{e+k-1} + \dots + b_e\alpha_i^k\right) = b_0N_{e+k} + b_1N_{e+k-1} + \dots + b_eN_k,$$
where the $N_k$ are computed as usual. The following result is an analogue to Theorem 1.1.3.
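Before stating it, we note that this computation is immediate to implement; a small sketch of ours (not part of the thesis), reusing the newton_sums helper from the earlier sketch:

\begin{verbatim}
def modified_newton_sums(a, b, n_max):
    """N'_k = b_0 N_{e+k} + ... + b_e N_k for P (coeffs a) and Q (coeffs b),
    using the plain Newton sums N_k of the roots of P."""
    e = len(b) - 1
    N = newton_sums(a, n_max + e)     # N_k is needed up to k = n_max + e
    return [sum(b[m] * N[e + k - m] for m in range(e + 1))
            for k in range(n_max + 1)]

# P = X^2 + 1, Q = X + 2, i.e. (a, b, c) = (0, 1, 2) in Example 40 below:
# N'_0 = 2c - a = 4, N'_1 = a^2 - ac - 2b = -2, N'_2 = -4
print(modified_newton_sums([1, 0, 1], [1, 2], 2))   # [4.0, -2.0, -4.0]
\end{verbatim}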

Proposition A.0.1. Let $P$ and $Q$ be polynomials in $\mathbb{R}[X]$, and let $\alpha_1, \dots, \alpha_d$ be the complex roots of $P$. Then the signature of the matrix $H'(P)$ is equal to the number of distinct real roots $c$ of $P$ such that $Q(c) > 0$, minus the number of those such that $Q(c) < 0$.
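As a numerical sanity check (ours, assuming numpy), one can build $H'(P)$ directly from numerically computed roots: with $P = (X-1)(X-2)(X-3)$ and $Q = X - 2.5$, the value of $Q$ is positive at one real root of $P$ and negative at two, so the signature of $H'(P)$ should be $1 - 2 = -1$.

\begin{verbatim}
import numpy as np

a = np.poly([1.0, 2.0, 3.0])       # coefficients of P = (X-1)(X-2)(X-3)
alpha = np.roots(a)
q = alpha - 2.5                    # Q(alpha_i) for Q = X - 2.5
d = len(alpha)
Hp = np.array([[(q * alpha**(i + j)).sum() for j in range(d)]
               for i in range(d)]).real
eig = np.linalg.eigvalsh(Hp)
print(int(np.sum(eig > 1e-9) - np.sum(eig < -1e-9)))   # -1
\end{verbatim}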

Proof. Denote by $W$ the quadratic form whose matrix is $H'(P)$. Then we have

$$\begin{aligned}
W(U) = U^\top H'(P)\,U &= (U_1, \dots, U_d)\begin{pmatrix} N_0' & \dots & N_{d-1}' \\ \vdots & \ddots & \vdots \\ N_{d-1}' & \dots & N_{2d-2}' \end{pmatrix}\begin{pmatrix} U_1 \\ \vdots \\ U_d \end{pmatrix} \\
&= N_0'U_1^2 + N_1'U_1U_2 + \dots + N_{d-1}'U_1U_d \\
&\quad + N_1'U_1U_2 + N_2'U_2^2 + \dots + N_d'U_2U_d \\
&\quad + \dots \\
&\quad + N_{d-1}'U_1U_d + N_d'U_2U_d + \dots + N_{2d-2}'U_d^2 \\
&= \left(Q(\alpha_1) + \dots + Q(\alpha_d)\right)U_1^2 + \dots + \left(Q(\alpha_1)\alpha_1^{d-1} + \dots + Q(\alpha_d)\alpha_d^{d-1}\right)U_1U_d \\
&\quad + \dots \\
&\quad + \left(Q(\alpha_1)\alpha_1^{2d-2} + \dots + Q(\alpha_d)\alpha_d^{2d-2}\right)U_d^2 \\
&= Q(\alpha_1)\left(U_1 + \alpha_1U_2 + \dots + \alpha_1^{d-1}U_d\right)^2 + \dots + Q(\alpha_d)\left(U_1 + \alpha_dU_2 + \dots + \alpha_d^{d-1}U_d\right)^2.
\end{aligned}$$

Let $\alpha_1, \dots, \alpha_r \in \mathbb{C}$ be the distinct roots of $P$. From Claim 1.3.1 we know that the set of linear forms $U_1 + \dots + \alpha_i^{d-1}U_d$, with $i = 1, \dots, r$, is linearly independent, and that the signature of $H(P)$ is equal to the number of distinct real roots of $P$. We know that $\sqrt{Q(z)} \in \mathbb{C}$ for any $z \in \mathbb{C}$. Hence for distinct roots $\alpha_1, \dots, \alpha_r \in \mathbb{C}$, the set of linear forms
$$L'_{\alpha_i}(U) := \sqrt{Q(\alpha_i)}\left(U_1 + \alpha_iU_2 + \dots + \alpha_i^{d-1}U_d\right), \qquad \alpha_i \in \{\alpha_1, \dots, \alpha_r\},$$
is linearly independent (and clearly, for roots $\alpha_i \in \{\alpha_{r+1}, \dots, \alpha_d\}$ the linear forms $L'_{\alpha_i}$ are not linearly independent from this set). Furthermore, we know that $Q(x) \in \mathbb{R}$ for any $x \in \mathbb{R}$, and therefore the square terms $(L'_{\alpha_i})^2 = Q(\alpha_i)\left(U_1 + \dots + \alpha_i^{d-1}U_d\right)^2$ have real coefficients (involving the value of $Q(\alpha_i)$) for real roots $\alpha_i$ of $P$. In particular, the distinct real roots $\alpha_i$ contribute $+1$, $-1$, or $0$ to the signature of $W$ according to whether the value of $Q(\alpha_i)$ is positive, negative, or zero, respectively. We now consider the effect of the complex roots of $P$ on the signature of $W$. For non-real roots $\alpha$ and $\bar\alpha$ of $P$, let $\sqrt{Q(\alpha)} = a + ib$ (and therefore $\sqrt{Q(\bar\alpha)} = a - ib$), and denote by $L_\alpha = U_1 + \dots + \alpha^{d-1}U_d$ the standard

linear forms used previously. Then we have

$$\begin{aligned}
(L'_\alpha)^2 + (L'_{\bar\alpha})^2 &= (a+ib)^2\left(U_1 + \dots + \alpha^{d-1}U_d\right)^2 + (a-ib)^2\left(U_1 + \dots + \bar\alpha^{d-1}U_d\right)^2 \\
&= \left[(a+ib)\left(\mathrm{Re}(L_\alpha) + i\,\mathrm{Im}(L_\alpha)\right)\right]^2 + \left[(a-ib)\left(\mathrm{Re}(L_\alpha) - i\,\mathrm{Im}(L_\alpha)\right)\right]^2 \\
&= \left[a\,\mathrm{Re}(L_\alpha) - b\,\mathrm{Im}(L_\alpha) + i\left(a\,\mathrm{Im}(L_\alpha) + b\,\mathrm{Re}(L_\alpha)\right)\right]^2 \\
&\quad + \left[a\,\mathrm{Re}(L_\alpha) - b\,\mathrm{Im}(L_\alpha) - i\left(a\,\mathrm{Im}(L_\alpha) + b\,\mathrm{Re}(L_\alpha)\right)\right]^2 \\
&= \left(A + iB\right)^2 + \left(A - iB\right)^2 = 2A^2 - 2B^2,
\end{aligned}$$

where $A = a\,\mathrm{Re}(L_\alpha) - b\,\mathrm{Im}(L_\alpha)$ and $B = a\,\mathrm{Im}(L_\alpha) + b\,\mathrm{Re}(L_\alpha)$ are real linear forms. Furthermore, $A$ and $B$ are linearly independent, as $\mathrm{Re}(L_\alpha)$ and $\mathrm{Im}(L_\alpha)$ are linearly independent from one another, and the determinant of the system $A + \lambda B = 0$ is $\lambda(a^2 + b^2) \neq 0$ since $\lambda \neq 0$ and $b \neq 0$. (Indeed we assume $b \neq 0$, for otherwise $\sqrt{Q(\alpha)}$ is real, in which case the treatment of the forms $L'_\alpha$ and $L'_{\bar\alpha}$ reduces to the case seen in the proof of Claim 1.3.1.) Hence complex conjugate root pairs contribute 0 to the signature of $W$.

We have shown that the distinct real roots $c$ of $P$ for which $Q(c) > 0$ each contribute $+1$ to the signature of $W$, and the distinct real roots $c$ of $P$ for which $Q(c) < 0$ each contribute $-1$ to the signature of $W$, while the roots $c$ of $P$ for which $Q(c) = 0$ do not affect the signature of $W$, and the complex roots of $P$, collectively, do not change the signature of $W$.

We outline a method by which we can compute the signature of the matrix $H(P)$. We defined $\delta_i(P)$, the $i$-th principal minor of the matrix $H(P)$ (or simply $\delta_i$ when context makes clear which polynomial $P$ we are using), to be
$$\delta_i(P) = \det\begin{pmatrix} N_0 & \dots & N_{i-1} \\ \vdots & \ddots & \vdots \\ N_{i-1} & \dots & N_{2i-2} \end{pmatrix},$$
the determinant of the matrix formed from the first $i$ rows and columns of $H(P)$.

Theorem A.0.2 (Jacobi's Theorem). Assume that none of the principal minors of the matrix $H(P)$ are zero. Then the signature of $H(P)$ is equal to $d$ minus twice the number of sign changes in the sequence $1, \delta_1, \dots, \delta_d$ of principal minors.

Proof. A similar proof can also be found in [21], Chapter 9. Consider a general symmetric bilinear form $B$ in the basis $(e_1, \dots, e_d)$, either $\mathbb{C}^d \times \mathbb{C}^d \to \mathbb{C}$ or $\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, with matrix $M$ whose principal minors $\delta_1, \dots, \delta_d$ are all nonzero. That is,

$$B(U, V) = U^\top M V = (U_1, \dots, U_d)\begin{pmatrix} B(e_1, e_1) & \dots & B(e_1, e_d) \\ \vdots & \ddots & \vdots \\ B(e_d, e_1) & \dots & B(e_d, e_d) \end{pmatrix}\begin{pmatrix} V_1 \\ \vdots \\ V_d \end{pmatrix}.$$

Construct the orthogonal basis $(b_1, \dots, b_d)$ for $B$ as follows. Set $b_1 = e_1$, then for $2 \le i \le d$ define

$$b_i = \lambda_{i,1}e_1 + \dots + \lambda_{i,i-1}e_{i-1} + e_i. \tag{3}$$
Then we consider the linear system of equations

$$B(e_1, b_i) = 0, \quad \dots, \quad B(e_{i-1}, b_i) = 0,$$
which, using the bilinearity of $B$, is equivalent to
$$\begin{aligned}
B(e_1, b_i) &= \lambda_{i,1}B(e_1, e_1) + \dots + \lambda_{i,i-1}B(e_1, e_{i-1}) + B(e_1, e_i) = 0 \\
&\;\;\vdots \\
B(e_{i-1}, b_i) &= \lambda_{i,1}B(e_{i-1}, e_1) + \dots + \lambda_{i,i-1}B(e_{i-1}, e_{i-1}) + B(e_{i-1}, e_i) = 0.
\end{aligned}$$

We denote by $M_i'$ the matrix representing this system,
$$M_i' := \begin{pmatrix} B(e_1, e_1) & \dots & B(e_1, e_{i-1}) \\ \vdots & \ddots & \vdots \\ B(e_{i-1}, e_1) & \dots & B(e_{i-1}, e_{i-1}) \end{pmatrix},$$
which is precisely the square matrix consisting of the first $i-1$ rows and columns of $M$. That is, the determinant $\det(M_i')$ of this system is equal to $\delta_{i-1}$, the principal minor of $M$ of order $i-1$. Since we are assuming all principal minors of $M$ to be nonzero, $M_i'$ is invertible. That is, the coefficients $\lambda_{i,1}, \dots, \lambda_{i,i-1}$ are uniquely determined by the values of the form $B(e_j, e_k)$ for $j = 1, \dots, i-1$ and $k = 1, \dots, i$, and hence $b_i$ is uniquely determined for all $2 \le i \le d$. That is, the orthogonal basis $b_1, \dots, b_d$ is uniquely determined. We relate the $B(b_i, b_i)$ to the principal minors $\delta_i$ of $M$ for $i = 1, \dots, d$. For the case of $i = 1$ we have $B(b_1, b_1) = B(e_1, e_1) = \delta_1$. For $2 \le i \le d$ we have $B(b_i, b_i) = \delta_i/\delta_{i-1}$. Consider the quadratic form $Q$ in the orthogonal basis $b_1, \dots, b_d$ defined by
$$Q(U) = (U_1, \dots, U_d)\begin{pmatrix} B(b_1, b_1) & \dots & B(b_1, b_d) \\ \vdots & \ddots & \vdots \\ B(b_d, b_1) & \dots & B(b_d, b_d) \end{pmatrix}\begin{pmatrix} U_1 \\ \vdots \\ U_d \end{pmatrix} = (U_1, \dots, U_d)\begin{pmatrix} B(b_1, b_1) & 0 & \dots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \dots & 0 & B(b_d, b_d) \end{pmatrix}\begin{pmatrix} U_1 \\ \vdots \\ U_d \end{pmatrix} = B(b_1, b_1)U_1^2 + \dots + B(b_d, b_d)U_d^2.$$

Since $\delta_i = \delta_{i-1}B(b_i, b_i)$ for each $2 \le i \le d$, a relative sign difference between $\delta_{i-1}$ and $\delta_i$ indicates that $B(b_i, b_i)$ is negative, and no sign change means that $B(b_i, b_i)$ is positive. If $p$ and $q$ are the number of positive and negative squares respectively in this decomposition of $Q(U)$, then each positive $B(b_i, b_i)$ contributes $+1$ to $p$ and 0 to $q$, while each negative $B(b_i, b_i)$ contributes $+1$ to $q$ and 0 to $p$. Hence, assuming the principal minors to be nonzero, the signature is equal to $p - q = p + q - 2q = d - 2q$ as required. The statement of the theorem follows by applying the above construction to the quadratic form whose matrix is $H(P)$.

Remark: It is important to note that the above result (i.e. the signature) is independent of our choice of basis. This fact is known as Sylvester's law of inertia for Hermitian forms (see [21] or [29]). Indeed, the eigenvalues of a diagonal matrix are simply the entries along the diagonal, and these eigenvalues are unchanged by the (non-singular) transformation from one basis to another. This has crucial implications for results concerning eigenvalues and signatures, particularly with respect to Claim 1.3.1.

We stated Proposition 1.3.2.

Proof of Proposition 1.3.2. We know that $H(P)$ is a real symmetric matrix. As discussed earlier, we can write

$$Q(U) = U^\top H(P)\,U = \sum_{i=1}^{d}\left(L_{\alpha_i}(U)\right)^2 = \sum_{i=1}^{p}\left(L_i(U)\right)^2 - \sum_{i=p+1}^{p+q}\left(L_i(U)\right)^2,$$
where the $L_i$ are linearly independent. Here, $p$ and $q$ are known to be the numbers of positive and negative eigenvalues of $H(P)$ respectively, all of which are real since $H(P)$ is a real symmetric matrix (and is therefore a Hermitian matrix), [21] Chapter 9. The rank of $H(P)$ is equal to $d$ minus the dimension of its kernel. Therefore the rank of $H(P)$ is equal to $p + q$ (the number of nonzero eigenvalues), which is precisely the rank of $Q$ as a quadratic form.

Remark: From Claim 1.3.1 we know that $p + q$ is equal to the number of distinct roots of $P$, and we know that the $L_\alpha = U_1 + \alpha U_2 + \dots + \alpha^{d-1}U_d$ are linearly independent for distinct roots $\alpha$ of $P$. Hence the rank of $H(P)$ is equal to the number of linearly independent forms $L_\alpha$, which is precisely the rank of $Q$ as a quadratic form.

We stated Proposition 1.3.3.

Proof of Proposition 1.3.3. If the rank of $H(P)$ is $r$, then clearly the principal minors $\delta_{r+1} = \dots = \delta_d = 0$. We show that $\delta_r \neq 0$. Consider the $r \times r$ matrix $H_r(P)$ formed from the first $r$ rows and columns of $H(P)$, so that $\det(H_r(P)) = \delta_r$. We consider the quadratic form $Q_r$ in the variables $U' := (U_1, \dots, U_r)$ whose

matrix is $H_r(P)$. Note that $H_r(P)$, and hence $Q_r$, can have rank at most $r$. Using similar arguments to those at the beginning of this subsection, we have

$$Q_r(U') = (U')^\top H_r(P)\,U' = (U_1, \dots, U_r)\begin{pmatrix} N_0 & \dots & N_{r-1} \\ \vdots & \ddots & \vdots \\ N_{r-1} & \dots & N_{2r-2} \end{pmatrix}\begin{pmatrix} U_1 \\ \vdots \\ U_r \end{pmatrix} = \left(U_1 + \alpha_1U_2 + \dots + \alpha_1^{r-1}U_r\right)^2 + \dots + \left(U_1 + \alpha_dU_2 + \dots + \alpha_d^{r-1}U_r\right)^2.$$
We know from Claim 1.3.1 that if the rank of $H(P)$ is equal to $r$, the polynomial $P$ must have exactly $r$ distinct roots. Assuming that $\alpha_1, \dots, \alpha_r \in \mathbb{C}$ are the distinct roots of $P$, the set of linear forms $U_1 + \alpha_iU_2 + \dots + \alpha_i^{r-1}U_r$, where $\alpha_i \in \{\alpha_1, \dots, \alpha_r\}$, is linearly independent (and any other linear form $U_1 + \alpha_iU_2 + \dots + \alpha_i^{r-1}U_r$, where $r < i \le d$, is already in this set). That is, the rank of $Q_r$ is $r$, and so by Proposition 1.3.2 above, the rank of $H_r(P)$ is also $r$. Therefore $\delta_r = \det(H_r(P)) \neq 0$, proving the forward implication. For the reverse implication, assume that $\delta_r \neq 0$ and $\delta_{r+1} = \dots = \delta_d = 0$ for some $1 \le r \le d$. Clearly the rank of $H(P)$ is at least $r$. From the previous arguments, we know that if $\mathrm{rank}(H(P)) = k$ for some $k > r$, then $\delta_k \neq 0$, violating our assumption. Hence $\mathrm{rank}(H(P)) = r$.

Following Jacobi's theorem, we obviously cannot assume that all principal minors $\delta_1, \dots, \delta_d$ of the matrix $H(P)$ will be nonzero for arbitrary $P \in \mathbb{R}[X]$, and the principal minors alone are not enough to deduce the signature of an arbitrary quadratic form. However, the matrix $H(P)$ has a special property that can be exploited here to prove Theorem 1.3.4. We defined an expression regarding the signs of the sequence of principal minors when we allow them to be zero. If $P \in \mathbb{R}[X]$ is a polynomial of degree $d$, and $\delta_1, \dots, \delta_d$ are the principal minors of the matrix $H(P)$, then for $1 \le i \le d$,

$$\widetilde{\operatorname{sign}}(\delta_i) := \operatorname{sign}(\delta_i) \quad \text{if } \delta_i \neq 0,$$
$$\widetilde{\operatorname{sign}}(\delta_i) := (-1)^{k(k-1)/2}\operatorname{sign}(\delta_{i-k}) \quad \text{if } \delta_i = \dots = \delta_{i-k+1} = 0 \text{ and } \delta_{i-k} \neq 0.$$
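This modified sign rule is mechanical to apply. The following sketch (ours, not part of the thesis; count_from_minors is our own helper name) assumes exact minors, such as integers or sympy rationals, and computes the quantity used in Theorem 1.3.4 and Corollary A.0.3: $r$ minus twice the number of sign changes in $1, \widetilde{\operatorname{sign}}(\delta_1), \dots, \widetilde{\operatorname{sign}}(\delta_r)$.

\begin{verbatim}
def count_from_minors(delta):
    """delta = [delta_1, ..., delta_d]; returns r - 2 * (sign changes in
    1, sign~(delta_1), ..., sign~(delta_r)), r being the last nonzero index."""
    r = max((i + 1 for i, x in enumerate(delta) if x != 0), default=0)
    signs = [1]                            # the leading 1 of the sequence
    for i in range(r):
        if delta[i] != 0:
            signs.append(1 if delta[i] > 0 else -1)
        else:
            k = 1                          # length of the zero run ending at i
            while i - k >= 0 and delta[i - k] == 0:
                k += 1
            prev = 1 if i - k < 0 else (1 if delta[i - k] > 0 else -1)
            signs.append((-1) ** (k * (k - 1) // 2) * prev)
    changes = sum(1 for s, t in zip(signs, signs[1:]) if s * t < 0)
    return r - 2 * changes

print(count_from_minors([2, 4]))    # 2: e.g. H(P) for X^2 - 1, two distinct real roots
print(count_from_minors([2, -4]))   # 0: e.g. H(P) for X^2 + 1, no real roots
\end{verbatim}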

Proof of Theorem 1.3.4. Not only is $H(P)$ equal to its own transpose, but the entries along each of the anti-diagonals (the $(i,j)$-th entries where $i + j$ is constant) are equal. That is, $H(P)$ is a Hankel matrix. Frobenius gave a result concerning such matrices (Theorem 24, page 343 of [21]), showing that the signature can be computed using only the principal minors. We use this result along with Propositions 1.3.2 and 1.3.3 to formulate the rule by which we deduce the number of distinct real roots of $P$ in the case that the principal minors may vanish. Applying Proposition A.0.1 to Theorem 1.3.4, we arrive at the following result.

Corollary A.0.3. Let $P, Q \in \mathbb{R}[X]$, let $H'(P)$ be the matrix with entries $N_i'$ replacing the $N_i$ in $H(P)$ as defined above, and let $\delta_1, \dots, \delta_d$ be the principal minors of the matrix $H'(P)$, with $\delta_r \neq 0$ and $\delta_{r+1} = \dots = \delta_d = 0$. Then the number of distinct real roots $c$ of $P$ such that $Q(c) > 0$ minus the number of those such that $Q(c) < 0$ is equal to $r$ minus twice the number of sign changes in the sequence $1, \widetilde{\operatorname{sign}}(\delta_1), \dots, \widetilde{\operatorname{sign}}(\delta_r)$.

Example 40. We will now apply the results of this subsection to the example of $P = X^2 + aX + b$ and $Q = X + c$. We first compute the $N_i$ for the polynomial $P$, as shown previously.

$$N_0 = 2, \qquad N_1 = -a, \qquad N_2 = -(a(-a) + 2b) = a^2 - 2b, \qquad N_3 = -a^3 + 3ab,$$
$$H(P) = \begin{pmatrix} 2 & -a \\ -a & a^2 - 2b \end{pmatrix},$$
hence
$$\delta_1 = 2, \qquad \delta_2 = a^2 - 4b,$$

noticing that $\delta_2 = a^2 - 4b$ is the discriminant of $P$. We consider the number of real and complex roots of $P$ in terms of the coefficients $a$ and $b$. We know that if $\delta_2$ is positive then $P$ will have 2 distinct real roots, if $\delta_2$ is zero then $P$ will have a single real root, and if $\delta_2$ is negative then $P$ will have 2 distinct complex roots. We can verify the results of applying Theorem 1.3.4 using these facts. If $\delta_2 > 0$ then the sequence $1, \widetilde{\operatorname{sign}}(\delta_1), \dots, \widetilde{\operatorname{sign}}(\delta_r)$ is simply $1, 1, 1$, where $r = d = 2$, and so the number of distinct real roots of $P$ is equal to 2, which we know to be true. If $\delta_2 = 0$ then the sequence is $1, 1, \widetilde{\operatorname{sign}}(0)$, which becomes $1, 1, 1$, where $r = 1 < d$, hence the number of distinct real roots of $P$ is 1. Indeed $P$ has a double root when the discriminant is zero. Finally, when $\delta_2 < 0$, the sequence is $1, 1, -1$, where $r = d = 2$, and so the number of distinct real roots of $P$ is equal to 2 minus twice the number of sign changes in the $\widetilde{\operatorname{sign}}$ sequence. Hence the number of distinct real roots of $P$ is zero, which we know to be the case when the discriminant is negative. We now continue this example to apply Corollary A.0.3 to both $P$ and $Q$, computing the Newton sums $N_i'$ of the roots $\alpha$ of $P$, modified by $Q(\alpha)$. Using the notation of Corollary A.0.3, we have $P = a_0X^d + \dots + a_d = X^2 + aX + b$,

and $Q = b_0X^e + \dots + b_e = X + c$, for arbitrary $a, b, c \in \mathbb{R}$. Then we have

$$\begin{aligned}
N_0' &= b_0N_1 + b_1N_0 = 1(-a) + c(2) = 2c - a, \\
N_1' &= b_0N_2 + b_1N_1 = 1(a^2 - 2b) + c(-a) = a^2 - ac - 2b, \\
N_2' &= b_0N_3 + b_1N_2 = 1(-a^3 + 3ab) + c(a^2 - 2b) = -a^3 + a^2c + 3ab - 2bc,
\end{aligned}$$
$$H'(P) = \begin{pmatrix} 2c - a & a^2 - ac - 2b \\ a^2 - ac - 2b & -a^3 + a^2c + 3ab - 2bc \end{pmatrix},$$
hence

$$\delta_1 = 2c - a, \qquad \delta_2 = -a^3c + a^2c^2 + a^2b + 4abc - 4b^2 - 4bc^2.$$
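These minors can be verified symbolically; a short sketch of ours (not part of the thesis, assuming sympy), which also confirms the factorization of $\delta_2$ used below:

\begin{verbatim}
import sympy as sp

a, b, c = sp.symbols('a b c')
N = [2, -a, a**2 - 2*b, -a**3 + 3*a*b]         # Newton sums of X^2 + aX + b
Np = [N[k + 1] + c * N[k] for k in range(3)]   # N'_k = b_0 N_{k+1} + b_1 N_k
Hp = sp.Matrix([[Np[0], Np[1]], [Np[1], Np[2]]])
print(sp.expand(Hp[0, 0]))                                     # -a + 2*c
print(sp.expand(Hp.det() - (a**2 - 4*b) * (c**2 - a*c + b)))   # 0
\end{verbatim}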

If we set $a = 0$, $b = 1$, and let $c$ vary, then $\delta_1 = 2c$ and $\delta_2 = -4 - 4c^2$. Then the sequence $(1, \widetilde{\operatorname{sign}}(\delta_1), \widetilde{\operatorname{sign}}(\delta_2))$ becomes $(1, 1, -1)$, $(1, -1, -1)$, or $(1, 1, -1)$ when $c > 0$, $c < 0$, or $c = 0$, respectively. In each of these cases $r = 2$ and the total number of sign changes in the sequence is 1. The rule of Corollary A.0.3 asserts that the number of distinct real roots $\alpha$ of $P$ such that $Q(\alpha) > 0$ minus the number of those such that $Q(\alpha) < 0$ is equal to $r$ minus twice the number of sign changes in the sign sequence, which, in each case, is equal to 0. Of course, $P = X^2 + 1$ is known to have no real roots.

To extend this to full generality, we let $a$, $b$, and $c$ vary over $\mathbb{R}$. Then $\delta_1 = 2c - a$, and we can write $\delta_2 = (a^2 - 4b)(c^2 - ac + b)$, which is zero when either $a^2 - 4b = 0$ or $c^2 - ac + b = 0$, noticing that the factor $a^2 - 4b$ in $\delta_2$ is precisely the discriminant of $P$. Treating $c^2 - ac + b$ as a polynomial in $c$, it is clear that $c^2 - ac + b = 0$ when $c = (a \pm \sqrt{a^2 - 4b})/2$. Since the discriminant of $P$ is $a^2 - 4b$, $P$ has no real roots when $a^2 - 4b < 0$. Notice that the discriminant of $c^2 - ac + b$ as a polynomial in $c$ is also $a^2 - 4b$. Hence if $a^2 - 4b < 0$ then both $P$ and $c^2 - ac + b$ have no real roots, and in particular, $c^2 - ac + b$ takes only positive values. Therefore, if $a^2 - 4b < 0$, we are ensured that $\delta_2 < 0$. In this case, the sequence $(1, \widetilde{\operatorname{sign}}(\delta_1), \widetilde{\operatorname{sign}}(\delta_2))$ becomes $(1, \widetilde{\operatorname{sign}}(\delta_1), -1)$, and the number of sign changes in this sequence is always equal to 1. Indeed, we find that the quantity computed by Corollary A.0.3 is 0 when $P$ has no real roots. On the other hand, if $a^2 - 4b > 0$, $P$ has two distinct roots $(-a \pm \sqrt{a^2 - 4b})/2$, the factor $c^2 - ac + b$ in $\delta_2$ (as a polynomial in $c$) has roots $(a \pm \sqrt{a^2 - 4b})/2$, and the root of $Q$ is $-c$. Therefore, assuming that $a^2 - 4b > 0$, we find that $\delta_2 > 0$ when either $c < (a - \sqrt{a^2 - 4b})/2$ or $c > (a + \sqrt{a^2 - 4b})/2$, and $\delta_2 < 0$

when $(a - \sqrt{a^2 - 4b})/2 < c < (a + \sqrt{a^2 - 4b})/2$. When $c < (a - \sqrt{a^2 - 4b})/2$ the sign of $\delta_1$ is $-1$, and when $c > (a + \sqrt{a^2 - 4b})/2$ the sign of $\delta_1$ is $+1$, so the sequence $(1, \widetilde{\operatorname{sign}}(\delta_1), \widetilde{\operatorname{sign}}(\delta_2))$ becomes $(1, -1, 1)$ and $(1, 1, 1)$, respectively. Indeed, these cases correspond to $Q$ being negative (and positive, respectively) at both roots of $P$. When $(a - \sqrt{a^2 - 4b})/2 < c < (a + \sqrt{a^2 - 4b})/2$, the sequence $(1, \widetilde{\operatorname{sign}}(\delta_1), \widetilde{\operatorname{sign}}(\delta_2))$ becomes $(1, \widetilde{\operatorname{sign}}(\delta_1), -1)$, having exactly 1 sign change regardless of the sign of $\delta_1$. This corresponds to the root of $Q$ being between the two roots of $P$, meaning that $Q > 0$ at one of the roots of $P$, and $Q < 0$ at the other root of $P$. The other possibility is that $a^2 - 4b = 0$, in which case $\delta_2 = 0$, and $P$ has exactly one real root at $-a/2$. Observe that $\delta_1 = 2c - a$ vanishes when $c = a/2$, and that the root of $Q$ is at $-c$. Hence if $c > a/2$ then $\delta_1 > 0$, the sign sequence becomes $(1, 1, 0)$, and the quantity computed by Corollary A.0.3 is 1. Indeed, this case corresponds to $Q$ having a root less than $-a/2$, meaning that $Q$ is positive at the root of $P$. If $c = a/2$ then the sign sequence becomes $(1, \widetilde{\operatorname{sign}}(0), \widetilde{\operatorname{sign}}(0)) = (1, 1, -1)$, and the quantity computed by Corollary A.0.3 is 0 (and indeed, the roots of $P$ and $Q$ coincide). Finally, if $c < a/2$ then $\delta_1 < 0$, the sign sequence becomes $(1, -1, 0)$, and the quantity computed by Corollary A.0.3 is $-1$. Indeed, $Q$ is negative at the root of $P$.

In Sturm's method we counted the roots of polynomials using computation trees, which branched each time a new leading coefficient was calculated, to account for that coefficient being zero or nonzero depending on the parameters. Hermite's method, however, does not require branching, even when dealing with parameters as in a general polynomial. Despite this, Sturm's method and Hermite's method are related via subresultant polynomials, as mentioned at the beginning of this subsection. We introduce the related notion of principal subresultant coefficients. Take the polynomials $P = a_0X^d + \dots + a_d$ and $Q = b_0X^e + \dots + b_e$ of degree $d$ and $e$ respectively, and consider the list of polynomials
$$X^{e-1}P,\; X^{e-2}P,\; \dots,\; XP,\; P,\; Q,\; XQ,\; \dots,\; X^{d-2}Q,\; X^{d-1}Q.$$
The Sylvester matrix of $P$ and $Q$ is the square matrix of size $d + e$ whose entries are the coefficients of the polynomials in this list, the $(i,j)$-th coordinate being the $j$-th coefficient of the $i$-th polynomial in the list. That is, the Sylvester matrix

of $P$ and $Q$ is the square matrix of the form

$$\begin{pmatrix}
a_0 & \dots & a_k & \dots & a_d & 0 & \dots & 0 \\
0 & a_0 & \dots & a_k & \dots & a_d & \dots & 0 \\
\vdots & & \ddots & & & & \ddots & \vdots \\
0 & \dots & 0 & a_0 & \dots & a_k & \dots & a_d \\
0 & \dots & 0 & b_0 & \dots & \dots & \dots & b_e \\
\vdots & & b_0 & \dots & \dots & b_e & & \vdots \\
0 & b_0 & \dots & \dots & b_e & 0 & \dots & 0 \\
b_0 & \dots & \dots & b_e & 0 & \dots & \dots & 0
\end{pmatrix}.$$

The resultant of $P$ and $Q$ is the determinant of the Sylvester matrix of $P$ and $Q$. The principal subresultant coefficient of order $i$ of $P$ and $Q$, written $\mathrm{PSRC}_i(P,Q)$, is the determinant of the square matrix of size $d + e - 2i$ formed by deleting the first and last $i$ rows and columns from the Sylvester matrix of $P$ and $Q$. Note that $\mathrm{PSRC}_i$ is defined for all $0 \le i < \min(d, e)$. Of course $\mathrm{PSRC}_0(P,Q)$ is simply the resultant of $P$ and $Q$.

Example 41. Let $P = X^2 + aX + b$ and $Q = X + c$. Then the Sylvester matrix of $P$ and $Q$ is

1 a b 0 1 c , 1 c 0 and so we compute the resultant, and principal subresultant coefficient of order 1, of P and Q:

1 a b 2 PSRC0(P,Q) = det 0 1 c = ac − b − c 1 c 0   PSRC1(P,Q) = det 1 = 1

We will often consider the principal subresultant coefficients of a polynomial and its own derivative. For a general polynomial $P = a_0X^d + \dots + a_d$, the

Sylvester matrix of $P$ and $P'$ is the square matrix of size $2d - 1$ of the form

$$\begin{pmatrix}
a_0 & a_1 & \dots & \dots & a_d & 0 & \dots & 0 \\
0 & \ddots & \ddots & & & \ddots & & \vdots \\
\vdots & & \ddots & & & & \ddots & 0 \\
0 & \dots & 0 & a_0 & a_1 & \dots & \dots & a_d \\
0 & \dots & \dots & 0 & da_0 & \dots & \dots & a_{d-1} \\
\vdots & & & da_0 & \dots & \dots & a_{d-1} & 0 \\
0 & da_0 & \dots & \dots & a_{d-1} & & & \vdots \\
da_0 & \dots & \dots & a_{d-1} & 0 & \dots & \dots & 0
\end{pmatrix}.$$

It is known that the resultant of $P$ and $Q$ is zero if and only if $P$ and $Q$ have a common factor [24]. Hence $\mathrm{PSRC}_0(P, P') = 0$ if and only if $P$ has a multiple root. We now generalize this using subresultant coefficients, which we can apply to the method of root counting via principal minors (Theorem 1.3.4 and Corollary A.0.3). Take arbitrary polynomials $P = a_0X^d + \dots + a_d$ and $Q = b_0X^e + \dots + b_e$ of degree $d$ and $e$ respectively.

Claim A.0.4. For a non-negative integer $k < \min(d, e)$, there exist nonzero $U, V \in \mathbb{R}[X]$ with $\deg(U) < e - k$ and $\deg(V) < d - k$ such that $\deg(UP + VQ) < k$ if and only if $\mathrm{PSRC}_k(P,Q) = 0$.

Proof. Consider, in addition to $P$ and $Q$ above, the polynomials $U = u_0X^{e-k-1} + \dots + u_{e-k-1}$ and $V = v_0X^{d-k-1} + \dots + v_{d-k-1}$, where $u_0$ and $v_0$ are not assumed to be nonzero. Hence $\deg U < e - k$ and $\deg V < d - k$. Multiplying $P$ and $U$ we have

$$\begin{aligned}
UP &= \left(u_0X^{e-k-1} + \dots + u_{e-k-1}\right)\left(a_0X^d + \dots + a_d\right) \\
&= (u_0a_0)X^{d+e-k-1} + \dots + (u_0a_{e-k-1} + \dots + u_{e-k-1}a_0)X^d \\
&\quad + (u_0a_{e-k} + \dots + u_{e-k-1}a_1)X^{d-1} + \dots + (u_0a_d + \dots + u_{e-k-1}a_{d-(e-k-1)})X^{e-k-1} \\
&\quad + (u_1a_d + \dots + u_{e-k-1}a_{d-(e-k-2)})X^{e-k-2} + \dots \\
&\quad + (u_{e-2k-1}a_d + \dots + u_{e-k-1}a_{d-k})X^k + \dots + u_{e-k-1}a_d.
\end{aligned}$$

We can find a similar expression for $VQ$. Here we are writing the expansion of the product $UP$ as though $k \le e - k - 1 \le d$, which is not always true in general. The requirement that $\deg(UP + VQ) < k$ is equivalent to requiring that the coefficients of all terms of $UP + VQ$ of degree $k$ and greater be zero. By the above expansions we can see that this is a linear system of equations in the coefficients of $P$ and $Q$, with the coefficients of $U$ and $V$ as unknowns, whose determinant is the

determinant of the matrix
$$\begin{pmatrix}
a_0 & \dots & \dots & a_d & 0 & \dots & 0 \\
0 & \ddots & & & \ddots & & \vdots \\
\vdots & & \ddots & & & \ddots & 0 \\
0 & \dots & 0 & a_0 & \dots & \dots & a_{d-k} \\
0 & \dots & 0 & b_0 & \dots & \dots & b_{e-k} \\
\vdots & & b_0 & \dots & \dots & b_e & \vdots \\
0 & b_0 & \dots & \dots & b_e & & 0 \\
b_0 & \dots & \dots & b_e & 0 & \dots & 0
\end{pmatrix},$$
up to a permutation of rows. Notice that this is in fact the Sylvester matrix of $P$ and $Q$ with the first and last $k$ rows and columns removed, and hence the determinant of this matrix is equal to $\pm\mathrm{PSRC}_k(P,Q)$. There exist nonzero polynomials $U$ and $V$ with the desired properties if and only if the determinant of this matrix is equal to 0, hence $\deg(UP + VQ) < k$ if and only if $\mathrm{PSRC}_k(P,Q) = 0$.
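The vanishing pattern that this claim feeds into (Theorem 1.3.5, proven next) can be observed numerically; a quick sketch of ours, reusing the sylvester/psrc helpers above: below, $\gcd(P, Q) = (X-1)^2$ has degree 2, so $\mathrm{PSRC}_0 = \mathrm{PSRC}_1 = 0$ while $\mathrm{PSRC}_2 \neq 0$.

\begin{verbatim}
# P = (X-1)^2 (X-3) = X^3 - 5X^2 + 7X - 3
# Q = (X-1)^2 (X+5) = X^3 + 3X^2 - 9X + 5, so gcd(P, Q) has degree 2
print([psrc([1, -5, 7, -3], [1, 3, -9, 5], i) for i in range(3)])  # [0, 0, 8]
\end{verbatim}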

Proof of Theorem 1.3.5. Let $P = AB$ and $Q = AC$, where $A, B, C \in \mathbb{R}[X]$ are such that $A = \gcd(P, Q)$. If $\deg(A) > k$ then, in the context of Claim A.0.4 above, we can set $U = C$ and $V = -B$ so that $UP + VQ = ABC - ABC = 0$. Since $\deg(A) > k$, we know that $\deg(U) = \deg(C) < e - k$ and $\deg(V) = \deg(B) < d - k$. Conversely, if $\deg(A) \le k$, then for any polynomials $U, V \in \mathbb{R}[X]$ such that $UP + VQ = 0$, we must have $\deg(U) \ge e - k$ and $\deg(V) \ge d - k$. Hence the greatest common divisor of $P$ and $Q$ has degree $> k$ if and only if there exist some nonzero polynomials $U$ and $V$, with $\deg(U) < e - k$ and $\deg(V) < d - k$, such that $UP + VQ = 0$.

We proceed with the proof of the lemma by induction on $k$. First, assume $k = 0$. It is known that the gcd of $P$ and $Q$ has nonzero degree if and only if $\mathrm{PSRC}_0(P,Q) = 0$, and so the statement of the lemma holds for $k = 0$. Assume $k > 0$, and that the statement holds for $k - 1$. That is, we assume that $\gcd(P,Q)$ has degree $\ge k$ if and only if $\mathrm{PSRC}_0(P,Q) = \dots = \mathrm{PSRC}_{k-1}(P,Q) = 0$. We want to show that $\gcd(P,Q)$ has degree $> k$ if and only if $\mathrm{PSRC}_k(P,Q) = 0$. If $\mathrm{PSRC}_k(P,Q) = 0$, then by Claim A.0.4, there exist nonzero $U, V \in \mathbb{R}[X]$ with $\deg(U) < e - k$ and $\deg(V) < d - k$ such that $\deg(UP + VQ) < k$. If such $U$ and $V$ exist then, by the inductive assumption, $\gcd(P,Q)$ has degree $\ge k$. However, this gcd clearly must divide $UP + VQ$. That is, a polynomial of degree $\ge k$ divides a polynomial of degree $< k$, and so we must have $UP + VQ = 0$, and hence the degree of $\gcd(P,Q)$ must be $> k$ by

the opening argument of the proof. Conversely, if $\gcd(P,Q)$ has degree $> k$ then, using the setup at the beginning of this proof, we simply let $U = C$ and $V = -B$ so that $UP + VQ = 0$, which has degree $< k$. Hence, by Claim A.0.4, $\mathrm{PSRC}_k(P,Q) = 0$. This concludes the proof.

Proof of Lemma 1.3.6. The Sylvester matrix of $P$ and $P'$ is a square matrix of size $2d - 1$, and so the principal subresultant coefficient of $P$ and $P'$ of order $d - k$ is the determinant of the square sub-matrix of size $2d - 1 - 2(d - k) = 2k - 1$ formed from the Sylvester matrix in the usual way. We consider a product of matrices, and show that the determinant of that product is equal to the required principal subresultant coefficient. The product

 1 0 ...... 0   .. .. .     0 . . .  a0 a1 . . . a2k−3 a2k−2    . ..   0 a0 a2k−3  . . 1 0 ...... 0       ......   0 ... 0 N0 ...... Nk−1   . . . .  ,      ......   ......   . . . . .   . . . .     . . .  0 ...... 0 a0  0 . . . .  N0 ...... Nk−1 ...... N2k−2 (4) where the ai are defined to be 0 for i > d, has determinant   N0 ...Nk−1 a2k−1 det  . .  = a2k−1δ . 0  . .  0 j Nk−1 ...N2k−2

The top part of the above product is equal to
$$\begin{pmatrix}
a_0 & a_1 & \dots & \dots & \dots & a_{2k-2} \\
0 & a_0 & a_1 & \dots & \dots & a_{2k-3} \\
\vdots & \ddots & \ddots & & & \vdots \\
0 & \dots & 0 & a_0 & a_1 & \dots
\end{pmatrix},$$
which is easy to see by a straightforward calculation. Indeed, this is the top part of the Sylvester matrix of $P$ and $P'$, as its entries are precisely the coefficients of $P$ in the required positions. The bottom part, however, requires the inductive representation of the $N_i$ from the beginning of this subsection. With $i \ge k$, the

$(i,j)$-th entry of the product is
$$\begin{pmatrix} 0 & \dots & 0 & N_0 & \dots & N_{i-1} \end{pmatrix}\begin{pmatrix} a_{j-1} \\ \vdots \\ a_0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = a_{j+i-2k}N_0 + \dots + a_0N_{j+i-2k}. \tag{5}$$

Applying the inductive representation, we have

$$a_{j+i-2k}N_0 + \dots + a_0N_{j+i-2k} = \left(d - (j + i - 2k)\right)a_{j+i-2k}$$
when $j + i - 2k \le d$, and

$$a_dN_{j+i-2k-d} + \dots + a_0N_{j+i-2k} = 0$$
when $j + i - 2k \ge d$. To clarify, consider the $k$-th row (which is the middle row) of the product matrix:

. . . . .  . . . . .   0 ... 0 a0N0 ... (a0Nj−k + ... + aj−kN0) ... (a0Nk−1 + ... + ak−1N0), . . . . .  . . . . . which can be simplified to . . . . .  . . . . .   0 ... 0 da0 ... (d − (j − k))aj−k ... (d − (k − 1))ak−1 . . . . .  . . . . . . . . . .  . . . . .   = 0 ... 0 b0 . . . bi . . . bk−1 . . . . .  . . . . .

The anti-diagonals from the k-th row down are constant, which can be seen in Equation 5 by observing that the indices of the terms in the (i, j)-th entry are

constant for fixed $i + j$. Hence, completing the matrix we have

$$\begin{pmatrix}
a_0 & a_1 & \dots & a_k & \dots & a_{2k-2} \\
0 & a_0 & a_1 & \dots & \dots & a_{2k-3} \\
\vdots & \ddots & \ddots & & & \vdots \\
0 & \dots & 0 & a_0 & a_1 & \dots \\
0 & \dots & \dots & 0 & b_0 & \dots \\
\vdots & & & b_0 & \dots & \vdots \\
0 & b_0 & \dots & \dots & \dots & b_{d-1} \\
b_0 & \dots & b_{k-1} & \dots & b_{d-1} & \dots
\end{pmatrix},$$

where $b_i$ is equal to $(d - i)a_i$ (the $i$-th coefficient of $P'$); this is precisely the inner-most $2k - 1$ rows and columns of the Sylvester matrix of $P$ and $P'$. Therefore $\mathrm{PSRC}_{d-k}(P, P') = a_0^{2k-1}\delta_k$.

Example 42. Let $P = X^3 + aX^2 + bX + c$. Then the Newton sums of the roots of $P$ are

$$N_0 = 3, \qquad N_1 = -a, \qquad N_2 = a^2 - 2b, \qquad N_3 = -a^3 + 3ab - 3c, \qquad N_4 = a^4 - 4a^2b + 2b^2 + 4ac,$$
and we have
$$H(P) = \begin{pmatrix} 3 & -a & a^2 - 2b \\ -a & a^2 - 2b & -a^3 + 3ab - 3c \\ a^2 - 2b & -a^3 + 3ab - 3c & a^4 - 4a^2b + 2b^2 + 4ac \end{pmatrix},$$
with principal minors
$$\delta_1 = 3, \qquad \delta_2 = 2a^2 - 6b, \qquad \delta_3 = a^2b^2 - 4b^3 - 4a^3c + 18abc - 27c^2,$$
noting that $\delta_3$ is the discriminant of the cubic $P$.

The Sylvester matrix of $P$ and $P'$ is

$$\begin{pmatrix}
1 & a & b & c & 0 \\
0 & 1 & a & b & c \\
0 & 0 & 3 & 2a & b \\
0 & 3 & 2a & b & 0 \\
3 & 2a & b & 0 & 0
\end{pmatrix}.$$

We consider the matrix product as in Equation 4 for some non-negative integer $k < \deg(P)$. First consider the case of $k = 1$, so that Equation 4 becomes
$$\begin{pmatrix} N_0 \end{pmatrix}\begin{pmatrix} a_0 \end{pmatrix} = \begin{pmatrix} 3 \end{pmatrix}\begin{pmatrix} 1 \end{pmatrix},$$
and the principal subresultant coefficient of order $d - k = 3 - 1 = 2$ is the determinant of the matrix

$$\begin{pmatrix} 3 \end{pmatrix}.$$

Clearly we have $\mathrm{PSRC}_2(P, P') = a_0\delta_1 = 3$. Now consider the case of $k = 2$. Then we have the product
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & N_0 & N_1 \\ N_0 & N_1 & N_2 \end{pmatrix}\begin{pmatrix} a_0 & a_1 & a_2 \\ 0 & a_0 & a_1 \\ 0 & 0 & a_0 \end{pmatrix},$$

whose determinant is $(N_0N_2 - N_1^2)\,a_0^3 = \delta_2 \cdot 1^3 = 2a^2 - 6b$, and the principal subresultant coefficient of order $d - k = 1$ of $P$ and $P'$ is

$$\mathrm{PSRC}_1(P, P') = \det\begin{pmatrix} 1 & a & b \\ 0 & 3 & 2a \\ 3 & 2a & b \end{pmatrix} = 2a^2 - 6b,$$
as expected.
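These hand computations can be cross-checked mechanically; a short sketch of ours (not part of the thesis, assuming sympy and the psrc helper defined earlier) confirms both principal subresultant coefficients of Example 42 symbolically.

\begin{verbatim}
import sympy as sp

a, b, c = sp.symbols('a b c')
P  = [1, a, b, c]      # X^3 + aX^2 + bX + c
dP = [3, 2*a, b]       # P'
print(sp.expand(psrc(P, dP, 2)))                      # 3, i.e. a_0 * delta_1
print(sp.expand(psrc(P, dP, 1) - (2*a**2 - 6*b)))     # 0, i.e. PSRC_1 = delta_2
\end{verbatim}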
