arXiv:1611.01246v2 [math.PR] 5 Jun 2017

PROBABILISTIC CONDITION NUMBER ESTIMATES FOR REAL POLYNOMIAL SYSTEMS I: A BROADER FAMILY OF DISTRIBUTIONS

ALPEREN ERGÜR, GRIGORIS PAOURIS, AND J. MAURICE ROJAS

Abstract. We consider the sensitivity of real roots of polynomial systems with respect to perturbations of the coefficients. In particular — for a version of the condition number defined by Cucker and used later by Cucker, Krick, Malajovich, and Wschebor — we establish new probabilistic estimates that allow a much broader family of measures than considered earlier. We also generalize further by allowing over-determined systems. In Part II, we study smoothed complexity and how sparsity (in the sense of restricting which monomial terms can appear) can help further improve earlier condition number estimates.

A.E. was partially supported by NSF grant CCF-1409020. G.P. was partially supported by BSF grant 2010288 and NSF CAREER grant DMS-1151711. J.M.R. was partially supported by NSF CAREER grant DMS-1151711 and NSF grant CCF-1409020.

1. Introduction

When designing algorithms for polynomial system solving, it quickly becomes clear that complexity is governed by more than the number of solutions of the underlying equations. Numerical solutions are simply meaningless without further information on the spacing of the roots, or a means of capturing the sensitivity of the roots to perturbation. A mathematically elegant means of capturing this sensitivity is the condition number (see, e.g., [3, 6] and our discussion below).

A subtlety behind complexity bounds incorporating the condition number (among other parameters) is that computing the condition number, even within a large multiplicative error, is provably as hard as computing the numerical solution one seeks in the first place (see, e.g., [15] for a precise statement in the linear case). However, it is now known, via earlier work on discriminants, that the condition number admits probabilistic bounds, thus enabling its use in average-case analysis, and smoothed analysis, of the complexity of numerical algorithms. In fact, this probabilistic approach has revealed that, in certain settings, numerical solving can be done in polynomial-time on average (see, e.g., [2, 5, 20]), even though numerical solving has exponential worst-case complexity.

The numerical approximation of complex roots provides an instructive example of how one can profit from randomization. First, classical algebraic geometry (e.g., Bertini's Theorem and Bézout's Theorem [26]) tells us that, with probability 1, the number of complex roots of a system of homogeneous polynomials P := (p_1, ..., p_m) ∈ (R[x_1, ..., x_n])^m (with each p_i random, having fixed positive degree d_i) is ∏_{i=1}^{n-1} d_i, 0, or infinite, according as m = n-1, m > n-1, or m < n-1. (Any probability measure on the coefficient space, absolutely continuous with respect to Lebesgue measure, will do in the preceding statement.) Secondly, examples like ((2x_1 - 1)(3x_1 - 1), x_2 - x_1^2, ..., x_n - x_{n-1}^2) reveal that the number of digits of accuracy necessary to distinguish the coordinates of roots may be exponential in n; indeed, it is known for random polynomial systems (see, e.g., [8, Thm. 5]) how many digits are needed to separate roots. Finally, there are classical reductions showing that deciding the existence of complex roots for systems of polynomials in Z[x_1, ..., x_n] is already NP-hard, and this remains true even when the coefficients are
rational, and the polynomial degrees and coefficient heights are bounded. More simply, a classical observation from the theory of resultants (see, e.g., [7]) is that, for any positive continuous probability measure on the coefficients, P having a root with Jacobian matrix possessing small determinant is a rare event. So, with high probability, small perturbations of a P with no degenerate roots should still have no degenerate roots. More precisely, we review below a version of the condition number used in [27, 2, 20]. Recall that the singular values of a matrix T ∈ R^{k×(n-1)} are the (nonnegative) square roots of the eigenvalues of T^⊤ T, where T^⊤ denotes the transpose of T.

Definition 1.1. Given n, d_1, ..., d_m ∈ N and i ∈ {1, ..., m}, let N_i := binom(n+d_i-1, d_i) and, for any homogeneous polynomial p_i ∈ R[x_1, ..., x_n] with deg p_i = d_i, note that the number of monomial terms of p_i is at most N_i. Letting α := (α_1, ..., α_n) and x^α := x_1^{α_1} ⋯ x_n^{α_n}, let c_{i,α} denote the coefficient of x^α in p_i, and set P := (p_1, ..., p_m). Also, let us define the Weyl-Bombieri norms of p_i and P to be, respectively,

‖p_i‖_W := sqrt( Σ_{α_1+⋯+α_n = d_i} |c_{i,α}|² / binom(d_i, α) )  and  ‖P‖_W := sqrt( Σ_{i=1}^m ‖p_i‖_W² ).

Let Δ_m ∈ R^{m×m} be the diagonal matrix with diagonal entries sqrt(d_1), ..., sqrt(d_m) and let DP(x)|_{T_x S^{n-1}} : T_x S^{n-1} → R^m denote the linear map between tangent spaces induced by the Jacobian matrix of the polynomial system P evaluated at the point x. Finally, when m = n-1, we define the (normalized) local condition number (for solving P = O) to be

μ̃_norm(P, x) := ‖P‖_W σ_max( (DP(x)|_{T_x S^{n-1}})^{-1} Δ_{n-1} )  or  μ̃_norm(P, x) := ∞,

according as DP(x)|_{T_x S^{n-1}} is invertible or not, where σ_max(A) is the largest singular value of a matrix A. ⋄

Clearly, μ̃_norm(P, x) → ∞ as P approaches a system possessing a degenerate root ζ ∈ P^{n-1}_C and x approaches ζ.
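As a quick numerical illustration of Definition 1.1 (a sanity check we add here, not part of the original text), the following sketch computes μ̃_norm for a single binary quadratic, i.e. n = 2, m = n-1 = 1, d_1 = 2. For p = x_1² - x_2² at its root direction x = (1/√2, 1/√2) one gets μ̃_norm(p, x) = 1.

```python
import math

# Sanity check for Definition 1.1 (illustration only): the normalized local
# condition number of a single binary quadratic p = c20*x1^2 + c11*x1*x2 + c02*x2^2,
# i.e. n = 2, m = n - 1 = 1, d1 = 2, evaluated at x = (cos t, sin t) on S^1.

def weyl_norm_sq(c20, c11, c02):
    # ||p||_W^2 = sum_alpha |c_alpha|^2 / binom(2, alpha)
    return c20**2 / 1.0 + c11**2 / 2.0 + c02**2 / 1.0

def mu_norm(c20, c11, c02, t):
    x = (math.cos(t), math.sin(t))
    y = (-x[1], x[0])  # unit tangent direction spanning T_x S^1
    grad = (2*c20*x[0] + c11*x[1], c11*x[0] + 2*c02*x[1])
    dp = grad[0]*y[0] + grad[1]*y[1]  # the 1x1 map DP(x) restricted to T_x S^1
    if dp == 0.0:
        return math.inf
    # Delta_1 = [sqrt(d1)] = [sqrt(2)], so sigma_max(DP(x)^{-1} Delta_1) = sqrt(2)/|dp|
    return math.sqrt(weyl_norm_sq(c20, c11, c02)) * math.sqrt(2.0) / abs(dp)

# p = x1^2 - x2^2 at its root direction x = (1/sqrt2, 1/sqrt2):
print(mu_norm(1.0, 0.0, -1.0, math.pi/4))
```

The degenerate case is visible too: for p = x_1² + x_2² the tangential derivative vanishes at x = (1, 0), and the sketch returns ∞, matching the convention in the definition.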
The intermediate normalizations in the definition are useful for geometric interpretations of μ̃_norm: There is in fact a natural metric ‖·‖_W (reviewed in Section 2 and Theorem 2.1 below) on the space of coefficients, yielding a simple and elegant algebraic relation between ‖P‖_W, sup_{x∈S^{n-1}} μ̃_norm(P, x), and the distance of P to a certain discriminant variety (see also [11]). But even more importantly, the preceding condition number (in the special case m = n-1) was a central ingredient in the recent positive solution to Smale's 17th Problem [2, 20]: For the problem of numerically approximating a single complex root of a polynomial system, a particular randomization model (independent complex Gaussian coefficients with specially chosen variances) enables polynomial-time average-case complexity, in the face of exponential deterministic complexity.¹

1.1. From Complex Roots to Real Roots. It is natural to seek similar average-case speed-ups for the harder problem of numerically approximating real roots of real polynomial systems. However, an important subtlety one must consider is that the number of real roots of n-1 homogeneous polynomials in n variables (of fixed degree) is no longer constant with probability 1, even if the probability measure for the coefficients is continuous and positive. Also, small perturbations can make the number of real roots of a polynomial system go from positive to zero or even infinity. A condition number for real solving that takes all these subtleties into account was developed in [9] and applied in the seminal series of papers [10, 11, 12]. In these papers, the authors performed a probabilistic analysis assuming the coefficients were independent real Gaussians with mean 0 and very specially chosen variances.

¹Here, "complexity" simply means the total number of field operations over C needed to find a start point x_0 for Newton iteration, such that the sequence of Newton iterates (x_n)_{n∈N} converges to a true root ζ of P (see, e.g., [3, Ch. 8]) at the rate of |x_n - ζ| ≤ (1/2)^{2^n - 1} |x_0 - ζ| or faster.
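The convergence rate in the footnote can be checked numerically in a simple univariate instance (an illustration we add; the choice f(x) = x² - 2 with start point x_0 = 1.5 is ours, not the paper's):

```python
import math

# Footnote 1 illustration: from a good start point x0, Newton iterates satisfy
# |x_n - zeta| <= (1/2)^(2^n - 1) |x0 - zeta|.  We check this for
# f(x) = x^2 - 2, zeta = sqrt(2), x0 = 1.5 (our example, not the paper's).
zeta = math.sqrt(2.0)
x = 1.5
e0 = abs(x - zeta)
errors = []
for n in range(1, 5):
    x = x - (x*x - 2.0) / (2.0*x)   # Newton step for f(x) = x^2 - 2
    errors.append(abs(x - zeta))
    assert errors[-1] <= 0.5**(2**n - 1) * e0
print(errors)
```

The doubling exponent 2^n - 1 is exactly the "approximate zero" rate: each iteration roughly squares the error.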

Definition 1.2. [9] Let κ̃(P, x) := ‖P‖_W / sqrt( ‖P‖_W² μ̃_norm(P, x)^{-2} + ‖P(x)‖_2² ) and κ̃(P) := sup_{x∈S^{n-1}} κ̃(P, x). We respectively call κ̃(P, x) and κ̃(P) the local and global condition numbers for real solving. ⋄

Note that a large condition number for real solving can be caused not only by a root with small Jacobian determinant, but also by the existence of a critical point for P with small corresponding critical value. So a large κ̃ is meant to detect the spontaneous creation of real roots, as well as the bifurcation of a single degenerate root into multiple distinct real roots, arising from small perturbations of the coefficients.
Our main results, Theorems 3.9 and 3.10 in Section 3.4 below, show that useful condition number estimates can be derived for a much broader class of probability measures than considered earlier: Our theorems allow non-Gaussian distributions, dependence between certain coefficients, and, unlike the existing literature, our methods do not use any additional algebraic structure, e.g., invariance under the unitary group acting linearly on the variables (as in [27, 10, 11, 12]). This aspect also allows us to begin to address sparse polynomials (in the sequel to this paper), where linear changes of variables would destroy sparsity. Even better, our framework allows over-determined systems.
To compare our results with earlier estimates, let us first recall a central estimate from [12].
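To make the remark about critical points concrete, here is a small numerical sketch (our illustration, not from the paper) for p = x_1² - x_2² on S^1: the global condition number κ̃(p) = √2 is attained at x = (1, 0), where the tangential derivative vanishes while |p(x)| = 1, rather than at the roots of p.

```python
import math

# Illustration of Definition 1.2 (ours): kappa for p = x1^2 - x2^2, n = 2, m = 1.
# Here ||p||_W = sqrt(2); at x = (cos t, sin t) the tangential derivative is
# dp = -2*sin(2t), and ||p||_W^2 * mu_norm^{-2} = dp^2 / d1 = dp^2 / 2.
def kappa_local(t):
    p = math.cos(2*t)            # p(cos t, sin t) = cos^2 t - sin^2 t
    dp = -2.0*math.sin(2*t)      # derivative along the unit tangent of S^1
    return math.sqrt(2.0) / math.sqrt(dp*dp/2.0 + p*p)

kappa_global = max(kappa_local(2*math.pi*k/4000) for k in range(4000))
print(kappa_global)  # maximized where the tangential derivative vanishes
```

At the roots of p (e.g. t = π/4) the sketch gives the smaller value κ̃(p, x) = 1, so the supremum really is driven by the critical point, as the paragraph above describes.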

Theorem 1.3. [12, Thm. 1.2] Let P := (p_1, ..., p_{n-1}) be a random system of homogeneous n-variate polynomials where n ≥ 3 and p_i(x) := Σ_{α_1+⋯+α_n = d_i} c_{i,α} sqrt(binom(d_i, α)) x^α, where the c_{i,α} are independent real Gaussian random variables having mean 0 and variance 1. Then, letting N := Σ_{i=1}^{n-1} binom(n+d_i-1, d_i), d := max_i d_i, M′ := 1 + 8 d² (n-1)^{5/2} sqrt(N) sqrt( ∏_{i=1}^{n-1} d_i ), and t ≥ 4d, we have:
1. Prob( κ̃(P) ≥ tM′ ) ≤ sqrt(1 + log(tM′)) / t
2. E( log(κ̃(P)) ) ≤ log(M′) + sqrt(log M′) + 1/sqrt(log M′).
The expanded class of distributions we allow for the coefficients of P satisfies the following more flexible hypotheses:

Notation 1.4. For any d_1, ..., d_m ∈ N and i ∈ {1, ..., m}, let d := max_i d_i, N_i := binom(n+d_i-1, d_i), and assume C_i = (c_{i,α})_{α_1+⋯+α_n = d_i} are independent random vectors in R^{N_i} with probability distributions satisfying:
1. (Centering) For any θ ∈ S^{N_i - 1} we have E⟨C_i, θ⟩ = 0.
2. (Sub-Gaussian) There is a K > 0 such that for every θ ∈ S^{N_i - 1} we have Prob( |⟨C_i, θ⟩| ≥ t ) ≤ 2e^{-t²/K²} for all t > 0.
3. (Small Ball) There is a c_0 > 0 such that for every vector a ∈ R^{N_i} we have Prob( |⟨a, C_i⟩| ≤ ε‖a‖_2 ) ≤ c_0 ε for all ε > 0. ⋄

By the vectors C_i being independent we simply mean that the probability density function for the longer vector C_1 × ⋯ × C_m can be expressed as a product of the form ∏_{i=1}^m f_i(..., c_{i,α}, ...). This is a much weaker assumption than having all the c_{i,α} be independent, as is usually done in the literature on random polynomial systems.
A simple example of (C_1, ..., C_m) satisfying the 3 assumptions above would be to simply use an independent mean 0 Gaussian for each c_{i,α}, with each variance arbitrary. This already generalizes the setting of [27, 10, 11, 12], where the variances were specially chosen functions of (d_1, ..., d_{n-1}) and α.
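For contrast, here is a small check we add (not from the paper): vectors with independent ±1 (Rademacher) coordinates are centered and Sub-Gaussian but fail the Small Ball assumption, since ⟨a, C_i⟩ can equal 0 with probability bounded away from 0, while standard Gaussian coordinates satisfy it with c_0 = sqrt(2/π).

```python
import math
from itertools import product

# Small Ball check (our illustration): Prob(|<a, C>| <= eps * ||a||_2) for
# C uniform on {-1, +1}^k (Rademacher coordinates), computed exactly by enumeration.
def rademacher_small_ball(a, eps):
    k = len(a)
    norm = math.sqrt(sum(t*t for t in a))
    hits = sum(1 for s in product((-1, 1), repeat=k)
               if abs(sum(si*ai for si, ai in zip(s, a))) <= eps*norm)
    return hits / 2**k

# a = (1, 1): <a, C> = 0 with probability 1/2, for *every* eps > 0,
# so no constant c_0 can give Prob <= c_0 * eps:
print(rademacher_small_ball((1.0, 1.0), 1e-9))

# Standard Gaussian coordinates do satisfy the assumption:
# Prob(|N(0,1)| <= eps) = erf(eps/sqrt(2)) <= sqrt(2/pi) * eps.
for eps in (0.01, 0.1, 1.0):
    assert math.erf(eps/math.sqrt(2.0)) <= math.sqrt(2.0/math.pi)*eps
```

This is exactly the kind of atom at 0 that the Small Ball assumption is designed to rule out.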

Another example of a collection of random vectors satisfying the 3 assumptions above can be obtained by letting p > 2 and letting C_i have the uniform distribution on B_p^{N_i} := { x ∈ R^{N_i} | Σ_{j=1}^{N_i} |x_j|^p ≤ 1 } for all i: In this case the Sub-Gaussian assumption follows from [1, Sec. 6] and the Small Ball assumption is a direct consequence of the fact that B_p^{N_i} satisfies Bourgain's Hyperplane Conjecture (see, e.g., [19]). Yet another important example (easier to verify) is to let the C_i have the uniform distribution on ℓ_2 unit-spheres of varying dimension.
A simplified summary of our main results (Theorems 3.9 and 3.10 from Section 3.4), in the special case of square dense systems, is the following:

Corollary 1.5. There is an absolute constant A > 0 with the following property. Let P := (p_1, ..., p_{n-1}) be a random system of homogeneous n-variate polynomials where p_i(x) := Σ_{α_1+⋯+α_n = d_i} c_{i,α} sqrt(binom(d_i, α)) x^α and C_i = (c_{i,α})_{α_1+⋯+α_n = d_i} are independent random vectors satisfying the Centering, Sub-Gaussian, and Small Ball assumptions, with underlying constants c_0 and K. Then, for n ≥ 3, d := max_i d_i, d ≥ 2, N := Σ_{i=1}^{n-1} binom(n+d_i-1, d_i), and M := A sqrt(N) (Kc_0)^{2(n-1)} (3d² log(ed))^{2n-3} sqrt(n), the following bounds hold:
1. Prob( κ̃(P) ≥ tM ) ≤ 3t^{-1/2}, if 1 ≤ t ≤ (ed)^{2(n-1)}; and Prob( κ̃(P) ≥ tM ) ≤ 3t^{-1/2} ( t / (ed)^{2(n-1)} )^{1/(4 log(ed))}, if t ≥ (ed)^{2(n-1)}.
2. E(log κ̃(P)) ≤ 1 + log M. 

Corollary 1.5 is proved in Section 3.4. Theorems 3.9 and 3.10 in Section 3.4 below in fact state much stronger estimates than our simplified summary above.
Note that, for fixed d and n, the bound from Assertion (1) of Corollary 1.5 shows a somewhat slower rate of decay for the probability of a large condition number than the older bound from Assertion (1) of Theorem 1.3: O(1/t^{0.3523}) vs. O(sqrt(log t)/t). However, the older O(sqrt(log t)/t) bound was restricted to a special family of Gaussian distributions (satisfying invariance with respect to a natural O(n)-action on the root space P^{n-1}_R) and assumes m = n-1. Our techniques come from geometric functional analysis, work for a broader family of distributions, and we make no group-invariance assumptions.
Furthermore, our techniques allow condition number bounds in a new setting: over-determined systems, i.e., m × n systems with m > n-1. See the next section for the definition of a condition number enabling m > n-1, and the statements of Theorems 3.9 and 3.10 for our most general condition number bounds. The over-determined case occurs in many important applications involving large data, where one may make multiple redundant measurements of some physical phenomenon, e.g., image reconstruction from multiple projections. There appear to have been no probabilistic condition number estimates for the case m > n-1 until now. In particular, for m proportional to n, we will see at the end of this paper how our condition number estimates are close to optimal.
To the best of our knowledge, the only other result toward estimating condition numbers of non-Gaussian random polynomial systems is due to Nguyen [22]. However, in [22] the degrees of the polynomials are assumed to be bounded by a small fraction of the number of variables, m = n-1, and the quantity analyzed in [22] is not the condition number considered in [27] or [10, 11, 12].
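The exponent 0.3523 quoted above is consistent with our reading of the second branch of Corollary 1.5: for d = 2 the tail decays like t^{-(1/2 - 1/(4 log(ed)))}. A numerical cross-check we add (natural logarithm assumed):

```python
import math

# Cross-check (our addition, assuming natural log): for d = 2 the asymptotic
# decay exponent in the second branch of Corollary 1.5 is 1/2 - 1/(4*log(e*d)).
d = 2
exponent = 0.5 - 1.0/(4.0*math.log(math.e*d))
print(exponent)  # close to the 0.3523 quoted in the text
```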

The precise asymptotics of the decay rate for the probability of having a large condition number remain unknown, even in the restricted Gaussian case considered by Cucker, Malajovich, Krick, and Wschebor. So we also prove lower bounds for the condition number of a random polynomial system. To establish these bounds, we will need one more assumption on the randomness.

Notation 1.6. For any d_1, ..., d_m ∈ N and i ∈ {1, ..., m}, let d := max_i d_i, N_i := binom(n+d_i-1, d_i), and assume C_i = (c_{i,α})_{α_1+⋯+α_n = d_i} is an independent random vector in R^{N_i} with probability distribution satisfying:
4. (Euclidean Small Ball) There is a constant c̃_0 > 0 such that for every ε > 0 we have Prob( ‖C_i‖_2 ≤ ε sqrt(N_i) ) ≤ (c̃_0 ε)^{N_i}. ⋄

Remark 1.7. If the vectors C_i have independent coordinates satisfying the Centering and Small Ball assumptions, then Lemma 3.4 from Section 3.3 implies that the Euclidean Small Ball assumption holds as well. Moreover, if the C_i are each uniformly distributed on a convex body X and satisfy our Centering and Sub-Gaussian assumptions, then a result of Jean Bourgain [4] (see also [13] or [18] for alternative proofs) implies that both the Small Ball and Euclidean Small Ball assumptions hold, and with c̃_0 depending only on the Sub-Gaussian constant K (not the convex body X). ⋄

Corollary 1.8. Suppose n, d ≥ 3, m = n-1, and d_j = d for all j ∈ {1, ..., n-1}. Also let P := (p_1, ..., p_m) be a random polynomial system satisfying our Centering, Sub-Gaussian, Small Ball, and Euclidean Small Ball assumptions, with respective underlying constants K and c̃_0. Then there are constants A_2 ≥ A_1 > 0 such that
A_1 (n log(d) + d log(n)) ≤ E(log κ̃(P)) ≤ A_2 (n log(d) + d log(n)). 

Corollary 1.8 follows immediately from a more general estimate: Lemma 3.13 from Section 3.3. It would certainly be more desirable to know bounds within a constant multiple of κ̃(P) instead. We discuss more refined estimates of the latter kind in Section 3.5, after the proof of Lemma 3.13.
As we close our introduction, we point out that one of the tools we developed to prove our main theorems may be of independent interest: Theorem 2.4 of the next section extends, to polynomial systems, an earlier estimate of Kellog [17] on the norm of the derivative of a single multivariate polynomial.

2. Technical Background
We start by defining an inner product structure on spaces of polynomial systems. For n-variate degree d homogeneous polynomials f(x) := Σ_{|α|=d} b_α x^α, g(x) := Σ_{|α|=d} c_α x^α ∈ R[x_1, ..., x_n], their Weyl-Bombieri inner product is defined as

⟨f, g⟩_W := Σ_{|α|=d} (b_α c_α) / binom(d, α).

It is known (see, e.g., [21, Thm. 4.1]) that for any U ∈ O(n) we have ⟨f∘U, g∘U⟩_W = ⟨f, g⟩_W.
Let D := (d_1, ..., d_m) and let H_D denote the space of (real) m × n systems of homogeneous n-variate polynomials with respective degrees d_1, ..., d_m. Then for F := (f_1, ..., f_m) ∈ H_D

and G := (g_1, ..., g_m) ∈ H_D we define the Weyl-Bombieri inner product for two polynomial systems to be ⟨F, G⟩_W := Σ_{i=1}^m ⟨f_i, g_i⟩_W. We also let ‖F‖_W := sqrt(⟨F, F⟩_W).
A geometric justification for the definition of the condition number κ̃ can then be derived as follows: First, for x ∈ S^{n-1}, we abuse notation slightly by also letting DP(x) denote the m × n Jacobian matrix of P, evaluated at the point x. For m = n-1 we denote the set of polynomial systems with a singularity at x by
Σ_R(x) := { P ∈ H_D | x is a multiple root of P }
and we then define Σ_R (the real part of the discriminant variety for H_D) to be:
Σ_R := { P ∈ H_D | P has a multiple root in S^{n-1} } = ∪_{x ∈ S^{n-1}} Σ_R(x).
Using the Weyl-Bombieri inner-product to define the underlying distance, we point out the following important geometric characterization of κ̃:

Theorem 2.1. [11, Prop. 3.1] When m = n-1 we have κ̃(P) = ‖P‖_W / Dist(P, Σ_R) for all P ∈ H_D. 

We call a polynomial system P = (p_1, ..., p_m) with m = n-1 (resp. m ≥ n) square (resp. over-determined). Newton's method for over-determined systems was studied in [14]. So now that we have a geometric characterization of the condition number for square systems it will be useful to also have one for over-determined systems.
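The orthogonal invariance ⟨f∘U, g∘U⟩_W = ⟨f, g⟩_W recalled above can be verified numerically for binary quadratic forms (a small sketch we add; the coefficients and rotation angle are arbitrary choices of ours):

```python
import math

# O(2)-invariance of the Weyl-Bombieri inner product for degree-2 binary forms
# f = b20*x^2 + b11*x*y + b02*y^2 (our illustration; any angle works).
def weyl(f, g):
    # <f, g>_W = sum b_a c_a / binom(2, a); binom(2,(1,1)) = 2, the others are 1
    return f[0]*g[0] + f[1]*g[1]/2.0 + f[2]*g[2]

def rotate(f, th):
    c, s = math.cos(th), math.sin(th)
    # coefficients of f(c*x - s*y, s*x + c*y)
    return (f[0]*c*c + f[1]*c*s + f[2]*s*s,
            -2*f[0]*c*s + f[1]*(c*c - s*s) + 2*f[2]*s*c,
            f[0]*s*s - f[1]*s*c + f[2]*c*c)

f, g, th = (1.0, 2.0, 3.0), (-4.0, 0.5, 6.0), 0.7
print(weyl(f, g), weyl(rotate(f, th), rotate(g, th)))  # equal up to rounding
```

For degree 2 this is just the invariance of the Frobenius inner product of symmetric matrices under orthogonal conjugation, which is one way to see why the Weyl-Bombieri weights 1/binom(d, α) are the natural ones.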

Definition 2.2. Let σ_min(A) denote the smallest singular value of a matrix A. For any system of homogeneous polynomials P ∈ (R[x_1, ..., x_n])^m set
L(P, x) := sqrt( σ_min²( Δ_m^{-1} DP(x)|_{T_x S^{n-1}} ) + ‖P(x)‖_2² ).
We then define κ̃(P, x) := ‖P‖_W / L(P, x) and κ̃(P) := sup_{x ∈ S^{n-1}} κ̃(P, x). ⋄

The quantity min_{x ∈ S^{n-1}} L(P, x) thus plays the role of Dist(P, Σ_R) in the more general setting of m ≥ n-1. We now recall an important observation from [11, Sec. 2]: Setting D_x(P) := DP(x)|_{T_x S^{n-1}}, we have σ_min( Δ_{n-1}^{-1} D_x(P) ) = σ_max( D_x(P)^{-1} Δ_{n-1} )^{-1} when m = n-1 and D_x(P) is invertible. So by the definition of μ̃_norm(P, x) we have
L(P, x) = sqrt( σ_max( D_x(P)^{-1} Δ_{n-1} )^{-2} + ‖P(x)‖_2² ) = sqrt( ‖P‖_W² μ̃_norm(P, x)^{-2} + ‖P(x)‖_2² ),
and thus our more general definition agrees with the classical definition in the square case.
Since the W-norm of a random polynomial system has strong concentration properties for a broad variety of distributions (see, e.g., [28]), we will be interested in the behavior of L(P, x). So let us define the related quantity
𝓛(x, y) := sqrt( ‖Δ_m^{-1} D^{(1)}P(x)(y)‖_2² + ‖P(x)‖_2² ).
It follows directly that L(P, x) = inf_{y ∈ S^{n-1}, y ⊥ x} 𝓛(x, y).
We now recall a classical result of O. D. Kellog. The theorem below is a summary of [17, Thms. 4-6].

Theorem 2.3. [17] Let p ∈ R[x_1, ..., x_n] have degree d and set ‖p‖_∞ := sup_{x ∈ S^{n-1}} |p(x)| and ‖D^{(1)}p‖_∞ := max_{x,u ∈ S^{n-1}} |D^{(1)}p(x)(u)|. Then:
(1) We have ‖D^{(1)}p‖_∞ ≤ d² ‖p‖_∞ and, for any mutually orthogonal x, y ∈ S^{n-1}, we also have |D^{(1)}p(x)(y)| ≤ d ‖p‖_∞.
(2) If p is homogeneous then we also have ‖D^{(1)}p‖_∞ ≤ d ‖p‖_∞. 

For any system of homogeneous polynomials P := (p_1, ..., p_m) ∈ (R[x_1, ..., x_n])^m define ‖P‖_∞ := sup_{x ∈ S^{n-1}} sqrt( Σ_{i=1}^m p_i(x)² ). Let DP(x)(u) denote the image of the vector u under the linear operator DP(x), and set

‖D^{(1)}P‖_∞ := sup_{x,u ∈ S^{n-1}} ‖DP(x)(u)‖_2 = sup_{x,u ∈ S^{n-1}} sqrt( Σ_{i=1}^m ⟨∇p_i(x), u⟩² ).

Theorem 2.4. Let P := (p_1, ..., p_m) ∈ (R[x_1, ..., x_n])^m be a polynomial system with p_i homogeneous of degree d_i for each i and set d := max_i d_i. Then:
(1) We have ‖D^{(1)}P‖_∞ ≤ d² ‖P‖_∞ and, for any mutually orthogonal x, y ∈ S^{n-1}, we also have ‖DP(x)(y)‖_2 ≤ d ‖P‖_∞.
(2) If deg(p_i) = d for all i ∈ {1, ..., m} then we also have ‖D^{(1)}P‖_∞ ≤ d ‖P‖_∞.

Proof. Let (x_0, u_0) be such that ‖D^{(1)}P‖_∞ = ‖DP(x_0)(u_0)‖_2 and let α := (α_1, ..., α_m) where α_i := ⟨∇p_i(x_0), u_0⟩ / ‖D^{(1)}P‖_∞. Note that ‖α‖_2 = 1. Now define a polynomial q ∈ R[x_1, ..., x_n] of degree d via q(x) := α_1 p_1(x) + ⋯ + α_m p_m(x) and observe that
∇q(x) = ( α_1 ∂p_1/∂x_1 + ⋯ + α_m ∂p_m/∂x_1, ..., α_1 ∂p_1/∂x_n + ⋯ + α_m ∂p_m/∂x_n ),
⟨∇q, u⟩ = u_1 ( α_1 ∂p_1/∂x_1 + ⋯ + α_m ∂p_m/∂x_1 ) + ⋯ + u_n ( α_1 ∂p_1/∂x_n + ⋯ + α_m ∂p_m/∂x_n ),
and ⟨∇q(x), u⟩ = Σ_{i=1}^m α_i ⟨∇p_i(x), u⟩. In particular, for our chosen x_0 and u_0, we have
⟨∇q(x_0), u_0⟩ = Σ_{i=1}^m α_i ⟨∇p_i(x_0), u_0⟩ = Σ_{i=1}^m ⟨∇p_i(x_0), u_0⟩² / ‖D^{(1)}P‖_∞ = ‖D^{(1)}P‖_∞.
Using the first part of Kellog's Theorem we have

‖D^{(1)}P‖_∞ ≤ sup_{x,u ∈ S^{n-1}} |⟨∇q(x), u⟩| ≤ d² ‖q‖_∞.
Now we observe by the Cauchy-Schwarz Inequality that

‖q‖_∞ = sup_{x ∈ S^{n-1}} | Σ_{i=1}^m α_i p_i(x) | ≤ sup_{x ∈ S^{n-1}} sqrt( Σ_{i=1}^m p_i(x)² ).
So we conclude that ‖D^{(1)}P‖_∞ ≤ d² ‖q‖_∞ ≤ d² sup_{x ∈ S^{n-1}} sqrt( Σ_{i=1}^m p_i(x)² ) = d² ‖P‖_∞. We also note that when deg(p_i) = d for all i, the polynomial q is homogeneous of degree d. So for this special case, the second part of Kellog's Theorem directly implies ‖D^{(1)}P‖_∞ ≤ d ‖P‖_∞.
For the proof of the second part of Assertion (1) we define α_i := ⟨∇p_i(x), y⟩ / ‖DP(x)(y)‖_2 and q(x) := α_1 p_1 + ⋯ + α_m p_m. Then ⟨∇q(x), y⟩ = Σ_i α_i ⟨∇p_i(x), y⟩ = ‖DP(x)(y)‖_2. By applying Kellog's Theorem in the orthogonal direction y we then obtain
‖DP(x)(y)‖_2 = ⟨∇q(x), y⟩ ≤ d ‖q‖_∞ ≤ d ‖P‖_∞. □
Using our extension of Kellog's Theorem to polynomial systems, we develop useful estimates for ‖P‖_∞ and ‖D^{(i)}P‖_∞. In what follows, we call a subset 𝒩 of a metric space X a δ-net on X if and only if every point of X is within distance δ of some point of 𝒩. A basic fact we'll use repeatedly is that, for any δ > 0 and compact X, one can always find a finite δ-net for X.
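Kellog's homogeneous bound ‖D^{(1)}p‖_∞ ≤ d‖p‖_∞ is sharp. A numeric check we add (with the harmonic cubic p = x³ - 3xy², a choice of ours) finds ‖p‖_∞ = 1 and ‖D^{(1)}p‖_∞ = 3 = d‖p‖_∞ on S^1:

```python
import math

# Sharpness of Theorem 2.3(2) (our illustration): for the homogeneous cubic
# p(x, y) = x^3 - 3*x*y^2 (= Re((x+iy)^3)) on S^1, ||p||_inf = 1 while
# sup_x ||grad p(x)||_2 = 3 = d * ||p||_inf.
def p(x, y):
    return x**3 - 3*x*y*y

def grad_norm(x, y):
    return math.hypot(3*x*x - 3*y*y, -6*x*y)

pts = [(math.cos(2*math.pi*k/5000), math.sin(2*math.pi*k/5000)) for k in range(5000)]
p_inf = max(abs(p(x, y)) for x, y in pts)
dp_inf = max(grad_norm(x, y) for x, y in pts)
print(p_inf, dp_inf)
```

On the circle p(cos t, sin t) = cos(3t) and ‖∇p‖ ≡ 3, so the homogeneous estimate cannot be improved in general.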

Lemma 2.5. Let P := (p_1, ..., p_m) ∈ (C[x_1, ..., x_n])^m be a system of homogeneous polynomials, 𝒩 a δ-net on S^{n-1}, and set d := max_i d_i. Let max_𝒩(P) := sup_{y ∈ 𝒩} ‖P(y)‖_2. Similarly, let us define max_{𝒩^{k+1}}(D^{(k)}P) := sup_{x,u_1,...,u_k ∈ 𝒩} ‖D^{(k)}P(x)(u_1, ..., u_k)‖_2, and set ‖D^{(k)}P‖_∞ := sup_{x,u_1,...,u_k ∈ S^{n-1}} ‖D^{(k)}P(x)(u_1, ..., u_k)‖_2. Then:
(1) ‖P‖_∞ ≤ max_𝒩(P) / (1 - δd²) and ‖D^{(k)}P‖_∞ ≤ max_{𝒩^{k+1}}(D^{(k)}P) / (1 - δd² sqrt(k+1)).
(2) If deg(p_i) = d for each i ∈ {1, ..., m} then we have
‖P‖_∞ ≤ max_𝒩(P) / (1 - δd) and ‖D^{(k)}P‖_∞ ≤ max_{𝒩^{k+1}}(D^{(k)}P) / (1 - δd sqrt(k+1)).

Proof. We first prove Assertion (2). Observe that the Lipschitz constant of P on S^{n-1} is bounded from above by ‖D^{(1)}P‖_∞: This can be seen by taking x, y ∈ S^{n-1} and considering the integral P(x) - P(y) = ∫_0^1 DP(y + t(x-y))(x-y) dt.
Since ‖y + t(x-y)‖_2 ≤ 1 for all t ∈ [0, 1], the homogeneity of the system P implies
‖DP(y + t(x-y))(x-y)‖_2 ≤ ‖D^{(1)}P‖_∞ ‖x - y‖_2.
Using our earlier integral formula, we conclude that ‖P(x) - P(y)‖_2 ≤ ‖D^{(1)}P‖_∞ ‖x - y‖_2.
Now, when the degrees of the p_i are identical, let the Lipschitz constant of P be M. By Assertion (2) of Theorem 2.4 we have M ≤ ‖D^{(1)}P‖_∞ ≤ d ‖P‖_∞. Let x_0 ∈ S^{n-1} be such that ‖P(x_0)‖_2 = ‖P‖_∞ and let y ∈ 𝒩 satisfy ‖x_0 - y‖_2 ≤ δ. Then ‖P‖_∞ = ‖P(x_0)‖_2 ≤ ‖P(y)‖_2 + ‖x_0 - y‖_2 M ≤ max_𝒩(P) + δd ‖P‖_∞, and thus
(⋆) ‖P‖_∞ (1 - dδ) ≤ max_{x ∈ 𝒩} ‖P(x)‖_2.
To bound the norm of D^{(k)}P(x)(u_1, ..., u_k) let us consider the net defined by 𝒩^{k+1} = 𝒩 × ⋯ × 𝒩 on S^{n-1} × ⋯ × S^{n-1}. Let x := (x_1, ..., x_{k+1}) ∈ S^{n-1} × ⋯ × S^{n-1} and y := (y_1, ..., y_{k+1}) ∈ 𝒩^{k+1} be such that ‖x_i - y_i‖_2 ≤ δ for all i. Clearly, ‖x - y‖_2 ≤ δ sqrt(k+1). Since x was arbitrary, this argument proves that 𝒩^{k+1} is a δ sqrt(k+1)-net. Note also that D^{(k)}P(x)(u_1, ..., u_k) is a homogeneous polynomial system with (k+1)n variables and degree d. The desired bound then follows from Inequality (⋆) obtained above.
To prove Assertion (1) of our current lemma, the preceding proof carries over verbatim, simply employing Assertion (1), instead of Assertion (2), from Theorem 2.4. 
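Lemma 2.5(2) can be exercised numerically (an illustration we add, for the m = 1 cubic p = x³ - 3xy² with a uniform angular net on S^1 of our choosing):

```python
import math

# Lemma 2.5(2) illustration: ||P||_inf <= max_N(P) / (1 - delta*d) for a
# delta-net N on S^1.  We take p = x^3 - 3*x*y^2 (d = 3, ||p||_inf = 1) and a
# uniform net of k angles, which is a delta-net with delta = 2*sin(pi/k).
def p(x, y):
    return x**3 - 3*x*y*y

k = 400
delta = 2*math.sin(math.pi/k)          # covering radius of the uniform angular net
net_max = max(abs(p(math.cos(2*math.pi*j/k), math.sin(2*math.pi*j/k)))
              for j in range(k))
bound = net_max / (1 - delta*3)        # Lemma 2.5(2) with d = 3
print(net_max, bound)                  # net_max <= ||p||_inf = 1 <= bound
```

Shrinking δ makes the net bound approach the true sup-norm, which is how the lemma is used later: a sup over the sphere is controlled by finitely many evaluations.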

3. Condition Number of Random Polynomial Systems

3.1. Introducing Randomness. Now let P := (p_1, ..., p_m) be a random polynomial system where p_j(x) := Σ_{|α|=d_j} c_{j,α} sqrt(binom(d_j, α)) x^α. In particular, recall that N_j = binom(n+d_j-1, d_j) and we let C_j = (c_{j,α})_{|α|=d_j} be a random vector in R^{N_j} satisfying the Centering, Sub-Gaussian, and Small Ball assumptions from the introduction. Letting X_j := ( sqrt(binom(d_j, α)) x^α )_{|α|=d_j} we then have p_j(x) = ⟨C_j, X_j⟩. In particular, recall that the Sub-Gaussian assumption is that there is a K > 0 such that for each θ ∈ S^{N_j - 1} and t > 0 we have Prob( |⟨C_j, θ⟩| ≥ t ) ≤ 2e^{-t²/K²}. Recall also that the Small Ball assumption is that there is a c_0 > 0 such that for every vector a ∈ R^{N_j} and ε > 0 we have Prob( |⟨a, C_j⟩| ≤ ε‖a‖_2 ) ≤ c_0 ε. In what follows, several of our bounds will depend on the parameters K and c_0 underlying the random variable being Sub-Gaussian and having the Small Ball property.
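For concreteness, a worked example we add (not in the paper): a standard Gaussian coordinate has Sub-Gaussian constant K = sqrt(2) and Small Ball constant c_0 = sqrt(2/π), and the median of |g| indeed lies between 1/(2c_0) and 2K, consistent with the inequality Kc_0 ≥ 1/4 derived in the next paragraph.

```python
import math

# Worked example (ours): for g ~ N(0,1), Prob(|g| >= t) <= 2*exp(-t^2/2) gives
# K = sqrt(2), and Prob(|g| <= eps) = erf(eps/sqrt(2)) <= sqrt(2/pi)*eps gives
# c_0 = sqrt(2/pi).  The median of |g| then sits between 1/(2*c_0) and 2*K.
K = math.sqrt(2.0)
c0 = math.sqrt(2.0/math.pi)

# median of |g| solves erf(x/sqrt(2)) = 1/2; find it by bisection
lo, hi = 0.0, 3.0
for _ in range(80):
    mid = 0.5*(lo + hi)
    if math.erf(mid/math.sqrt(2.0)) < 0.5:
        lo = mid
    else:
        hi = mid
med = 0.5*(lo + hi)
print(med, 1/(2*c0), 2*K, K*c0)  # ~0.6745, ~0.6267, ~2.83, ~1.13
```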

For any random variable ξ on R we denote its median by Med(ξ). Now, if ξ := |⟨C_j, θ⟩|, then setting t := 2K in the Sub-Gaussian assumption for C_j yields Prob(ξ ≥ 2K) ≤ 1/2, i.e., Med(ξ) ≤ 2K. On the other hand, setting ε := 1/(2c_0) in the Small Ball assumption for C_j yields Prob(ξ ≤ 1/(2c_0)) ≤ 1/2, i.e., Med(ξ) ≥ 1/(2c_0). Combining these two bounds on Med(ξ) we then easily obtain
(1) K c_0 ≥ 1/4.
In what follows we will use Inequality (1) several times.

3.2. The Sub-Gaussian Assumption and Bounds Related to Operator Norms. We will need the following inequality, reminiscent of Hoeffding's classical inequality [16].

Theorem 3.1. [28, Prop. 5.10] There is an absolute constant c > 0 with the following property: If X_1, ..., X_n are Sub-Gaussian random variables with mean zero and underlying constant K, and a = (a_1, ..., a_n) ∈ R^n and t ≥ 0, then
Prob( | Σ_i a_i X_i | ≥ t ) ≤ 2 exp( -c t² / (K² ‖a‖_2²) ).

Lemma 3.2. Let P := (p_1, ..., p_m) be a random polynomial system where, as before,

p_j(x) = Σ_{|α|=d_j} c_{j,α} sqrt(binom(d_j, α)) x^α and the coefficient vectors C_j are independent random vectors satisfying the Centering, Sub-Gaussian, and Small Ball assumptions from the introduction, with underlying constants K and c_0. Then, for 𝒩 a δ-net over S^{n-1} and t ≥ 2, we have the following inequalities:
(1) If deg(p_j) = d for all j ∈ {1, ..., m} then
Prob( ‖P‖_∞ ≤ 2tK sqrt(m) / (1 - dδ) ) ≥ 1 - 2|𝒩| e^{-O(t²m)}.
In particular, there is a constant c_1 ≥ 1 such that for δ = 1/(3d) and t = s log(ed) with s ≥ 1 we have Prob( ‖P‖_∞ ≤ 3sK sqrt(m) log(ed) ) ≥ 1 - e^{-c_1 s² m log(ed)}.
(2) If d := max_j deg p_j then

Prob( ‖P‖_∞ ≤ 2tK sqrt(m) / (1 - d²δ) ) ≥ 1 - 2|𝒩| e^{-O(tm)}.
In particular, there is a constant c_2 ≥ 1 such that for δ = 1/(3d²) and t = s log(ed) with s ≥ 1, we have Prob( ‖P‖_∞ ≤ 3sK sqrt(m) log(ed) ) ≥ 1 - e^{-c_2 s² m log(ed)}.

Proof. We prove Assertion (2) since the proofs of the two assertions are virtually identical. First observe that the identity (x_1² + ⋯ + x_n²)^d = Σ_{|α|=d} binom(d, α) x^{2α} implies ‖X_j‖_2 = 1 for all j ≤ m. Using our Sub-Gaussian assumption on the random vectors C_j, and the fact that p_j(x) = ⟨C_j, X_j⟩, we obtain that Prob( |p_j(x)| ≥ t ) ≤ 2e^{-t²/K²} for every x ∈ S^{n-1}.
Now we need to tensorize the preceding inequality. By Theorem 3.1, we have for all a ∈ S^{m-1} that Prob( |⟨a, P(x)⟩| ≥ t ) ≤ 2e^{-ct²/K²}. Letting ℳ be a δ-net on S^{m-1} we then have Prob( max_{a ∈ ℳ} |⟨a, P(x)⟩| ≥ t ) ≤ 2|ℳ| e^{-ct²/K²}, where we have used the classical union bound for the multiple events defined by the (finite) δ-net ℳ. Since ‖P(x)‖_2 = max_{θ ∈ S^{m-1}} ⟨θ, P(x)⟩, an application of Lemma 2.5 for the linear polynomial ⟨·, P(x)⟩ gives us Prob( ‖P(x)‖_2 ≥ t sqrt(m) K / (1 - δ) ) ≤ 2|ℳ| e^{-c t² m}.

It is known that for any δ > 0, S^{m-1} admits a δ-net ℳ such that |ℳ| ≤ (3/δ)^m (see, e.g., [28, Lemma 5.2]). So for t ≥ 1 and δ = 1/2 we have Prob( ‖P(x)‖_2 ≥ 2t sqrt(m) K ) ≤ 2e^{-c_2 t² m} for a suitable constant c_2 ≤ c. We have thus arrived at a point-wise estimate on ‖P(x)‖_2. Doing a union bound on a δ-net 𝒩 on S^{n-1} we then obtain:

Prob( max_{x ∈ 𝒩} ‖P(x)‖_2 ≥ 2t sqrt(m) K ) ≤ 2|𝒩| e^{-c_1 t² m}.
Using Lemma 2.5 once again completes our proof. □
Theorem 2.4 and Lemma 3.2 then directly imply the following:

Corollary 3.3. Let P be a random polynomial system as in Lemma 3.2. Then there are constants c_1, c_2 ≥ 1 such that the following inequalities hold for s ≥ 1:
(1) If deg(p_j) = d for all j ∈ {1, ..., m} then both Prob( ‖D^{(1)}P‖_∞ ≤ 3sK sqrt(m) d log(ed) ) and Prob( ‖D^{(2)}P‖_∞ ≤ 3sK sqrt(m) d² log(ed) ) are bounded from below by 1 - 2e^{-c_1 s² m log(ed)}.
(2) If d := max_j deg p_j then both Prob( ‖D^{(1)}P‖_∞ ≤ 3sK sqrt(m) d² log(ed) ) and Prob( ‖D^{(2)}P‖_∞ ≤ 3sK sqrt(m) d⁴ log(ed) ) are bounded from below by 1 - 2e^{-c_2 s² m log(ed)}. 

3.3. The Small Ball Assumption and Bounds for L(P). We will need the following standard lemma (see, e.g., [23, Lemma 2.2] or [29]).

Lemma 3.4. Let ξ_1, ..., ξ_m be independent random variables such that, for every ε > 0, we have Prob( |ξ_i| ≤ ε ) ≤ c_0 ε. Then there is a constant c̃ > 0 such that for every ε > 0 we have
Prob( sqrt(ξ_1² + ⋯ + ξ_m²) ≤ ε sqrt(m) ) ≤ (c̃ c_0 ε)^m. □
We can then derive the following result:

Lemma 3.5. Let P = (p_1, ..., p_m) be a random polynomial system satisfying the Small Ball assumption with underlying constant c_0. Then there is a constant c̃ > 0 such that for every ε > 0 and x ∈ S^{n-1} we have Prob( ‖P(x)‖_2 ≤ ε sqrt(m) ) ≤ (c̃ c_0 ε)^m.

Proof. By the Small Ball assumption on the random vectors C_i, and observing that p_i(x) = ⟨C_i, X_i⟩ and ‖X_i‖_2 = 1 for all x ∈ S^{n-1}, we have Prob( |p_i(x)| ≤ ε ) ≤ c_0 ε. By Lemma 3.4 we are done. □
The next lemma is a variant of [22, Claim 2.4]. The motivation for the technical statement below, which introduces new parameters α, β, γ, is that it is the crucial covering estimate needed to prove a central probability bound we'll need later: Theorem 3.7.
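Lemma 3.4 can be seen exactly in a tiny case we add for illustration: m = 2 with ξ_i uniform on [-1, 1], so Prob(|ξ_i| ≤ ε) = min(ε, 1) (i.e. c_0 = 1), and for small ε the event {sqrt(ξ_1² + ξ_2²) ≤ ε sqrt(2)} is a disk whose probability is (c̃ε)² with c̃ = sqrt(π/2).

```python
import math

# Lemma 3.4, exact tiny case (our illustration): xi_1, xi_2 independent uniform
# on [-1, 1].  For eps*sqrt(2) <= 1 the event sqrt(xi1^2 + xi2^2) <= eps*sqrt(2)
# is a disk of radius eps*sqrt(2) inside the square [-1,1]^2 (area 4), so
# Prob = pi*(eps*sqrt(2))^2 / 4 = (pi/2)*eps^2 = (c*eps)^2 with c = sqrt(pi/2).
c_tilde = math.sqrt(math.pi/2.0)
for eps in (0.05, 0.2, 0.5):
    exact = math.pi*(eps*math.sqrt(2.0))**2 / 4.0
    assert abs(exact - (c_tilde*eps)**2) < 1e-12
print(c_tilde)  # ~1.2533
```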

Lemma 3.6. Let n ≥ 2, let P := (p_1, ..., p_m) be a system of n-variate homogeneous polynomials, and assume ‖P‖_∞ ≤ γ. Let x, y ∈ S^{n-1} be mutually orthogonal vectors with 𝓛(x, y) ≤ α, and let r ∈ [-1, 1]. Then for every w with w = x + βry + β²z for some z ∈ B_2^n, we have the following inequalities:
(1) If d := max_i d_i and 0 < β ≤ d^{-4} then ‖P(w)‖_2² ≤ 8( α² + (2 + e⁴) β⁴ d⁴ γ² ).
(2) If deg(p_i) = d for all i ∈ [m], and 0 < β ≤ d^{-2}, then ‖P(w)‖_2² ≤ 8( α² + (2 + e⁴) β⁴ d⁴ γ² ).

Proof. We will prove just Assertion (1) since the proof of Assertion (2) is almost the same. We start with some auxiliary observations on ‖P‖_∞: First note that Theorem 2.4 tells us that ‖P‖_∞ ≤ γ implies ‖D^{(1)}P‖_∞ ≤ d²γ and, similarly, ‖D^{(k)}P‖_∞ ≤ d^{2k}γ for every k ≥ 1. Also, for any w and u_i ∈ S^{n-1} with i ∈ {1, ..., k}, ‖P‖_∞ ≤ γ and the homogeneity of the

(k) d k 2k − pi implies supu1,...,uk D P (w)(u1,...,uk) 2 w 2 d γ. These observations then yield k k ≤2 k k n 1 the following inequality for w = x + βry + β z with z B2 , r 1, β d− , k = 3, and n 1 ∈ | | ≤ ≤ u1,u2,u3 S − : ∈ (3) d 3 6 d 3 6 D P (w)(u1,u2,u3) 2 w 2− d γ (1+2β) − d γ Now, by Taylor expansion,k we have thek following≤k k equality:≤

$$p_j(w)=p_j(x)+\langle\nabla p_j(x),\beta ry+\beta^2z\rangle+\tfrac12(\beta ry+\beta^2z)^T D^{(2)}p_j(x)(\beta ry+\beta^2z)+(1+\beta)^3\beta^3A_j(x),$$
where $A_j(x):=\int_0^1 D^{(3)}p_j\big(x+t(\beta ry+\beta^2z)\big)(v,v,v)\,dt$ and $v:=\frac{\beta ry+\beta^2z}{\|\beta ry+\beta^2z\|_2}$.
Breaking the second and third order terms of the expansion of $p_j(w)$ into pieces, we then have the following inequality:

$$|p_j(w)|\le|p_j(x)|+\beta|\langle\nabla p_j(x),y\rangle|+\beta^2|\langle\nabla p_j(x),z\rangle|+\tfrac12\beta^2|D^{(2)}p_j(x)(y,y)|+\tfrac12\beta^3|D^{(2)}p_j(x)(y,z)|$$
$$+\tfrac12\beta^3|D^{(2)}p_j(x)(z,y)|+\tfrac12\beta^4|D^{(2)}p_j(x)(z,z)|+(1+\beta)^3\beta^3|A_j(x)|.$$
Applying the Cauchy–Schwarz Inequality to the vectors $(1,\beta d_j^{1/2},1,1,1,1,1,1)$ and $\big(|p_j(x)|,\,d_j^{-1/2}|\langle\nabla p_j(x),y\rangle|,\,\dots,\,(1+\beta)^3\beta^3|A_j(x)|\big)$ then implies the following inequality:
$$p_j(w)^2\le(7+\beta^2d_j)\Big(p_j(x)^2+d_j^{-1}\langle\nabla p_j(x),y\rangle^2+\beta^4\langle\nabla p_j(x),z\rangle^2+\tfrac14\beta^4\big(D^{(2)}p_j(x)(y,y)\big)^2$$

$$+\tfrac14\beta^6\big(D^{(2)}p_j(x)(y,z)\big)^2+\tfrac14\beta^6\big(D^{(2)}p_j(x)(z,y)\big)^2+\tfrac14\beta^8\big(D^{(2)}p_j(x)(z,z)\big)^2+\beta^6(1+\beta)^6A_j(x)^2\Big).$$
We sum all these inequalities for $j\in\{1,\dots,m\}$. On the left-hand side we have $\|P(w)\|_2^2$. On the right-hand side, the summation of the terms $p_j(x)^2+d_j^{-1}\langle\nabla p_j(x),y\rangle^2$ is $\|P(x)\|_2^2+\|M^{-1}D^{(1)}P(x)(y)\|_2^2$, and its magnitude is controlled by the assumption $\mathcal{L}(x,y)\le\alpha$. The summations of the other terms are controlled by the assumption $\|P\|_\infty\le\gamma$ and Theorem 2.4. We thus obtain

$$\|P(w)\|_2^2\le(7+\beta^2d)\Big(\|P(x)\|_2^2+\|M^{-1}D^{(1)}P(x)(y)\|_2^2+\beta^4d^4\gamma^2+\tfrac14\beta^4d^4\gamma^2+\tfrac14\beta^6d^6\gamma^2+\tfrac14\beta^6d^6\gamma^2+\tfrac14\beta^8d^8\gamma^2+\beta^6(1+\beta)^6\sum_j A_j(x)^2\Big).$$
The assumption $\beta\le d^{-2}$ implies that $\beta^8d^8\le\beta^4d^4$ and $\beta^6d^6\le\beta^4d^4$. Therefore,

$$\|P(w)\|_2^2\le(7+\beta^2d)\Big(\|P(x)\|_2^2+\|M^{-1}D^{(1)}P(x)(y)\|_2^2+\beta^4d^4\gamma^2+\beta^4d^4\gamma^2+\beta^6(1+\beta)^6\sum_j A_j(x)^2\Big).$$
Clearly $\sum_{j\le m}A_j(x)^2\le\max_{w\in V_{x,y}}\big\|D^{(3)}P(w)(u_1,u_2,u_3)\big\|_2^2\le(1+2\beta)^{2d-6}d^{12}\gamma^2$. Hence we have $\|P(w)\|_2^2\le(7+\beta^2d)\big(\alpha^2+\beta^4d^4\gamma^2+\beta^4d^4\gamma^2+(1+2\beta)^{2d-6}\beta^6d^{12}\gamma^2\big)$. Since $\beta\le d^{-4}$, we finally get $\|P(w)\|_2^2\le(7+\beta^2d)\big(\alpha^2+(2+e^4)\beta^4d^4\gamma^2\big)\le 8\big(\alpha^2+(2+e^4)\beta^4d^4\gamma^2\big)$. $\square$

ALPEREN ERGÜR, GRIGORIS PAOURIS, AND J. MAURICE ROJAS

Lemma 3.6 controls the growth of the norm of the polynomial system $P=(p_1,\dots,p_m)$ over the region $\{w\in\mathbb{R}^n : w=x+\beta ry+\beta^2z,\ |r|\le 1,\ y\in S^{n-1},\ y\perp x,\ z\in B_2^n\}$. Note in particular that we are using cylindrical neighborhoods instead of ball neighborhoods. This is because we have found that (a) our approach truly requires us to go to order 3 in the underlying Taylor expansion and (b) cylindrical neighborhoods allow us to properly take contributions from tangential directions, and thus higher derivatives, into account.

We already had a probabilistic estimate in Lemma 3.5 saying that, for any $w$ with $\|w\|_2\ge 1$, the probability of $\|P(w)\|_2$ being smaller than $\varepsilon\sqrt m$ is less than $\varepsilon^m$ up to some universal constants. The controlled growth provided by Lemma 3.6 holds for a region with a certain volume, which will ultimately contradict the probabilistic estimates provided by Lemma 3.5. This will be the main trick behind the proof of the following theorem.

Theorem 3.7. Let $n\ge 2$ and let $P:=(p_1,\dots,p_m)$ be a system of random homogeneous $n$-variate polynomials such that $p_j(x)=\sum_{|a|=d_j}c_{j,a}\sqrt{\binom{d_j}{a}}\,x^a$, where the $C_j=(c_{j,a})_{|a|=d_j}$ are random vectors satisfying the Small Ball assumption with underlying constant $c_0$. Let $\alpha,\gamma>0$, $d:=\max_i d_i$, and assume $\alpha\le\gamma\min\{d^{-6},d^2/n\}$. Then
$$\mathrm{Prob}(L(P)\le\alpha)\le\mathrm{Prob}(\|P\|_\infty\ge\gamma)+\alpha^{m-n+\frac32}\sqrt n\,(\gamma d^2)^{n-\frac32}\Big(\frac{Cc_0}{\sqrt m}\Big)^m,$$
where $C$ is a universal constant.

Proof. Let $\alpha,\gamma>0$ and $\beta\le d^{-4}$. Let $B:=\{P : \|P\|_\infty\le\gamma\}$ and let
$$L:=\{P : L(P)\le\alpha\}=\{P : \text{there exist }x,y\in S^{n-1}\text{ with }x\perp y\text{ and }\mathcal{L}(x,y)\le\alpha\}.$$
Let $\Gamma:=8(\alpha^2+(5+e^4)\beta^4d^4\gamma^2)$ and let $B_2^n$ denote the unit $\ell_2$-ball in $\mathbb{R}^n$. Lemma 3.6 implies that if the event $B\cap L$ occurs then there exists a set
$$V_{x,y}:=\{w\in\mathbb{R}^n : w=x+\beta ry+\beta^2z,\ |r|\le 1,\ z\perp y,\ z\in B_2^n\}\setminus B_2^n$$
such that $\|P(w)\|_2^2\le\Gamma$ for every $w$ in this set. Let $V:=\mathrm{Vol}(V_{x,y})$. Note that for $w\in V_{x,y}$ we have $\|w\|_2^2=\|x+\beta^2z\|_2^2+\|\beta ry\|_2^2\le 1+4\beta^2$. Hence we have $\|w\|_2\le 1+2\beta^2$. Since $V_{x,y}\subseteq(1+2\beta^2)B_2^n\setminus B_2^n$, we have shown that
$$B\cap L\subseteq\big\{P : \mathrm{Vol}\big(\{x\in(1+2\beta^2)B_2^n\setminus B_2^n : \|P(x)\|_2^2\le\Gamma\}\big)\ge V\big\}.$$
Using Markov's Inequality, Fubini's Theorem, and Lemma 3.5, we can estimate the probability of this event. Indeed,
$$\mathrm{Prob}\big(\mathrm{Vol}(\{x\in(1+2\beta^2)B_2^n\setminus B_2^n : \|P(x)\|_2^2\le\Gamma\})\ge V\big)\le\frac1V\,\mathbb{E}\,\mathrm{Vol}\big\{x\in(1+2\beta^2)B_2^n\setminus B_2^n : \|P(x)\|_2^2\le\Gamma\big\}$$
$$=\frac1V\int_{(1+2\beta^2)B_2^n\setminus B_2^n}\mathrm{Prob}\big(\|P(x)\|_2^2\le\Gamma\big)\,dx\le\frac{\mathrm{Vol}\big((1+2\beta^2)B_2^n\setminus B_2^n\big)}{V}\max_{x\in(1+2\beta^2)B_2^n\setminus B_2^n}\mathrm{Prob}\big(\|P(x)\|_2^2\le\Gamma\big).$$
Now recall that $\mathrm{Vol}(B_2^n)=\frac{\pi^{n/2}}{\Gamma(\frac n2+1)}$. Then $\frac{\mathrm{Vol}(B_2^n)}{\mathrm{Vol}(B_2^{n-1})}\le\frac{c'}{\sqrt n}$ for some constant $c'>0$. Having assumed that $\beta^2\le\frac1n$, we obtain $(1+2\beta^2)^n-1\le cn\beta^2$ for an absolute constant, and we see that
$$\frac{\mathrm{Vol}\big((1+2\beta^2)B_2^n\setminus B_2^n\big)}{V}\le\frac{\mathrm{Vol}(B_2^n)\big((1+2\beta^2)^n-1\big)}{\tfrac12\beta(\beta^2)^{n-1}\mathrm{Vol}(B_2^{n-1})}\le c\sqrt n\,\beta^{3-2n}$$
for some absolute constant $c>0$. Note that here, for a lower bound on $V$, we used the fact that $V_{x,y}$ contains more than half of a cylinder with base having radius $\beta^2$ and height $2\beta$. Writing $\tilde x:=\frac{x}{\|x\|_2}$ for any $x\neq 0$, we then obtain, for $z\notin B_2^n$, that
$$\|P(z)\|_2^2=\sum_{j=1}^m|p_j(z)|^2=\sum_{j=1}^m|p_j(\tilde z)|^2\,\|z\|_2^{2d_j}\ge\sum_{j=1}^m|p_j(\tilde z)|^2=\|P(\tilde z)\|_2^2.$$
This implies, via Lemma 3.5, that for every $w\in(1+2\beta^2)B_2^n\setminus B_2^n$ we have
$$\mathrm{Prob}\big(\|P(w)\|_2^2\le\Gamma\big)\le\mathrm{Prob}\big(\|P(\tilde w)\|_2^2\le\Gamma\big)\le\Big(cc_0\sqrt{\tfrac\Gamma m}\Big)^m.$$
So we conclude that
$$\mathrm{Prob}(L(P)\le\alpha)\le\mathrm{Prob}(\|P\|_\infty\ge\gamma)+\mathrm{Prob}(B\cap L)\le\mathrm{Prob}(\|P\|_\infty\ge\gamma)+c\sqrt n\,\beta^{3-2n}\Big(cc_0\sqrt{\tfrac\Gamma m}\Big)^m.$$
Recall that $\Gamma=8(\alpha^2+(5+e^4)\beta^4d^4\gamma^2)$. Setting $\beta:=\big(\frac{\alpha}{\gamma d^2}\big)^{1/2}$, our assumption $\alpha\le\gamma\min\{d^{-6},d^2/n\}$ and our choice of $\beta$ then imply that $\Gamma=C\alpha^2$ for some constant $C$. So we obtain

$$\mathrm{Prob}(L(P)\le\alpha)\le\mathrm{Prob}(\|P\|_\infty\ge\gamma)+c\sqrt n\Big(\frac{\alpha}{\gamma d^2}\Big)^{\frac32-n}\Big(\frac{Cc_0\alpha}{\sqrt m}\Big)^m$$
and our proof is complete. $\square$
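The homogeneity step used in the proof above — that $\|P(z)\|_2\ge\|P(\tilde z)\|_2$ whenever $\|z\|_2\ge 1$ — can be checked directly on a small example. The system below is a made-up homogeneous system, chosen only for illustration:

```python
import math

# Check of the homogeneity step: for homogeneous p_j of degree d_j,
# p_j(z) = ||z||_2^{d_j} p_j(z~) with z~ := z/||z||_2, hence
# ||P(z)||_2 >= ||P(z~)||_2 whenever ||z||_2 >= 1.
# Made-up homogeneous system in 3 variables, for illustration only.

def P(x):
    x1, x2, x3 = x
    return [
        x1**2 + 2.0 * x1 * x2 - x3**2,   # homogeneous of degree 2
        x1**3 - x2 * x3**2,              # homogeneous of degree 3
    ]

DEGREES = [2, 3]

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

z = [1.5, -0.7, 2.0]
nz = norm2(z)
zt = [t / nz for t in z]

# degree-wise scaling identity p_j(z) = ||z||^{d_j} p_j(z~) ...
for d, v, tv in zip(DEGREES, P(z), P(zt)):
    assert abs(v - nz**d * tv) < 1e-9
# ... which forces ||P(z)||_2 >= ||P(z~)||_2 since ||z||_2 >= 1
assert nz >= 1 and norm2(P(z)) >= norm2(P(zt))
print("homogeneity check passed")
```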

3.4. The Condition Number Theorem and its Consequences. We will now need bounds for the Weyl-Bombieri norms of polynomial systems. Note that, with

$$p_j(x)=\sum_{\alpha_1+\cdots+\alpha_n=d_j}c_{j,\alpha}\sqrt{\binom{d_j}{\alpha}}\,x^\alpha,$$
we have $\|p_j\|_W:=\|(c_{j,\alpha})_\alpha\|_2$ for $j\in\{1,\dots,m\}$. The following lemma, providing large deviation estimates for the Euclidean norm, is standard and follows, for instance, from Theorem 3.1.
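Concretely, if $p=\sum_\alpha a_\alpha x^\alpha$ in the usual monomial basis, then $c_\alpha=a_\alpha/\sqrt{\binom{d}{\alpha}}$ and $\|p\|_W^2=\sum_\alpha a_\alpha^2/\binom{d}{\alpha}$. A short sketch of this computation (our own, with a made-up example polynomial):

```python
from math import comb, sqrt

def multinomial(d, alpha):
    """Multinomial coefficient binom(d; alpha_1,...,alpha_n), with sum(alpha) = d."""
    out, rem = 1, d
    for a in alpha:
        out *= comb(rem, a)
        rem -= a
    return out

def weyl_norm(coeffs, d):
    """Weyl-Bombieri norm of p = sum_alpha a_alpha x^alpha of degree d.

    In the scaled basis above, c_alpha = a_alpha / sqrt(multinomial(d, alpha)),
    so ||p||_W^2 = sum_alpha a_alpha^2 / multinomial(d, alpha).
    """
    return sqrt(sum(a * a / multinomial(d, alpha) for alpha, a in coeffs.items()))

# Made-up example: p(x, y) = x^2 + 2xy + y^2, so
# ||p||_W^2 = 1/1 + 4/2 + 1/1 = 4 and ||p||_W = 2.
p = {(2, 0): 1.0, (1, 1): 2.0, (0, 2): 1.0}
print(weyl_norm(p, 2))  # 2.0
```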

Lemma 3.8. There is a universal constant $c'>0$ such that for any random $n$-variate polynomial system $P=(p_1,\dots,p_m)$ satisfying the Centering and Sub-Gaussian assumptions with underlying constant $K$, and with $N_j:=\binom{n+d_j-1}{d_j}$ for $j\in\{1,\dots,m\}$, $N:=\sum_{j=1}^m N_j$, and $t\ge 1$, we have
(1) $\mathrm{Prob}\big(\|p_j\|_W\ge c'tK\sqrt{N_j}\big)\le e^{-t^2N_j}$, and
(2) $\mathrm{Prob}\big(\|P\|_W\ge c'tK\sqrt N\big)\le e^{-t^2N}$.

We are now ready to prove our main theorem on the condition number of random polynomial systems.

Theorem 3.9. There are universal constants $A,c>0$ such that the following hold: Let $P=(p_1,\dots,p_m)$ be a system of homogeneous random polynomials with $p_j(x)=\sum_{|\alpha|=d_j}c_{j,\alpha}\sqrt{\binom{d_j}{\alpha}}\,x^\alpha$ and let the $C_j=(c_{j,\alpha})_{|\alpha|=d_j}$ be independent random vectors satisfying the Sub-Gaussian and Small Ball assumptions, with respective underlying constants $K$ and $c_0$. Assume $n\ge 2$ and let $d:=\max_j\deg p_j$. Then, setting
$$M:=\sqrt{\tfrac Nm}\,(Kc_0C)^{\frac{m}{m-n+3/2}}\,(3d^2\log(ed))^{\frac{n-3/2}{m-n+3/2}}\,n^{\frac{1}{2(2m-2n+3)}}\max\Big\{d^6,\frac{n}{d^2}\Big\},$$
we have two cases:

(1) If $N\ge m\log(ed)$ then $\mathrm{Prob}(\tilde\kappa(P)\ge tM)$ is bounded from above by
$$\begin{cases}\dfrac{3}{t^{m-n+3/2}} & \text{if } 1\le t\le e^{\frac{m\log(ed)}{m-n+3/2}},\\[1ex] \dfrac{3}{t^{m-n+3/2}}\Big(\dfrac{(m-n+\frac32)\log t}{m\log(ed)}\Big)^{\frac{n-3/2}{2}} & \text{if } e^{\frac{m\log(ed)}{m-n+3/2}}\le t\le e^{\frac{N}{m-n+3/2}},\\[1ex] \dfrac{3}{t^{m-n+3/2}}\Big(\dfrac{(m-n+\frac32)\log t}{N}\Big)^{\frac m2}\Big(\dfrac{N}{m\log(ed)}\Big)^{\frac{n-3/2}{2}} & \text{if } e^{\frac{N}{m-n+3/2}}\le t.\end{cases}$$
(2) If $N\le m\log(ed)$ then $\mathrm{Prob}(\tilde\kappa(P)\ge tM)$ is bounded from above by
$$\begin{cases}\dfrac{3}{t^{m-n+3/2}} & \text{if } 1\le t\le e^{\frac{N}{m-n+3/2}},\\[1ex] \dfrac{3}{t^{m-n+3/2}}\Big(\dfrac{(m-n+\frac32)\log t}{N}\Big)^{\frac m2} & \text{if } e^{\frac{N}{m-n+3/2}}\le t.\end{cases}$$

Proof. Recall that $\tilde\kappa(P)=\frac{\|P\|_W}{L(P)}$. Note that if $u>0$, and the inequalities $\|P\|_W\le ucK\sqrt N$ and $L(P)\ge\frac{ucK\sqrt N}{tM}$ hold, then we clearly have $\tilde\kappa(P)\le tM$. In particular, $u>0$ implies that
$$\mathrm{Prob}(\tilde\kappa(P)\ge tM)\le\mathrm{Prob}\big(\|P\|_W\ge ucK\sqrt N\big)+\mathrm{Prob}\Big(L(P)\le\frac{ucK\sqrt N}{tM}\Big).$$
Our proof will then reduce to optimizing $u$ over the various domains of $t$. Toward this end, note that Lemma 3.8 provides a large deviation estimate for the Weyl norm of our polynomial system. So, to bound $\mathrm{Prob}\big(\|P\|_W\ge ucK\sqrt N\big)$ from above, we need to use Lemma 3.8 with the parameter $u$. As for the other summand in the upper bound for $\mathrm{Prob}(\tilde\kappa(P)\ge tM)$, Theorem 3.7 provides an upper bound for $\mathrm{Prob}\big(L(P)\le\frac{ucK\sqrt N}{tM}\big)$. However, the upper bound provided by Theorem 3.7 involves the quantity $\mathrm{Prob}(\|P\|_\infty\ge\gamma)$. Therefore, in order to bound $\mathrm{Prob}\big(L(P)\le\frac{ucK\sqrt N}{tM}\big)$, we will need to use Theorem 3.7 together with Lemma 3.2. In particular, we will set $\alpha:=\frac{ucK\sqrt N}{tM}$ and $\gamma:=3sK\sqrt{m\log(ed)}$ in Theorem 3.7 and Lemma 3.2, and then optimize the parameters $u$, $s$, and $t$ at the final step of the proof.

Now let us check if the assumptions of Theorem 3.7 are satisfied: We have that $s\ge 1$, $u\ge 1$, and (since we need $\alpha\le\gamma\min\{d^{-6},\frac{d^2}{n}\}$) we require
$$\frac{ucK\sqrt N}{tM}\le 3sK\sqrt{m\log(ed)}\,\min\Big\{d^{-6},\frac{d^2}{n}\Big\}.$$
So it suffices that $\frac{uc\sqrt N}{3stM\sqrt{m\log(ed)}}\le\min\big\{d^{-6},\frac{d^2}{n}\big\}$, and we thus obtain
$$(*)\qquad\frac{uc}{3st}\sqrt{\frac{N}{m\log(ed)}}\ \le\ \sqrt{\frac Nm}\,(Kc_0C)^{\frac{m}{m-n+3/2}}\,(3d^2\log(ed))^{\frac{n-3/2}{m-n+3/2}}\,n^{\frac{1}{2(2m-2n+3)}}.$$
Since $Kc_0\ge\frac14$, the inequality $(*)$ holds if $u\le s$, $t\ge 1$, and we take the constant $C$ from Theorem 3.7 to be at least 4. Under the preceding restrictions, $Q:=\mathrm{Prob}(\tilde\kappa(P)\ge tM)$ then satisfies
$$Q\le\sqrt n\,\Big(\frac{ucK\sqrt N}{tM}\Big)^{m-n+\frac32}\big(3sK\sqrt{m\log(ed)}\,d^2\big)^{n-\frac32}\Big(\frac{Cc_0}{\sqrt m}\Big)^m+e^{-c_2s^2m\log(ed)}+e^{-u^2N}$$

or, $Q\le\dfrac{u^{m-n+\frac32}\,s^{n-\frac32}}{t^{m-n+\frac32}}+e^{-c_2s^2m\log(ed)}+e^{-u^2N}$ for some suitable $c_2>0$.

We now consider the case where $N\ge m\log(ed)$. If $1\le t\le e^{\frac{m\log(ed)}{m-n+3/2}}$ then we set $u=s=1$, noting that $(*)$ is satisfied. We then obtain
$$Q\le\frac{1}{t^{m-n+3/2}}+e^{-c_2m\log(ed)}+e^{-N}\le\frac{3}{t^{m-n+3/2}},$$
provided $c_2\ge 1$. In the case where $e^{\frac{m\log(ed)}{m-n+3/2}}\le t\le e^{\frac{N}{m-n+3/2}}$ we choose $u=1$ and $s:=\sqrt{\frac{(m-n+\frac32)\log t}{m\log(ed)}}\ge 1$. (Note that $u\le s$.) These choices then yield
$$Q\le\frac{1}{t^{m-n+\frac32}}\Big(\frac{(m-n+\frac32)\log t}{m\log(ed)}\Big)^{\frac{n-3/2}{2}}+\frac{1}{t^{c_2(m-n+\frac32)}}+e^{-N}\le\frac{3}{t^{m-n+\frac32}}\Big(\frac{(m-n+\frac32)\log t}{m\log(ed)}\Big)^{\frac{n-3/2}{2}}.$$
In the case where $e^{\frac{N}{m-n+3/2}}\le t$, we choose $s:=\sqrt{\frac{(m-n+\frac32)\log t}{m\log(ed)}}$ and $u:=\sqrt{\frac{(m-n+\frac32)\log t}{N}}$. (Note that $u\le s$ also in this case.) So we get
$$Q\le\frac{1}{t^{m-n+\frac32}}\Big(\frac{(m-n+\frac32)\log t}{N}\Big)^{\frac m2}\Big(\frac{N}{m\log(ed)}\Big)^{\frac{n-3/2}{2}}+\frac{1}{t^{c_2(m-n+\frac32)}}+\frac{1}{t^{m-n+\frac32}}\le\frac{3}{t^{m-n+\frac32}}\Big(\frac{(m-n+\frac32)\log t}{N}\Big)^{\frac m2}\Big(\frac{N}{m\log(ed)}\Big)^{\frac{n-3/2}{2}}.$$
We consider now the case where $N\le m\log(ed)$. When $1\le t\le e^{\frac{N}{m-n+3/2}}$ we choose $s=1$ and $u=1$ to obtain $Q\le\frac{1}{t^{m-n+3/2}}+e^{-c_2m\log(ed)}+e^{-N}\le\frac{3}{t^{m-n+3/2}}$ as before. In the case $t\ge e^{\frac{N}{m-n+3/2}}$, we choose $s=u:=\sqrt{\frac{(m-n+\frac32)\log t}{N}}$. Note that again $(*)$ is satisfied and, with these choices, we get
$$Q\le\frac{1}{t^{m-n+\frac32}}\Big(\frac{(m-n+\frac32)\log t}{N}\Big)^{\frac m2}+\frac{1}{t^{c_2(m-n+\frac32)m\log(ed)/N}}+\frac{1}{t^{m-n+\frac32}}\le\frac{3}{t^{m-n+\frac32}}\Big(\frac{(m-n+\frac32)\log t}{N}\Big)^{\frac m2}.\qquad\square$$

Theorem 3.10. Let $P$ be a random polynomial system as in Theorem 3.9, let $d:=\max_j\deg p_j$, and let $M$ be as defined in Theorem 3.9. Set

$$\delta_1:=\frac{q\sqrt{\pi n}}{m-n+\frac32}\Big(\frac{n}{2em\log(ed)}\Big)^{\frac n2-\frac34}\Big(1-\frac{q}{m-n+\frac32}\Big)^{-(\frac n2+\frac14)}\quad\text{and}$$

$$\delta_2:=\frac{q\sqrt{\pi m}\,e^{-\frac m2}}{m-n+\frac32}\Big(\frac{m(m-n+\frac32)}{2N}\Big)^{\frac m2}\Big(\frac{N}{m\log(ed)}\Big)^{\frac n2-\frac34}\Big(1-\frac{q}{m-n+\frac32}\Big)^{-(\frac m2+1)}.$$

We then have the following estimates:

(1) If $N\ge m\log(ed)$ and $q\in(0,m-n+\frac32)$ then
$$\big(\mathbb{E}(\tilde\kappa(P)^q)\big)^{1/q}\le M\Big(1+\frac{q}{m-n-q+\frac32}+\delta_1+\delta_2\Big)^{1/q}.$$
In particular, $q\in\big(0,(m-n+\frac32)\big(1-\frac{1}{2\log(ed)}\big)\big]\Rightarrow\big(\mathbb{E}(\tilde\kappa(P)^q)\big)^{1/q}\le M\,(3m\log(ed))^{1/q}$, and $q\in\big(0,\frac{m-n+3/2}{2}\big]\Rightarrow\big(\mathbb{E}(\tilde\kappa(P)^q)\big)^{1/q}\le 4^{1/q}M$. Furthermore, $\mathbb{E}(\log\tilde\kappa(P))\le 1+\log M$.

(2) If $N\le m\log(ed)$, then $\big(\mathbb{E}(\tilde\kappa(P)^q)\big)^{1/q}\le M\big(1+\frac{q}{m-n-q+\frac32}+\delta_2\big)^{1/q}$. In particular, $q\in\big(0,(m-n+\frac32)\big(1-\frac{m}{eN}\big)\big]\Rightarrow\big(\mathbb{E}(\tilde\kappa(P)^q)\big)^{1/q}\le M\,(3m\log(ed))^{1/q}$ and $q\in\big(0,\frac{m-n+3/2}{2}\big]\Rightarrow\big(\mathbb{E}(\tilde\kappa(P)^q)\big)^{1/q}\le 4^{1/q}M$. Furthermore, $\mathbb{E}(\log\tilde\kappa(P))\le 1+\log M$.

Proof. Set
$$\Lambda_1:=\Big(\frac{m-n+\frac32}{m\log(ed)}\Big)^{\frac{n-3/2}{2}},\qquad\Lambda_2:=\Big(\frac{m-n+\frac32}{N}\Big)^{\frac m2}\Big(\frac{N}{m\log(ed)}\Big)^{\frac{n-3/2}{2}},$$
$$r:=m-n-q+\frac52,\qquad a_1:=\frac{m\log(ed)}{m-n+\frac32},\qquad\text{and}\qquad a_2:=\frac{N}{m-n+\frac32}.$$
Note that we have $r\ge 1$ by construction. Using Theorem 3.9 and the formula
$$\mathbb{E}\big(\tilde\kappa(P)^q\big)=q\int_0^\infty t^{q-1}\,\mathrm{Prob}(\tilde\kappa(P)\ge t)\,dt$$
(which follows from the definition of expectation), we have that
$$\mathbb{E}\big(\tilde\kappa(P)^q\big)\le M^q\Big(1+q\int_1^\infty t^{q-1}\,\mathrm{Prob}(\tilde\kappa(P)\ge tM)\,dt\Big),$$
or
$$\frac{\mathbb{E}\big(\tilde\kappa(P)^q\big)}{M^q}\le 1+q\int_1^{e^{a_1}}\frac{dt}{t^r}+q\Lambda_1\int_{e^{a_1}}^{e^{a_2}}\frac{(\log t)^{\frac{n-3/2}{2}}}{t^r}\,dt+q\Lambda_2\int_{e^{a_2}}^\infty\frac{(\log t)^{\frac m2}}{t^r}\,dt.$$
We will give upper bounds for the last three integrals. First note that

$$q\int_1^{e^{a_1}}\frac{dt}{t^r}=\frac{q}{r-1}\big(1-e^{-(r-1)a_1}\big)\le\frac{q}{r-1}.$$
Also, we have that

$$q\Lambda_1\int_{e^{a_1}}^{e^{a_2}}\frac{(\log t)^{\frac{n-3/2}{2}}}{t^r}\,dt=q\Lambda_1\int_{a_1}^{a_2}t^{\frac n2-\frac34}e^{-(r-1)t}\,dt=\frac{q\Lambda_1}{(r-1)^{\frac n2+\frac14}}\int_{a_1(r-1)}^{a_2(r-1)}t^{\frac n2-\frac34}e^{-t}\,dt$$
$$\le\frac{q\Lambda_1}{(r-1)^{\frac n2+\frac14}}\,\Gamma\Big(\frac n2+\frac14\Big)\le\delta_1.$$
Finally, we check that
$$q\Lambda_2\int_{e^{a_2}}^\infty\frac{(\log t)^{\frac m2}}{t^r}\,dt=q\Lambda_2\int_{a_2}^\infty t^{\frac m2}e^{-(r-1)t}\,dt=\frac{q\Lambda_2}{(r-1)^{\frac m2+1}}\int_{a_2(r-1)}^\infty t^{\frac m2}e^{-t}\,dt$$

$$\le\frac{q\Lambda_2}{(r-1)^{\frac m2+1}}\,\Gamma\Big(\frac m2+1\Big)\le\delta_2.$$
Note that if $q\le(m-n+\frac32)\big(1-\frac{1}{2\log(ed)}\big)$ then $\delta_1,\delta_2\le 1$.
For the case $N\le m\log(ed)$, working as before, we get that
$$\frac{\mathbb{E}\big(\tilde\kappa(P)^q\big)}{M^q}\le 1+q\int_1^{e^{a_2}}\frac{dt}{t^r}+q\Lambda_2\int_{e^{a_2}}^\infty\frac{(\log t)^{\frac m2}}{t^r}\,dt\le 1+\frac{q}{r-1}+\delta_2.$$
In the case $N\le m\log(ed)$ we have $\delta_2\le q\sqrt{\pi m}\big(\frac{m}{eN}\big)^{\frac m2}\big(1-\frac{q}{m-n+\frac32}\big)^{-(\frac m2+1)}$. In particular, for this case, it easily follows that $q\le(m-n+\frac32)\big(1-\frac{m}{eN}\big)$ implies $\delta_2\le 1$. $\square$

Note that if $m=n-1$, $n\ge 3$, and $d\ge 2$, then $N\ge m\log(ed)$ and, in this case, it is easy to check that $(*)$ still holds even if we reduce $M$ by deleting its factor of $\max\{d^6,\frac{n}{d^2}\}$. So then, for the important case $m=n-1$, our main theorems immediately admit the following refined form:

Corollary 3.11. There are universal constants $A,c>0$ such that if $P$ is any random polynomial system as in Theorem 3.9, but with $m=n-1$, $n\ge 3$, $d:=\max_j\deg p_j$, $d\ge 2$, and $M:=\sqrt N\,(Kc_0C)^{2(n-1)}(3d^2\log(ed))^{2n-3}\sqrt n$ instead, then we have:
$$\mathrm{Prob}(\tilde\kappa(P)\ge tM)\le\begin{cases}3t^{-\frac12} & \text{if } 1\le t\le e^{2(n-1)\log(ed)},\\[1ex] 3t^{-\frac12}\Big(\dfrac{\log t}{2(n-1)\log(ed)}\Big)^{\frac n2-\frac34} & \text{if } e^{2(n-1)\log(ed)}\le t\le e^{2N},\\[1ex] 3t^{-\frac12}\Big(\dfrac{\log t}{2N}\Big)^{\frac{n-1}{2}}\Big(\dfrac{N}{(n-1)\log(ed)}\Big)^{\frac n2-\frac34} & \text{if } e^{2N}\le t,\end{cases}$$
and, for all $q\in\big(0,\frac12-\frac{1}{4\log(ed)}\big]$, we have $\big(\mathbb{E}(\tilde\kappa(P)^q)\big)^{1/q}\le Me^{1/q}$. Furthermore, $\mathbb{E}(\log\tilde\kappa(P))\le 1+\log M$.

We are now ready to prove Corollary 1.5 from the introduction.

Proof of Corollary 1.5: From Corollary 3.11, Bound (2) follows immediately, and Bound (1) is clearly true for the smaller domain of $t$. So let us now consider $t=xe^{2(n-1)\log(ed)}$ with $x\ge 1$. Clearly, $\frac{\log t}{2(n-1)\log(ed)}=1+\frac{\log x}{2(n-1)\log(ed)}$, and thus
$$\Big(\frac{\log t}{2(n-1)\log(ed)}\Big)^{\frac n2-\frac34}<e^{\frac{\log x}{4\log(ed)}}=x^{\frac{1}{4\log(ed)}}.$$
Since $x=te^{-2(n-1)\log(ed)}$ we thus obtain
$$3t^{-\frac12}\Big(\frac{\log t}{2(n-1)\log(ed)}\Big)^{\frac n2-\frac34}\le 3t^{-\frac12}\big(te^{-2(n-1)\log(ed)}\big)^{\frac{1}{4\log(ed)}}.$$
Renormalizing the pair $(M,t)$ (since the $M$ from Corollary 3.11 is larger than the $M$ from Corollary 1.5 by a factor of $A$), we are done. $\square$

3.5. On the Optimality of Condition Number Estimates. As mentioned in the introduction, to establish a lower bound we need one more assumption on the randomness. For the convenience of the reader, we recall our earlier Euclidean Small Ball assumption.

(Euclidean Small Ball) There is a constant $\tilde c_0>0$ such that for each $j\in\{1,\dots,m\}$ and $\varepsilon>0$ we have $\mathrm{Prob}\big(\|C_j\|_2\le\varepsilon\sqrt{N_j}\big)\le(\tilde c_0\varepsilon)^{N_j}$.
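A standard Gaussian coefficient vector satisfies an assumption of this type. The Monte Carlo sketch below is our own illustration (the constants are illustrative, not the paper's); it shows the small-ball probability decaying geometrically in the dimension $N$ at fixed $\varepsilon$:

```python
import math
import random

# Monte Carlo illustration of the Euclidean Small Ball assumption for a
# standard Gaussian coefficient vector C in dimension N:
# Prob(||C||_2 <= eps*sqrt(N)) <= (c*eps)^N for an absolute constant c,
# so at fixed eps the probability decays geometrically in N.
# (Illustrative sketch only; the constants here are not the paper's.)

random.seed(1)

def gaussian_small_ball(N, eps, trials=100_000):
    """Empirical Prob(||C||_2 <= eps*sqrt(N)) for C standard Gaussian in R^N."""
    threshold = (eps * math.sqrt(N)) ** 2
    hits = sum(
        1
        for _ in range(trials)
        if sum(random.gauss(0.0, 1.0) ** 2 for _ in range(N)) <= threshold
    )
    return hits / trials

p3 = gaussian_small_ball(3, 0.3)
p6 = gaussian_small_ball(6, 0.3)
print(p3, p6)
```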

We will need an extension of Lemma 3.4: Lemma 3.12 below (see also [25, Thm. 1.5 & Cor. 8.6]). Toward this end, for any matrix $T:=(t_{i,j})_{1\le i,j\le m}$, write $\|T\|_{HS}$ for the Hilbert–Schmidt norm of $T$ and $\|T\|_{op}$ for the operator norm of $T$, i.e.,
$$\|T\|_{HS}:=\Big(\sum_{i,j=1}^m t_{i,j}^2\Big)^{\frac12}\qquad\text{and}\qquad\|T\|_{op}:=\max_{\theta\in S^{m-1}}\|T\theta\|_2.$$

Lemma 3.12. Let $\xi_1,\dots,\xi_m$ be independent random variables satisfying $\mathrm{Prob}(|\xi_i|\le\varepsilon)\le c_0\varepsilon$ for all $i\in\{1,\dots,m\}$ and $\varepsilon>0$. Let $\xi:=(\xi_1,\dots,\xi_m)$. Then there is a constant $c>0$ such that for any $m\times m$ matrix $T$ and $\varepsilon>0$ we have
$$\mathrm{Prob}\big(\|T\xi\|_2\le\varepsilon\|T\|_{HS}\big)\le(cc_0\varepsilon)^{c\frac{\|T\|_{HS}^2}{\|T\|_{op}^2}}.$$
Our main lower bound for the condition number is then the following:

Lemma 3.13. Let $P=(p_1,\dots,p_m)$ be a homogeneous $n$-variate polynomial system with $d_j=\deg p_j$ for all $j$. Then $\tilde\kappa(P)\ge\frac{\|P\|_W}{\|P\|_\infty\sqrt{m+1}}$. Moreover, if $P=(p_1,\dots,p_m)$ is a random polynomial system satisfying our Sub-Gaussian and Euclidean Small Ball assumptions, with respective underlying constants $K$ and $\tilde c_0$, then we have

$$\mathrm{Prob}\Big(\tilde\kappa(P)\le\frac{\varepsilon\sqrt N}{Kmd\log(ed)}\Big)\le(c\tilde c_0\varepsilon)^{c'\min\big\{\frac{N\min_j N_j}{\max_j N_j},\ md\log(ed)\big\}}\qquad\text{and}$$

$$\mathrm{Prob}\Big(\tilde\kappa(P)\le\frac{\varepsilon\sqrt N}{Km\log(ed)}\Big)\le(c\tilde c_0\varepsilon)^{c'm\log(ed)},\quad\text{if } d_j=d\text{ for all }j\in\{1,\dots,m\},$$
where $c,c'>0$ are absolute constants. In particular, when $d_j=d$ for all $j\in\{1,\dots,m\}$, we have $\mathbb{E}(\tilde\kappa(P))\ge\frac{c\sqrt N}{m\log(ed)}$.

Proof. First note that Theorem 2.3 implies that for every $x,y\in S^{n-1}$ we have
$$d_j^{-1}\big|D^{(1)}p_j(x)(y)\big|^2\le\|p_j\|_\infty^2.$$
So we have $\|M^{-1}D^{(1)}P(x)(y)\|_2^2\le\sum_{j=1}^m\|p_j\|_\infty^2\le m\|P\|_\infty^2$. Now recall that
$$\mathcal{L}_P^2(x,y):=\|M^{-1}D^{(1)}P(x)(y)\|_2^2+\|P(x)\|_2^2.$$
So we get $L^2(P):=\min_{x\perp y}\mathcal{L}_P^2(x,y)\le(m+1)\|P\|_\infty^2$, which in turn implies
$$\tilde\kappa(P)=\frac{\|P\|_W}{L(P)}\ge\frac{\|P\|_W}{\|P\|_\infty\sqrt{m+1}}.$$
The proof for the case where $d_j=d$ for all $j\in\{1,\dots,m\}$ is identical.
We now show that, under our Euclidean Small Ball assumption, we have
$$\mathrm{Prob}\big(\|P\|_W\le\varepsilon\sqrt N\big)\le(c\tilde c_0\varepsilon)^{\frac{cN\min_j N_j}{\max_j N_j}}$$
for every $\varepsilon\in(0,1)$. Indeed, recall that $\|p_j\|_W=\|C_j\|_{\ell_2^{N_j}}$. Then $\mathrm{Prob}\big(\|p_j\|_W\le\varepsilon\sqrt{N_j}\big)\le(\tilde c_0\varepsilon)^{N_j}\le(\tilde c_0\varepsilon)^{N_{j_0}}$ for any fixed $\varepsilon\in(0,1)$, where $j_0\in\{1,\dots,m\}$ satisfies $N_{j_0}:=\min_j N_j$. Let $\xi_j:=\frac{\|p_j\|_W}{\sqrt{N_j}}$ for any $j\in\{1,\dots,m\}$. Set $\xi:=(\xi_1,\dots,\xi_m)$ and $T:=\mathrm{diag}(\sqrt{N_1},\dots,\sqrt{N_m})$. Note that $\|P\|_W=\|T\xi\|_2$, $\|T\|_{HS}=\sqrt{\sum_{j=1}^m N_j}=\sqrt N$, and $\|T\|_{op}=\max_{1\le j\le m}\sqrt{N_j}$. Then Lemma 3.12 implies $\mathrm{Prob}\big(\|P\|_W\le\varepsilon\sqrt N\big)\le(c\tilde c_0\varepsilon)^{\frac{cN\min_j N_j}{\max_j N_j}}$.
Recall that Lemma 3.2 implies that for every $t\ge 1$ we have
$$\mathrm{Prob}\big(\|P\|_\infty\ge ctK\sqrt{m\log(ed)}\big)\le e^{-t^2m\log(ed)}.$$

So using our lower bound estimate for the condition number, we get

$$\mathrm{Prob}\Big(\frac{\|P\|_W}{\|P\|_\infty}\ge\frac{c'\varepsilon\sqrt N}{ctK\sqrt{m\log(ed)}}\Big)\le\mathrm{Prob}\Big(\tilde\kappa(P)\ge\frac{c\varepsilon\sqrt N}{tKmd\log(ed)}\Big),$$
$$\mathrm{Prob}\big(\{\|P\|_W\ge c'\varepsilon\sqrt N\}\cap\{\|P\|_\infty\le ctK\sqrt{m\log(ed)}\}\big)\le\mathrm{Prob}\Big(\tilde\kappa(P)\ge\frac{c\varepsilon\sqrt N}{tKmd\log(ed)}\Big),$$
and

$$\mathrm{Prob}\big(\{\|P\|_W\ge c'\varepsilon\sqrt N\}\cap\{\|P\|_\infty\le ctK\sqrt{m\log(ed)}\}\big)\ge 1-(c\tilde c_0\varepsilon)^{\frac{cN\min_j N_j}{\max_j N_j}}-e^{-t^2m\log(ed)}.$$
We may choose $t:=\sqrt{\log\frac1\varepsilon}$ and, by adjusting constants, we get our result. The case where $d_j=d$ for all $j\in\{1,\dots,m\}$ is similar. The bounds for the expectation follow by integration. $\square$
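For the diagonal matrix $T=\mathrm{diag}(\sqrt{N_1},\dots,\sqrt{N_m})$ appearing in the proof, the two matrix norms — and hence the small-ball exponent $\|T\|_{HS}^2/\|T\|_{op}^2=N/\max_j N_j$ supplied to Lemma 3.12 — can be computed directly. The degrees below are hypothetical, for illustration only:

```python
import math

# The diagonal matrix T = diag(sqrt(N_1), ..., sqrt(N_m)) from the proof of
# Lemma 3.13: ||T||_HS = sqrt(N_1 + ... + N_m) = sqrt(N) and
# ||T||_op = sqrt(max_j N_j), so the exponent ||T||_HS^2 / ||T||_op^2
# supplied to Lemma 3.12 is N / max_j N_j.
# Hypothetical degrees, for illustration only.

n = 4
degrees = [1, 2, 3]                                # d_1, d_2, d_3
Ns = [math.comb(n + d - 1, d) for d in degrees]    # N_j = binom(n + d_j - 1, d_j)
N = sum(Ns)

hs_norm = math.sqrt(N)          # ||T||_HS: sqrt of the sum of squared entries
op_norm = math.sqrt(max(Ns))    # ||T||_op: largest diagonal entry
exponent = N / max(Ns)          # = ||T||_HS^2 / ||T||_op^2

assert math.isclose(hs_norm**2 / op_norm**2, exponent)
print(Ns, N, exponent)  # [4, 10, 20] 34 1.7
```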

Observe that the dominant factor in the very last estimate of Lemma 3.13 is $\sqrt N$, which is the normalization coming from the Weyl–Bombieri norm of the polynomial system. So it makes sense to seek the asymptotic behavior of $\frac{\tilde\kappa(P)}{\sqrt N}$. When $m=n-1$, the upper bounds we get are exponential with respect to $n$, while the lower bounds are not. But when $m=2n-3$ and $d=d_j$ for all $j\in\{1,\dots,m\}$, we have the following upper bound (by Theorem 3.10) and lower bound (by Lemma 3.13):
$$\frac{A_1}{nd\log(ed)}\le\frac{\mathbb{E}(\tilde\kappa(P))}{\sqrt N}\le\frac{A_2\log(ed)\max\{d^8,n\}}{\sqrt n},$$
where $A_1,A_2$ are constants depending on $(K,c_0)$. This suggests that our estimates are closer to optimality when $m$ is a constant multiple of $n$.

Remark 3.14. There are similarities between our probability tail estimates and the older estimates in the linear case studied in [24]. In particular, our estimates in the quadratic case $d=2$, when $m$ is a constant multiple of $n$, are quite similar to the optimal result (for the linear case) appearing in [24]. $\diamond$

4. Acknowledgements

We would like to thank the anonymous referee for detailed remarks, especially for the comment that helped us to spot a mistake in the previous proof of Theorem 3.7.

References
[1] Franck Barthe and Alexander Koldobsky, "Extremal slabs in the cube and the Laplace transform," Adv. Math. 174 (2003), pp. 89–114.
[2] Carlos Beltrán and Luis-Miguel Pardo, "Smale's 17th problem: Average polynomial time to compute affine and projective solutions," Journal of the American Mathematical Society 22 (2009), pp. 363–385.
[3] Lenore Blum, Felipe Cucker, Mike Shub, and Steve Smale, Complexity and Real Computation, Springer-Verlag, 1998.
[4] Jean Bourgain, "On the isotropy-constant problem for ψ2-bodies," in Geometric Aspects of Functional Analysis, Lecture Notes in Mathematics, vol. 1807, pp. 114–121, Springer, Heidelberg, 2003.
[5] Peter Bürgisser and Felipe Cucker, "On a problem posed by Steve Smale," Annals of Mathematics 174 (2011), no. 3, pp. 1785–1836.

[6] Peter Bürgisser and Felipe Cucker, Condition, Grundlehren der mathematischen Wissenschaften, no. 349, Springer-Verlag, 2013.
[7] John F. Canny, The Complexity of Robot Motion Planning, ACM Doctoral Dissertation Award Series, MIT Press, 1987.
[8] D. Castro, Juan San Martín, and Luis M. Pardo, "Systems of Rational Polynomial Equations have Polynomial Size Approximate Zeros on the Average," Journal of Complexity 19 (2003), pp. 161–209.
[9] Felipe Cucker, "Approximate zeros and condition numbers," Journal of Complexity 15 (1999), no. 2, pp. 214–226.
[10] Felipe Cucker, Teresa Krick, Gregorio Malajovich, and Mario Wschebor, "A numerical algorithm for zero counting I. Complexity and accuracy," J. Complexity 24 (2008), no. 5–6, pp. 582–605.
[11] Felipe Cucker, Teresa Krick, Gregorio Malajovich, and Mario Wschebor, "A numerical algorithm for zero counting II. Distance to ill-posedness and smoothed analysis," J. Fixed Point Theory Appl. 6 (2009), no. 2, pp. 285–294.
[12] Felipe Cucker, Teresa Krick, Gregorio Malajovich, and Mario Wschebor, "A numerical algorithm for zero counting III: Randomization and condition," Adv. in Appl. Math. 48 (2012), no. 1, pp. 215–248.
[13] Nikos Dafnis and Grigoris Paouris, "Small ball probability estimates, Ψ2-behavior and the hyperplane conjecture," Journal of Functional Analysis 258 (2010), pp. 1933–1964.
[14] Jean-Pierre Dedieu and Mike Shub, "Newton's Method for Overdetermined Systems of Equations," Math. Comp. 69 (2000), no. 231, pp. 1099–1115.
[15] James Demmel, Benjamin Diament, and Gregorio Malajovich, "On the Complexity of Computing Error Bounds," Found. Comput. Math. (2001), pp. 101–125.
[16] Wassily Hoeffding, "Probability inequalities for sums of bounded random variables," Journal of the American Statistical Association 58 (1963), no. 301, pp. 13–30.
[17] O. D. Kellogg, "On bounded polynomials in several variables," Mathematische Zeitschrift 27 (1928), no. 1, pp. 55–64.
[18] Bo'az Klartag and Emanuel Milman, "Centroid bodies and the logarithmic Laplace transform – a unified approach," J. Funct. Anal. 262 (2012), no. 1, pp. 10–34.
[19] Alexander Koldobsky and Alain Pajor, "A Remark on Measures of Sections of L_p-balls," in Geometric Aspects of Functional Analysis, Lecture Notes in Mathematics 2169, pp. 213–220, Springer-Verlag, 2017.
[20] Pierre Lairez, "A deterministic algorithm to compute approximate roots of polynomial systems in polynomial average time," Foundations of Computational Mathematics, DOI 10.1007/s10208-016-9319-7.
[21] Eric Kostlan, "On the Distribution of Roots of Random Polynomials," Ch. 38 (pp. 419–431) of From Topology to Computation: Proceedings of Smalefest (M. W. Hirsch, J. E. Marsden, and M. Shub, eds.), Springer-Verlag, New York, 1993.
[22] Hoi H. Nguyen, "On a condition number of general random polynomial systems," Mathematics of Computation 85 (2016), pp. 737–757.
[23] Mark Rudelson and Roman Vershynin, "The Littlewood–Offord Problem and Invertibility of Random Matrices," Adv. Math. 218 (2008), no. 2, pp. 600–633.
[24] Mark Rudelson and Roman Vershynin, "The Smallest Singular Value of a Random Rectangular Matrix," Communications on Pure and Applied Mathematics 62 (2009), pp. 1707–1739.
[25] Mark Rudelson and Roman Vershynin, "Small Ball Probabilities for Linear Images of High-Dimensional Distributions," Int. Math. Res. Not. (2015), no. 19, pp. 9594–9617.
[26] Igor R. Shafarevich, Basic Algebraic Geometry 1: Varieties in Projective Space, 3rd edition, Springer-Verlag, 2013.
[27] Mike Shub and Steve Smale, "Complexity of Bézout's Theorem I. Geometric Aspects," J. Amer. Math. Soc. 6 (1993), no. 2, pp. 459–501.
[28] Roman Vershynin, "Introduction to the Non-Asymptotic Analysis of Random Matrices," in Compressed Sensing, pp. 210–268, Cambridge Univ. Press, Cambridge, 2012.
[29] Assaf Naor and Artem Zvavitch, "Isomorphic embedding of $\ell_p^n$, $1<p<2$, into $\ell_1^{(1+\varepsilon)n}$."

Technische Universität Berlin, Institut für Mathematik, Sekretariat MA 3-2, Straße des 17. Juni 136, 10623 Berlin, Germany
E-mail address: [email protected]

Department of Mathematics, Texas A&M University TAMU 3368, College Station, Texas 77843-3368, USA.
E-mail address: [email protected]

Department of Mathematics, Texas A&M University TAMU 3368, College Station, Texas 77843-3368, USA.
E-mail address: [email protected]