Linear Discriminant Functions

4. Linear Discriminant Functions
Aleix M. Martinez
[email protected]
Handouts for ECE 874, 2007.

Why Linear?
• It is simple and intuitive.
• It is obtained by minimizing a criterion error, e.g. sample risk, training error, margin, etc.
• It can be generalized to find non-linear discriminant regions.
• It is generally very difficult to calculate the distance of a testing sample to a nonlinear function.
• It works with a limited number of training samples.
• There is no need to estimate class distributions.

Distance to a nonlinear function
(Figure: distance from a testing sample to a nonlinear function. From Murase & Nayar, 1995.)

Linear Discriminant Analysis
• If we have samples corresponding to two or more classes, we prefer to select those features that best discriminate between classes, rather than those that best describe the data.
• This will, of course, depend on the classifier.
• Assume our classifier is Bayes.
• Thus, we want to minimize the probability of error.
• We will develop a method based on scatter matrices.

Theorem
• Let the samples of two classes be Normally distributed in R^p, with common covariance matrix \Sigma. Then, the Bayes errors in the p-dimensional space and in the one-dimensional subspace given by
    v = \Sigma^{-1}(\mu_1 - \mu_2) / || \Sigma^{-1}(\mu_1 - \mu_2) ||
  are the same, where ||x|| is the Euclidean norm of the vector x.
• That is, there is no loss in classification when reducing from p dimensions to one.
(Figure: PCA versus LDA projection directions.)

Scatter matrices and separability criteria
• Within-class scatter matrix:
    S_W = \sum_{j=1}^{C} \sum_{i=1}^{N_j} (x_{ij} - \mu_j)(x_{ij} - \mu_j)^T.
• Between-class scatter matrix:
    S_B = \sum_{j=1}^{C} (\mu_j - \mu)(\mu_j - \mu)^T.
• Note that \hat{\Sigma} = S_W + S_B.
• To formulate criteria for class separability, we need to convert these matrices to numerical values, e.g.
    tr(S_2^{-1} S_1),   ln|S_1| - ln|S_2|,   tr(S_1) - tr(S_2).
• Typical combinations of scatter matrices are:
    {S_1, S_2} = {S_B, S_W}, {S_B, \hat{\Sigma}}, and {S_W, \hat{\Sigma}}.

Ronald Fisher (1890-1962)
• Fisher was an eminent scholar and one of the great scientists of the first part of the 20th century. After graduating from Cambridge and being denied entry to the British army because of his poor eyesight, he worked as a statistician for six years before starting a farming business. While a farmer, he continued his genetics and statistics research. During this time, he developed the well-known analysis of variance (ANOVA) method. After the war, Fisher finally moved to Rothamsted Experimental Station. Among his many accomplishments, Fisher invented ANOVA, the technique of maximum likelihood (ML), Fisher Information, the concept of sufficiency, and the method now known as Linear Discriminant Analysis (LDA). During World War II, the field of eugenics suffered a big blow, mainly due to the Nazis' use of it as a justification for some of their actions. Fisher moved back to Rothamsted and then to Cambridge, where he retired. Fisher has been credited as one of the founders of modern statistics, and one cannot study pattern recognition without encountering several of his ground-breaking insights. Yet as great a statistician as he was, he also became a major figure in genetics. A classical quote in the Annals of Statistics reads: "I occasionally meet geneticists who ask me whether it is true that the great geneticist R.A. Fisher was also an important statistician."

A solution to LDA
• Again, we want to minimize the Bayes error.
• Therefore, we want the projection from Y to X that minimizes the error:
    \hat{X}(p) = \sum_{i=1}^{p} y_i \phi_i + \sum_{i=p+1}^{n} b_i \phi_i.
• The eigenvalue decomposition is the optimal transformation:
    S_W^{-1} S_B v_i = \lambda_i v_i.
• Simultaneous diagonalization. (A small code sketch of this eigendecomposition follows the face-recognition example below.)

Example: Face Recognition
(Figure: face recognition example.)
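The following is a minimal NumPy sketch, not part of the original handout, of the two-class case: it builds S_W and S_B as defined above and takes the leading eigenvector of S_W^{-1} S_B as the projection direction. The data, sizes, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Normally distributed classes in R^p with a common covariance,
# i.e. the setting of the theorem above. Sizes are arbitrary.
p, n = 5, 200
X1 = rng.normal(size=(n, p))            # class 1, mean 0
X2 = rng.normal(size=(n, p)) + 2.0      # class 2, mean (2, ..., 2)
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
mu = np.vstack([X1, X2]).mean(axis=0)

# Within-class scatter: S_W = sum_j sum_i (x_ij - mu_j)(x_ij - mu_j)^T
Sw = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

# Between-class scatter: S_B = sum_j (mu_j - mu)(mu_j - mu)^T
Sb = np.outer(mu1 - mu, mu1 - mu) + np.outer(mu2 - mu, mu2 - mu)

# Optimal 1-D projection: leading eigenvector of S_W^{-1} S_B
# (only C - 1 = 1 eigenvalue is nonzero here).
evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
v = np.real(evecs[:, np.argmax(np.real(evals))])
v /= np.linalg.norm(v)

# For two Gaussians with a shared covariance this direction approximates
# Sigma^{-1}(mu_1 - mu_2), so by the theorem the Bayes error is preserved.
z1, z2 = X1 @ v, X2 @ v
print("projected class means:", z1.mean().round(2), z2.mean().round(2))
```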
Limitations of LDA
• To prevent S_W from becoming singular, we need N > d.
• In many applications the number of samples is relatively small compared to the dimensionality of the data.
• There are only C-1 nonzero eigenvectors.
• Again, this limits the number of features one can use.
• Nonparametric LDA is designed to solve the last problem (we'll see this later in the course).

PCA versus LDA
• Even for simple PDFs, PCA can outperform LDA (on testing data).
• PCA usually comes with a guarantee, because all we try to do is minimize the representation error.
(Figure: underlying but unknown PDFs.)

Problems with Multi-class Eigen-based Algorithms
• In general, researchers define algorithms which are optimal in the 2-class case and then extend this idea (way of thinking) to the multi-class problem.
• This may cause problems.
• This is the case for eigen-based approaches which use the scatter matrices defined above.
• Let's define the general case: M_1 V = M_2 V \Lambda.
• This is the same as selecting those eigenvectors v that maximize
    (v^T M_1 v) / (v^T M_2 v).
• Note that this can only be achieved if M_1 and M_2 agree.
• The existence of a solution depends on the angle between the eigenvectors of M_1 and M_2, where v_i is the i-th basis vector of the solution space, the w_i are the eigenvectors of M_1, and the u_i are the eigenvectors of M_2.

How to know?
• K = \sum_{i=1}^{r} \sum_{j=1}^{i} \cos^2\theta_{ij} = \sum_{i=1}^{r} \sum_{j=1}^{i} (u_j^T w_i)^2,
  where r < q and q is the number of eigenvectors of M_1.
• The larger K is, the less probable that the results will be correct.

Classification: The Linear Case

Decision Surfaces
• A linear discriminant function can be mathematically written as
    g(x) = w^T x + w_0,
  where w is the weight vector and w_0 the threshold.
• 2-class case:
  – Decide \omega_1 if g(x) > 0.
  – Decide \omega_2 if g(x) < 0.
• We can also do this with: decide \omega_1 if w^T x > -w_0.

Discriminant function = distance
• The discriminant function gives an algebraic measure of the distance.
• Take two vectors x_1 and x_2, both on the decision boundary. Then, write x as
    x = x_p + r w/||w||,   with   r = g(x)/||w||,
  where x_p is the projection of x onto g(x) = 0 and r is the distance from x to g(x) = 0.

Multicategory case
• Two straightforward approaches:
  – Reduce the problem to C-1 2-class problems.
  – Construct C(C-1)/2 linear discriminant functions.
• A linear machine assigns x to \omega_i if g_i(x) > g_j(x) for all j != i.
• The decision boundaries (between two adjacent regions) are given by g_i(x) = g_j(x).

Linearly Separable
• If the classes (or training data) are linearly separable, then there exists a unique g_i(x) > 0.
• This means that a new sample t can be classified as \omega_i if g_i(t) > 0.
• C classes are linearly separable iff for every \omega_i there exists a linear classifier (hyperplane) such that all the samples of \omega_i lie on its positive side (g_i(x) > 0) and all the samples of \omega_j (j != i) are on the negative side of g_i(x).

Linear Classifier
• Our previous algorithm is general and can be applied even when the classes are not linearly separable.
• Alternatively (when the data is not linearly separable), we can use
    \omega(t | w_1, ..., w_C, w_{01}, ..., w_{0C}) = \arg\max_i g_i(t),
  where the w_i are the weights and the w_{0i} the thresholds.

Linear Regression
• It is simpler to start with regression.
• Remember that regression is a related (but different) problem to that of classification.
• We search for the function g(x) that best interpolates a given training set S = {(x_1, y_1), ..., (x_n, y_n)}, where x_i \in R^p and the y_i are scalars.
• We want to minimize
    f(x, y) = y - g(x) = y - \langle w, x \rangle,
  ideally driving it to 0, where \langle w, x \rangle = w^T x and we have assumed w_0 = 0.
• If enough training data is available, there exists a unique solution: w = X^{-T} y.
• When there is noise (i.e., there does not exist a g(.) for which f(.) = 0), we use least squares (LS).
• The LS error function is given by
    E(w, (X, y)) = \sum_{i=1}^{n} f(x_i, y_i)^2 = \sum_{i=1}^{n} (y_i - g(x_i))^2.
• Using the 2-norm, this is
    ||y - w^T X||_2^2 = (y - w^T X)(y - w^T X)^T.
• Differentiating with respect to w, we get -2 y X^T + 2 w^T X X^T = 0, i.e., w^T X X^T = y^T X^T.
• If the inverse exists, then w = (X X^T)^{-1} X y (a small numerical check of this solution is given below).
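Here is a short sketch, my own and not from the handout, that checks the least-squares solution numerically; X is arranged p x n with samples as columns, matching the notation w = (X X^T)^{-1} X y, and w_true, the noise level, and the sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy linear data: y_i = <w_true, x_i> + noise, with w_0 = 0 as assumed above.
p, n = 3, 50
w_true = np.array([1.5, -2.0, 0.5])
X = rng.normal(size=(p, n))            # samples as columns, as in the slides
y = w_true @ X + 0.1 * rng.normal(size=n)

# Least-squares solution: w = (X X^T)^{-1} X y  (inverse assumed to exist).
w_ls = np.linalg.solve(X @ X.T, X @ y)
print("recovered w:", w_ls.round(2))   # close to w_true

# Sanity check against the normal equations w^T X X^T = y^T X^T.
print("normal-equation residual:",
      np.linalg.norm(w_ls @ (X @ X.T) - y @ X.T).round(6))
```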
• When there are fewer samples than dimensions (n < p), there exist many possible solutions for w: the problem is ill-conditioned.
• We need to impose a restriction (or bias) to favor one solution over the rest. This is known as regularization.

Ridge Regression
• We re-formulate the problem as
    \min_w E(w, (X, y)) = \min_w ( \lambda ||w||_2^2 + \sum_{i=1}^{n} (y_i - g(x_i))^2 ).
• Differentiating with respect to the parameters and setting the result to zero, we get
    (X X^T + \lambda I_p) w = X y,   so   w = (X X^T + \lambda I_p)^{-1} X y    (primal solution).
• Note that we could have also written w as a function of the inputs X:
    w = \lambda^{-1} X (y - X^T w) = \sum_{i=1}^{n} \alpha_i x_i = X \alpha,
  so that
    X^T X \alpha = y - \lambda \alpha,
    (X^T X + \lambda I_n) \alpha = y,
    \alpha = (G + \lambda I_n)^{-1} y    (dual solution).
• G = X^T X is the Gram matrix: G_{ij} = \langle x_i, x_j \rangle.
• Note that for any given \lambda, we choose the solution which minimizes the norm of w.

A look back at PCA
• Remember that to compute the PCs of a distribution when n < p, we also used the dual option, i.e. Q = X^T X: O(n^3) versus O(p^3).
• The two arguments (PCA and LS) are equivalent.
• To see this, remember that to minimize ||Ux||^2 using LS, we have E = ||Ux||^2 = x^T U^T U x.
• The eigenvector associated with the smallest eigenvalue of U^T U minimizes E.

Generalized Linear Discriminant
• We can rewrite our linear discriminant as
    g(x) = w_0 + \sum_{i=1}^{d} w_i x_i.
• We can now extend this to a quadratic form:
    g(x) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=1}^{d} w_{ij} x_i x_j.
• Or to any other polynomial discriminant function: g(x) = a^T y, where a and y are \hat{d}-dimensional (a small sketch of this idea is given below).
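As an illustration of the last point, here is a small sketch, my own rather than the handout's, that builds the quadratic feature vector y from x and evaluates g(x) = a^T y; the particular feature ordering and the weights in a are assumptions chosen to produce a circular decision boundary that no purely linear g could realize.

```python
import numpy as np

def quadratic_features(x):
    """Map x in R^d to y = (1, x_i, x_i*x_j), so that
    g(x) = w_0 + sum_i w_i x_i + sum_ij w_ij x_i x_j can be written as a^T y."""
    d = len(x)
    cross = [x[i] * x[j] for i in range(d) for j in range(i, d)]
    return np.concatenate(([1.0], x, cross))

# Example in R^2: choose a (hypothetical weights) so that g(x) = 1 - x1^2 - x2^2,
# i.e. a circular boundary between the two classes.
#              1    x1   x2   x1^2  x1x2  x2^2
a = np.array([1.0, 0.0, 0.0, -1.0, 0.0, -1.0])

for x in [np.array([0.2, 0.3]), np.array([1.5, 0.0]), np.array([0.8, 0.9])]:
    g = a @ quadratic_features(x)
    print(x, "->", "class 1" if g > 0 else "class 2", f"(g = {g:+.2f})")
```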