Factorization Machines

Steffen Rendle
Department of Reasoning for Intelligence
The Institute of Scientific and Industrial Research
Osaka University, Japan
[email protected]

Abstract—In this paper, we introduce Factorization Machines (FM) which are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models. Like SVMs, FMs are a general predictor working with any real valued feature vector. In contrast to SVMs, FMs model all interactions between variables using factorized parameters. Thus they are able to estimate interactions even in problems with huge sparsity (like recommender systems) where SVMs fail. We show that the model equation of FMs can be calculated in linear time and thus FMs can be optimized directly. So unlike nonlinear SVMs, a transformation in the dual form is not necessary and the model parameters can be estimated directly without the need of any support vector in the solution. We show the relationship to SVMs and the advantages of FMs for parameter estimation in sparse settings.

On the other hand there are many different factorization models like matrix factorization, parallel factor analysis or specialized models like SVD++, PITF or FPMC. The drawback of these models is that they are not applicable for general prediction tasks but work only with special input data. Furthermore their model equations and optimization algorithms are derived individually for each task. We show that FMs can mimic these models just by specifying the input data (i.e. the feature vectors). This makes FMs easily applicable even for users without expert knowledge in factorization models.

Index Terms—factorization machine; sparse data; tensor factorization; support vector machine

I. INTRODUCTION

Support Vector Machines are one of the most popular predictors in machine learning and data mining. Nevertheless in settings like collaborative filtering, SVMs play no important role and the best models are either direct applications of standard matrix/tensor factorization models like PARAFAC [1] or specialized models using factorized parameters [2], [3], [4]. In this paper, we show that the only reason why standard SVM predictors are not successful in these tasks is that they cannot learn reliable parameters ('hyperplanes') in complex (non-linear) kernel spaces under very sparse data. On the other hand, the drawback of tensor factorization models and even more for specialized factorization models is that (1) they are not applicable to standard prediction data (e.g. a real valued feature vector in $\mathbb{R}^n$) and (2) that specialized models are usually derived individually for a specific task requiring effort in modelling and design of a learning algorithm.

In this paper, we introduce a new predictor, the Factorization Machine (FM), that is a general predictor like SVMs but is also able to estimate reliable parameters under very high sparsity. The factorization machine models all nested variable interactions (comparable to a polynomial kernel in SVM), but uses a factorized parametrization instead of a dense parametrization like in SVMs. We show that the model equation of FMs can be computed in linear time and that it depends only on a linear number of parameters. This allows direct optimization and storage of model parameters without the need of storing any training data (e.g. support vectors) for prediction. In contrast to this, non-linear SVMs are usually optimized in the dual form and computing a prediction (the model equation) depends on parts of the training data (the support vectors). We also show that FMs subsume many of the most successful approaches for the task of collaborative filtering including biased MF, SVD++ [2], PITF [3] and FPMC [4].

In total, the advantages of our proposed FM are:

1) FMs allow parameter estimation under very sparse data where SVMs fail.
2) FMs have linear complexity, can be optimized in the primal and do not rely on support vectors like SVMs. We show that FMs scale to large datasets like Netflix with 100 millions of training instances.
3) FMs are a general predictor that can work with any real valued feature vector. In contrast to this, other state-of-the-art factorization models work only on very restricted input data. We will show that just by defining the feature vectors of the input data, FMs can mimic state-of-the-art models like biased MF, SVD++, PITF or FPMC.
II. PREDICTION UNDER SPARSITY

The most common prediction task is to estimate a function $y: \mathbb{R}^n \to T$ from a real valued feature vector $x \in \mathbb{R}^n$ to a target domain $T$ (e.g. $T = \mathbb{R}$ for regression or $T = \{+, -\}$ for classification). In supervised settings, it is assumed that a training dataset $D = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots\}$ of examples for the target function $y$ is given. We also investigate the ranking task, where the function $y$ with target $T = \mathbb{R}$ can be used to score feature vectors $x$ and sort them according to their score. Scoring functions can be learned with pairwise training data [5], where a feature tuple $(x^{(A)}, x^{(B)}) \in D$ means that $x^{(A)}$ should be ranked higher than $x^{(B)}$. As the pairwise ranking relation is antisymmetric, it is sufficient to use only positive training instances.

In this paper, we deal with problems where $x$ is highly sparse, i.e. almost all of the elements $x_i$ of a vector $x$ are zero. Let $m(x)$ be the number of non-zero elements in the feature vector $x$ and $m_D$ be the average number of non-zero elements $m(x)$ over all vectors $x \in D$. Huge sparsity ($m_D \ll n$) appears in many real-world data like feature vectors of event transactions (e.g. purchases in recommender systems) or text analysis (e.g. bag-of-words approach). One reason for huge sparsity is that the underlying problem deals with large categorical variable domains.

Example 1: Assume we have the transaction data of a movie review system. The system records which user $u \in U$ rates a movie (item) $i \in I$ at a certain time $t \in \mathbb{R}$ with a rating $r \in \{1, 2, 3, 4, 5\}$. Let the users $U$ and items $I$ be:

U = {Alice (A), Bob (B), Charlie (C), ...}
I = {Titanic (TI), Notting Hill (NH), Star Wars (SW), Star Trek (ST), ...}

Let the observed data $S$ be:

S = {(A, TI, 2010-1, 5), (A, NH, 2010-2, 3), (A, SW, 2010-4, 1),
     (B, SW, 2009-5, 4), (B, ST, 2009-8, 5),
     (C, TI, 2009-9, 1), (C, SW, 2009-12, 5)}

An example for a prediction task using this data is to estimate a function $\hat{y}$ that predicts the rating behaviour of a user for an item at a certain point in time.

Figure 1 shows one example of how feature vectors can be created from $S$ for this task. Here, first there are $|U|$ binary indicator variables (blue) that represent the active user, followed by $|I|$ binary indicator variables (red) for the active item and additional implicit indicator variables (yellow) for all the other movies the user has ever rated. For each user, these implicit indicator variables are normalized such that they sum up to 1 – e.g. Alice has rated Titanic, Notting Hill and Star Wars. Additionally, the example contains a variable (green) holding the time in months starting from January 2009. And finally the vector contains information about the last movie (brown) the user has rated before (s)he rated the active one – e.g. for $x^{(2)}$, Alice rated Titanic before she rated Notting Hill. In section V, we show how factorization machines using such feature vectors as input data are related to specialized state-of-the-art factorization models.

[Fig. 1. Example for sparse real valued feature vectors x that are created from the transactions of Example 1. Every row represents a feature vector x^{(i)} with its corresponding target y^{(i)}. The first 4 columns (blue) represent indicator variables for the active user; the next 5 (red) indicator variables for the active item. The next 5 columns (yellow) hold additional implicit indicators (i.e. other movies the user has rated). One feature (green) represents the time in months. The last 5 columns (brown) have indicators for the last movie the user has rated before the active one. The rightmost column is the target – here the rating.]

We will use this example data throughout the paper for illustration. However, please note that FMs are general predictors like SVMs and thus are applicable to any real valued feature vectors and are not restricted to recommender systems.
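To make the encoding of Figure 1 concrete, the following is a minimal sketch (not taken from the paper) of how such sparse feature vectors could be assembled from the transactions in $S$. The helper names (`build_feature_vector`, `months_since_jan_2009`), the exact column order and the month encoding are illustrative assumptions; the placeholder columns for further users and items shown in the figure are dropped.

```python
from collections import defaultdict

users = ["A", "B", "C"]                   # Alice, Bob, Charlie
items = ["TI", "NH", "SW", "ST"]          # Titanic, Notting Hill, Star Wars, Star Trek

# Observed transactions S: (user, item, (year, month), rating)
S = [("A", "TI", (2010, 1), 5), ("A", "NH", (2010, 2), 3), ("A", "SW", (2010, 4), 1),
     ("B", "SW", (2009, 5), 4), ("B", "ST", (2009, 8), 5),
     ("C", "TI", (2009, 9), 1), ("C", "SW", (2009, 12), 5)]

rated_by = defaultdict(set)               # all movies each user has rated (implicit feedback)
for u, i, _, _ in S:
    rated_by[u].add(i)

def months_since_jan_2009(t):
    year, month = t
    return (year - 2009) * 12 + month     # January 2009 -> 1, so 2010-2 -> 14

def build_feature_vector(u, i, t, last_item=None):
    """One row in the spirit of Fig. 1: |U| user indicators (blue), |I| item indicators (red),
    |I| normalized indicators of the movies the user has rated (yellow),
    one time feature in months (green) and |I| last-rated-movie indicators (brown)."""
    x_user  = [1.0 if uu == u else 0.0 for uu in users]
    x_item  = [1.0 if ii == i else 0.0 for ii in items]
    rated   = rated_by[u]
    x_other = [1.0 / len(rated) if rated and ii in rated else 0.0 for ii in items]  # sums to 1
    x_time  = [float(months_since_jan_2009(t))]
    x_last  = [1.0 if ii == last_item else 0.0 for ii in items]
    return x_user + x_item + x_other + x_time + x_last

# x^(2) of the running example: Alice rates Notting Hill in 2010-2, having rated Titanic before.
x2, y2 = build_feature_vector("A", "NH", (2010, 2), last_item="TI"), 3
print(len(x2), x2)
```

With this layout, $x^{(2)}$ has 16 dimensions of which only 7 are non-zero – exactly the kind of sparsity ($m(x) \ll n$) discussed above.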
III. FACTORIZATION MACHINES (FM)

In this section, we introduce factorization machines. We discuss the model equation in detail and show shortly how to apply FMs to several prediction tasks.

A. Factorization Machine Model

1) Model Equation: The model equation for a factorization machine of degree $d = 2$ is defined as:

$$\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j \qquad (1)$$

where the model parameters that have to be estimated are:

$$w_0 \in \mathbb{R}, \quad \mathbf{w} \in \mathbb{R}^n, \quad \mathbf{V} \in \mathbb{R}^{n \times k} \qquad (2)$$

and $\langle \cdot, \cdot \rangle$ is the dot product of two vectors of size $k$:

$$\langle \mathbf{v}_i, \mathbf{v}_j \rangle := \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f} \qquad (3)$$

A row $\mathbf{v}_i$ within $\mathbf{V}$ describes the $i$-th variable with $k$ factors. $k \in \mathbb{N}_0^+$ is a hyperparameter that defines the dimensionality of the factorization.

A 2-way FM (degree $d = 2$) captures all single and pairwise interactions between variables:

• $w_0$ is the global bias.
• $w_i$ models the strength of the $i$-th variable.
• $\hat{w}_{i,j} := \langle \mathbf{v}_i, \mathbf{v}_j \rangle$ models the interaction between the $i$-th and $j$-th variable. Instead of using an own model parameter $w_{i,j} \in \mathbb{R}$ for each interaction, the FM models the interaction by factorizing it. We will see later on that this is the key point which allows high quality parameter estimates of higher-order interactions ($d \geq 2$) under sparsity.
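As a small illustration of equation (1), here is a hedged sketch (not code from the paper) that evaluates the degree-2 FM model with the nested sum written out directly, i.e. in $O(k \cdot n^2)$; the linear-time reformulation derived later in the paper is deliberately not used, and the random parameter values are placeholders only.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 16, 4                              # number of features and factorization dimensionality
w0 = 0.0                                  # global bias w_0
w = rng.normal(scale=0.1, size=n)         # linear weights, w in R^n
V = rng.normal(scale=0.1, size=(n, k))    # factor matrix, V in R^{n x k}; row v_i holds k factors

def fm_predict(x, w0, w, V):
    """Degree-2 FM model equation (1): global bias + unary terms + factorized pairwise terms.
    Naive O(k * n^2) evaluation of the nested sum over all feature pairs."""
    y = w0 + w @ x                         # w_0 + sum_i w_i * x_i
    d = x.shape[0]
    for i in range(d):
        for j in range(i + 1, d):
            y += (V[i] @ V[j]) * x[i] * x[j]   # <v_i, v_j> * x_i * x_j, cf. equation (3)
    return y

# A sparse input, e.g. one active user indicator and one active item indicator set to 1.
x = np.zeros(n)
x[0], x[4] = 1.0, 1.0
print(fm_predict(x, w0, w, V))
```

For sparse inputs such as $x^{(2)}$ from the previous sketch, only the non-zero entries contribute, so in practice the double sum needs to run over the non-zero pairs only.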