Clustering by Left-Stochastic Matrix Factorization

Total Page:16

File Type:pdf, Size:1020Kb

Clustering by Left-Stochastic Matrix Factorization Clustering by Left-Stochastic Matrix Factorization Raman Arora [email protected] Maya R. Gupta [email protected] Amol Kapila [email protected] Maryam Fazel [email protected] University of Washington, Seattle, WA 98103, USA Abstract 1.1. Related Work in Matrix Factorization Some clustering objective functions can be written as We propose clustering samples given their matrix factorization objectives. Let n feature vectors d×n pairwise similarities by factorizing the sim- be gathered into a feature-vector matrix X 2 R . T d×k ilarity matrix into the product of a clus- Consider the model X ≈ FG , where F 2 R can ter probability matrix and its transpose. be interpreted as a matrix with k cluster prototypes n×k We propose a rotation-based algorithm to as its columns, and G 2 R is all zeros except for compute this left-stochastic decomposition one (appropriately scaled) positive entry per row that (LSD). Theoretical results link the LSD clus- indicates the nearest cluster prototype. The k-means tering method to a soft kernel k-means clus- clustering objective follows this model with squared tering, give conditions for when the factor- error, and can be expressed as (Ding et al., 2005): ization and clustering are unique, and pro- T 2 arg min kX − FG kF ; (1) vide error bounds. Experimental results on F;GT G=I simulated and real similarity datasets show G≥0 that the proposed method reliably provides accurate clusterings. where k · kF is the Frobenius norm, and inequality G ≥ 0 is component-wise. This follows because the combined constraints G ≥ 0 and GT G = I force each row of G to have only one positive element. It is straightforward to show that the minimizer of (1) oc- 1. Introduction curs at F ∗ = XG, so that (1) is equivalent to (Ding We propose a new non-negative matrix factorization et al., 2005; Li & Ding, 2006): (NNMF) model that yields a probabilistic clustering T T 2 arg min kX X − GG kF : (2) of samples given a matrix of pairwise similarities be- GT G=I tween the samples. The interpretation of non-negative G≥0 factors of matrices as describing different parts of data Replacing XT X in (2) with a kernel matrix K results was first given by (Paatero & Tapper, 1994) and (Lee in the same minimizer as the kernel k-means objective & Seung, 1999). We discuss more related work in clus- (Ding et al., 2005; Li & Ding, 2006): tering by matrix factorization in Section 1.1. Then we introduce the proposed left-stochastic decomposition T 2 arg min kK − GG kF : (3) (LSD) clustering formulation in Section 1.2. We pro- GT G=I vide a theoretical foundation for LSD in Section 2. We G≥0 exploit the geometric structure present in the problem to provide a rotation-based algorithm in Section 3. Ex- It has been shown that normalized-cut spectral cluster- perimental results are presented in Section 4. ing (Shi & Malik, 2000) also attempts to minimize an equivalent objective as kernel k-means, but for a kernel that is a normalized version of the input kernel (Ding Appearing in Proceedings of the 28 th International Con- et al., 2005). Similarly, probabilistic latent semantic ference on Machine Learning, Bellevue, WA, USA, 2011. indexing (PLSI) (Hofmann, 1999) can be formulated Copyright 2011 by the author(s)/owner(s). using the matrix factorization model X ≈ FGT , where LSD Clustering the approximation is in terms of relative entropy (Li 1.3. Related Kernel Definitions & Ding, 2006). Some kernel definitions are related to our idealization Ding et al. (2010) explored computationally simpler that a given similarity matrix is usefully modeled as variants of the kernel k-means objective by remov- cK ≈ P T P . Cristianini et al. (2001) defined the ideal T ∗ ing the problematic constraint in (3) that G G = I, kernel K to be Kij = 1 if and only if Xi and Xj are with the hope that the solutions would still have one from the same class. The Fisher kernel (Jaakkola & T −1 dominant entry per row of G, which they take as the Haussler, 1998) is defined as Kij = Ui I Uj, where indicator of cluster membership. One such variant, I is the Fisher information and each Ui is a function termed convex NMF, restricts the columns of F (the of the Fisher score, computed from a generative model cluster centroids) to be convex combinations of the and data Xi and Xj. columns of X. Another variant, cluster NMF, solves k − T k2 arg minG≥0 X XGG F . 2. Theoretical Results 1.2. LSD Clustering To provide intuition for the LSD clustering method and our rotation-based algorithmic approach, we 2 n×n We propose explicitly factorizing a matrix K R present some theoretical results. to produce probabilities that each of the n samples be- longs to each of k clusters. We require only that K be The proposed LSD clustering objective (4) is the same symmetric and that its top k eigenvalues be positive; as the objective (3), except that we constrain P to we refer to such matrices in this paper as similarity be left-stochastic, factorize a scaled version of K, and matrices. Note that a similarity matrix need not be do not constrain P T P = I. The LSD clustering can PSD { it can be an indefinite kernel matrix (Chen be interpreted as a soft kernel k-means clustering, as et al., 2009). If the input does not satisfy these re- follows: quirements, it must be modified before use with this method. For example, if the input is a graph adja- Proposition 1 (Soft Kernel K-means) Suppose T cency matrix A with zero diagonal, one could replace K is LSDable, and let Kij = φ(Xi) φ(Xj) for some d d the diagonal by each node's degree, or replace A by mapping φ : R ! R 1 and some feature-vector matrix d×n ∗ A + αI for some constant α and the identity matrix X = [X1;:::;Xn] 2 R , then the minimizer P of I, or perform some other spectral modification. See the LSD objective (4) also minimizes the following (Chen et al., 2009) for further discussion of such mod- soft kernel k-means objective: ifications. arg min kΦ(X) − FP k2 ; (5) d ×k F F 2R 1 Given a similarity matrix K, we estimate a scaling T P ≥0;P 1k=1n factor c∗, and a cluster probability matrix P ∗ that d1×n solves where Φ(X) = [φ(X1); φ(X2); : : : ; φ(Xn)] 2 R . T 2 arg min min kcK − P P kF ; (4) c2R+ P ≥0 Next, Theorem 1 states that there will be multiple T P 1k=1n left-stochastic factors P , which are related by rotations about the normal to the probability simplex (which in- where 1k is a k × 1 vector of all ones. Note that a solution P ∗ to the above optimization problem is an cludes permuting the rows, that is, changing the clus- approximate left-stochastic decomposition of the ma- ter labels): trix c∗K, which inspires the following terminology: Theorem 1 (Factors of K Related by Rotation) LSD Clustering: A solution P ∗ to (4) is a clus- Suppose K is LSDable such that K = P T P . Then, ter probability matrix, from which we form an LSD (a) for any matrix M 2 Rk×n s.t. K = M T M, there clustering by assigning the ith sample to cluster j∗ if is a unique orthogonal k × k matrix R s.t. M = RP . ∗ ^ k×n ^ ^T Pj∗i ≥ Pji for all j =6 j . (b) for any matrix P 2 R s.t. P ≥ 0; P 1k = 1n and K = P^T P^, there is a unique orthogonal k × k LSDable: We call K LSDable if the minimum of (4) matrix Ru s.t. P^ = RuP and Ru~u = ~u, where is zero. 1 T ~u = k [1;:::; 1] is normal to the plane containing the We show in Proposition 2 that the minimizing scaling probability simplex. factor c∗ is given in a closed form and does not de- (c) a sufficient condition for P to be unique (up to a pend on a particular solution P ∗ for an LSDable K. permutation of rows), is that there exists an m × m Therefore, without loss of generality we will assume sub-matrix in K which is a permutation of the identity ∗ k that c = 1 for an LSDable K. matrix Im, where m ≥ (b 2 c + 1). LSD Clustering T While there may be many LSDs of a given K, they may norm, kW kF ≤ . Then kK − P^ P^kF ≤ 2, where P^ all result in the same LSD clustering. The following minimizes (4) for K~ . Furthermore, if kW kF is o(λ~k), th theorem states that for k = 2, the LSD clustering is where λ~k is the k largest eigenvalue of K~ , then unique, and for k = 3, it gives tight conditions for p ! uniqueness of the LSD clustering. The key idea of this k 1 kP − RP^kF ≤ 1 + C1 kK 2 kF + ; (6) theorem for k = 3 is illustrated in Fig. 1. jλ~kj Theorem 2 (Uniqueness of an LSD Clustering) where R is an orthogonal matrix and C1 is a constant. For k = 2, the LSD clustering is unique (up to a permutation of the labels). For k = 3, let αc (αac) be The error bound in (6) involves three terms: the the maximum angle of clockwise (anti-clockwise) rota- first term captures the perturbation of the eigenval- tion about normal to the simplex that does not rotate ues and scales linearly with ; the second term in- 1 p any of the columns of P off the simplex; let βc (βac) volves kK 2 kF = tr(K) due to the coupling between be the minimum angle of clockwise (anti-clockwise) the eigenvectors of the original matrix K and the per- rotation that changes the clustering.
Recommended publications
  • Eigenvalues of Euclidean Distance Matrices and Rs-Majorization on R2
    Archive of SID 46th Annual Iranian Mathematics Conference 25-28 August 2015 Yazd University 2 Talk Eigenvalues of Euclidean distance matrices and rs-majorization on R pp.: 1{4 Eigenvalues of Euclidean Distance Matrices and rs-majorization on R2 Asma Ilkhanizadeh Manesh∗ Department of Pure Mathematics, Vali-e-Asr University of Rafsanjan Alemeh Sheikh Hoseini Department of Pure Mathematics, Shahid Bahonar University of Kerman Abstract Let D1 and D2 be two Euclidean distance matrices (EDMs) with correspond- ing positive semidefinite matrices B1 and B2 respectively. Suppose that λ(A) = ((λ(A)) )n is the vector of eigenvalues of a matrix A such that (λ(A)) ... i i=1 1 ≥ ≥ (λ(A))n. In this paper, the relation between the eigenvalues of EDMs and those of the 2 corresponding positive semidefinite matrices respect to rs, on R will be investigated. ≺ Keywords: Euclidean distance matrices, Rs-majorization. Mathematics Subject Classification [2010]: 34B15, 76A10 1 Introduction An n n nonnegative and symmetric matrix D = (d2 ) with zero diagonal elements is × ij called a predistance matrix. A predistance matrix D is called Euclidean or a Euclidean distance matrix (EDM) if there exist a positive integer r and a set of n points p1, . , pn r 2 2 { } such that p1, . , pn R and d = pi pj (i, j = 1, . , n), where . denotes the ∈ ij k − k k k usual Euclidean norm. The smallest value of r that satisfies the above condition is called the embedding dimension. As is well known, a predistance matrix D is Euclidean if and 1 1 t only if the matrix B = − P DP with P = I ee , where I is the n n identity matrix, 2 n − n n × and e is the vector of all ones, is positive semidefinite matrix.
    [Show full text]
  • Similarity-Based Clustering by Left-Stochastic Matrix Factorization
    JournalofMachineLearningResearch14(2013)1715-1746 Submitted 1/12; Revised 11/12; Published 7/13 Similarity-based Clustering by Left-Stochastic Matrix Factorization Raman Arora [email protected] Toyota Technological Institute 6045 S. Kenwood Ave Chicago, IL 60637, USA Maya R. Gupta [email protected] Google 1225 Charleston Rd Mountain View, CA 94301, USA Amol Kapila [email protected] Maryam Fazel [email protected] Department of Electrical Engineering University of Washington Seattle, WA 98195, USA Editor: Inderjit Dhillon Abstract For similarity-based clustering, we propose modeling the entries of a given similarity matrix as the inner products of the unknown cluster probabilities. To estimate the cluster probabilities from the given similarity matrix, we introduce a left-stochastic non-negative matrix factorization problem. A rotation-based algorithm is proposed for the matrix factorization. Conditions for unique matrix factorizations and clusterings are given, and an error bound is provided. The algorithm is partic- ularly efficient for the case of two clusters, which motivates a hierarchical variant for cases where the number of desired clusters is large. Experiments show that the proposed left-stochastic decom- position clustering model produces relatively high within-cluster similarity on most data sets and can match given class labels, and that the efficient hierarchical variant performs surprisingly well. Keywords: clustering, non-negative matrix factorization, rotation, indefinite kernel, similarity, completely positive 1. Introduction Clustering is important in a broad range of applications, from segmenting customers for more ef- fective advertising, to building codebooks for data compression. Many clustering methods can be interpreted in terms of a matrix factorization problem.
    [Show full text]
  • Alternating Sign Matrices and Polynomiography
    Alternating Sign Matrices and Polynomiography Bahman Kalantari Department of Computer Science Rutgers University, USA [email protected] Submitted: Apr 10, 2011; Accepted: Oct 15, 2011; Published: Oct 31, 2011 Mathematics Subject Classifications: 00A66, 15B35, 15B51, 30C15 Dedicated to Doron Zeilberger on the occasion of his sixtieth birthday Abstract To each permutation matrix we associate a complex permutation polynomial with roots at lattice points corresponding to the position of the ones. More generally, to an alternating sign matrix (ASM) we associate a complex alternating sign polynomial. On the one hand visualization of these polynomials through polynomiography, in a combinatorial fashion, provides for a rich source of algo- rithmic art-making, interdisciplinary teaching, and even leads to games. On the other hand, this combines a variety of concepts such as symmetry, counting and combinatorics, iteration functions and dynamical systems, giving rise to a source of research topics. More generally, we assign classes of polynomials to matrices in the Birkhoff and ASM polytopes. From the characterization of vertices of these polytopes, and by proving a symmetry-preserving property, we argue that polynomiography of ASMs form building blocks for approximate polynomiography for polynomials corresponding to any given member of these polytopes. To this end we offer an algorithm to express any member of the ASM polytope as a convex of combination of ASMs. In particular, we can give exact or approximate polynomiography for any Latin Square or Sudoku solution. We exhibit some images. Keywords: Alternating Sign Matrices, Polynomial Roots, Newton’s Method, Voronoi Diagram, Doubly Stochastic Matrices, Latin Squares, Linear Programming, Polynomiography 1 Introduction Polynomials are undoubtedly one of the most significant objects in all of mathematics and the sciences, particularly in combinatorics.
    [Show full text]
  • Representations of Stochastic Matrices
    Rotational (and Other) Representations of Stochastic Matrices Steve Alpern1 and V. S. Prasad2 1Department of Mathematics, London School of Economics, London WC2A 2AE, United Kingdom. email: [email protected] 2Department of Mathematics, University of Massachusetts Lowell, Lowell, MA. email: [email protected] May 27, 2005 Abstract Joel E. Cohen (1981) conjectured that any stochastic matrix P = pi;j could be represented by some circle rotation f in the following sense: Forf someg par- tition Si of the circle into sets consisting of …nite unions of arcs, we have (*) f g pi;j = (f (Si) Sj) = (Si), where denotes arc length. In this paper we show how cycle decomposition\ techniques originally used (Alpern, 1983) to establish Cohen’sconjecture can be extended to give a short simple proof of the Coding Theorem, that any mixing (that is, P N > 0 for some N) stochastic matrix P can be represented (in the sense of * but with Si merely measurable) by any aperiodic measure preserving bijection (automorphism) of a Lesbesgue proba- bility space. Representations by pointwise and setwise periodic automorphisms are also established. While this paper is largely expository, all the proofs, and some of the results, are new. Keywords: rotational representation, stochastic matrix, cycle decomposition MSC 2000 subject classi…cations. Primary: 60J10. Secondary: 15A51 1 Introduction An automorphism of a Lebesgue probability space (X; ; ) is a bimeasurable n bijection f : X X which preserves the measure : If S = Si is a non- ! f gi=1 trivial (all (Si) > 0) measurable partition of X; we can generate a stochastic n matrix P = pi;j by the de…nition f gi;j=1 (f (Si) Sj) pi;j = \ ; i; j = 1; : : : ; n: (1) (Si) Since the partition S is non-trivial, the matrix P has a positive invariant (stationary) distribution v = (v1; : : : ; vn) = ( (S1) ; : : : ; (Sn)) ; and hence (by de…nition) is recurrent.
    [Show full text]
  • Left Eigenvector of a Stochastic Matrix
    Advances in Pure Mathematics, 2011, 1, 105-117 doi:10.4236/apm.2011.14023 Published Online July 2011 (http://www.SciRP.org/journal/apm) Left Eigenvector of a Stochastic Matrix Sylvain Lavalle´e Departement de mathematiques, Universite du Quebec a Montreal, Montreal, Canada E-mail: [email protected] Received January 7, 2011; revised June 7, 2011; accepted June 15, 2011 Abstract We determine the left eigenvector of a stochastic matrix M associated to the eigenvalue 1 in the commu- tative and the noncommutative cases. In the commutative case, we see that the eigenvector associated to the eigenvalue 0 is (,,NN1 n ), where Ni is the ith principal minor of NMI= n , where In is the 11 identity matrix of dimension n . In the noncommutative case, this eigenvector is (,P1 ,Pn ), where Pi is the sum in aij of the corresponding labels of nonempty paths starting from i and not passing through i in the complete directed graph associated to M . Keywords: Generic Stochastic Noncommutative Matrix, Commutative Matrix, Left Eigenvector Associated To The Eigenvalue 1, Skew Field, Automata 1. Introduction stochastic free field and that the vector 11 (,,PP1 n ) is fixed by our matrix; moreover, the sum 1 It is well known that 1 is one of the eigenvalue of a of the Pi is equal to 1, hence they form a kind of stochastic matrix (i.e. the sum of the elements of each noncommutative limiting probability. row is equal to 1) and its associated right eigenvector is These results have been proved in [1] but the proof the vector (1,1, ,1)T .
    [Show full text]
  • Alternating Sign Matrices, Extensions and Related Cones
    See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/311671190 Alternating sign matrices, extensions and related cones Article in Advances in Applied Mathematics · May 2017 DOI: 10.1016/j.aam.2016.12.001 CITATIONS READS 0 29 2 authors: Richard A. Brualdi Geir Dahl University of Wisconsin–Madison University of Oslo 252 PUBLICATIONS 3,815 CITATIONS 102 PUBLICATIONS 1,032 CITATIONS SEE PROFILE SEE PROFILE Some of the authors of this publication are also working on these related projects: Combinatorial matrix theory; alternating sign matrices View project All content following this page was uploaded by Geir Dahl on 16 December 2016. The user has requested enhancement of the downloaded file. All in-text references underlined in blue are added to the original document and are linked to publications on ResearchGate, letting you access and read them immediately. Alternating sign matrices, extensions and related cones Richard A. Brualdi∗ Geir Dahly December 1, 2016 Abstract An alternating sign matrix, or ASM, is a (0; ±1)-matrix where the nonzero entries in each row and column alternate in sign, and where each row and column sum is 1. We study the convex cone generated by ASMs of order n, called the ASM cone, as well as several related cones and polytopes. Some decomposition results are shown, and we find a minimal Hilbert basis of the ASM cone. The notion of (±1)-doubly stochastic matrices and a generalization of ASMs are introduced and various properties are shown. For instance, we give a new short proof of the linear characterization of the ASM polytope, in fact for a more general polytope.
    [Show full text]
  • Contents 5 Eigenvalues and Diagonalization
    Linear Algebra (part 5): Eigenvalues and Diagonalization (by Evan Dummit, 2017, v. 1.50) Contents 5 Eigenvalues and Diagonalization 1 5.1 Eigenvalues, Eigenvectors, and The Characteristic Polynomial . 1 5.1.1 Eigenvalues and Eigenvectors . 2 5.1.2 Eigenvalues and Eigenvectors of Matrices . 3 5.1.3 Eigenspaces . 6 5.2 Diagonalization . 9 5.3 Applications of Diagonalization . 14 5.3.1 Transition Matrices and Incidence Matrices . 14 5.3.2 Systems of Linear Dierential Equations . 16 5.3.3 Non-Diagonalizable Matrices and the Jordan Canonical Form . 19 5 Eigenvalues and Diagonalization In this chapter, we will discuss eigenvalues and eigenvectors: these are characteristic values (and characteristic vectors) associated to a linear operator T : V ! V that will allow us to study T in a particularly convenient way. Our ultimate goal is to describe methods for nding a basis for V such that the associated matrix for T has an especially simple form. We will rst describe diagonalization, the procedure for (trying to) nd a basis such that the associated matrix for T is a diagonal matrix, and characterize the linear operators that are diagonalizable. Then we will discuss a few applications of diagonalization, including the Cayley-Hamilton theorem that any matrix satises its characteristic polynomial, and close with a brief discussion of non-diagonalizable matrices. 5.1 Eigenvalues, Eigenvectors, and The Characteristic Polynomial • Suppose that we have a linear transformation T : V ! V from a (nite-dimensional) vector space V to itself. We would like to determine whether there exists a basis of such that the associated matrix β is a β V [T ]β diagonal matrix.
    [Show full text]
  • Markov Chains
    Stochastic Matrices The following 3 3matrixdefinesa × discrete time Markov process with three states: P11 P12 P13 P = P21 P22 P23 ⎡ ⎤ P31 P32 P33 ⎣ ⎦ where Pij is the probability of going from j i in one step. A stochastic matrix → satisfies the following conditions: P 0 ∀i, j ij ≥ and M j ∑ Pij = 1. ∀ i=1 Example The following 3 3matrixdefinesa × discrete time Markov process with three states: 0.90 0.01 0.09 P = 0.01 0.90 0.01 ⎡ ⎤ 0.09 0.09 0.90 ⎣ ⎦ where P23 = 0.01 is the probability of going from 3 2inonestep.Youcan → verify that P 0 ∀i, j ij ≥ and 3 j ∑ Pij = 1. ∀ i=1 Example (contd.) 0.9 0.9 0.01 1 2 0.01 0.09 0.09 0.01 0.09 3 0.9 Figure 1: Three-state Markov process. Single Step Transition Probabilities x(1) = Px(0) x(2) = Px(1) . x(t+1) = Px(t) 0.641 0.90 0.01 0.09 0.7 0.188 = 0.01 0.90 0.01 0.2 ⎡ ⎤ ⎡ ⎤⎡ ⎤ 0.171 0.09 0.09 0.90 0.1 ⎣ x(t+1) ⎦ ⎣ P ⎦⎣ x(t) ⎦ M % &' ( (t%+1) &' (t) (% &' ( xi = ∑ Pij x j j=1 n-step Transition Probabilities Observe that x(3) can be written as fol- lows: x(3) = Px(2) = P Px(1) = P)P Px*(0) = P3)x(0)). ** n-step Transition Probabilities (contd.) Similar logic leads us to an expression for x(n): x(n) = P P... Px(0) ) )n ** = Pnx(0). % &' ( An n-step transition probability matrix can be defined in terms of a single step matrix and a (n 1)-step matrix: − M ( n) = n 1 .
    [Show full text]
  • On Matrices with a Doubly Stochastic Pattern
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Elsevier - Publisher Connector JOURNAL OF MATHEMATICALANALYSISAND APPLICATIONS 34,648-652(1971) On Matrices with a Doubly Stochastic Pattern DAVID LONDON Technion-Israel Institute of Technology, Haifa Submitted by Ky Fan 1. INTRODUCTION In [7] Sinkhorn proved that if A is a positive square matrix, then there exist two diagonal matrices D, = {@r,..., d$) and D, = {dj2),..., di2r) with positive entries such that D,AD, is doubly stochastic. This problem was studied also by Marcus and Newman [3], Maxfield and Mint [4] and Menon [5]. Later Sinkhorn and Knopp [8] considered the same problem for A non- negative. Using a limit process of alternately normalizing the rows and columns sums of A, they obtained a necessary and sufficient condition for the existence of D, and D, such that D,AD, is doubly stochastic. Brualdi, Parter and Schneider [l] obtained the same theorem by a quite different method using spectral properties of some nonlinear operators. In this note we give a new proof of the same theorem. We introduce an extremal problem, and from the existence of a solution to this problem we derive the existence of D, and D, . This method yields also a variational characterization for JJTC1(din dj2’), which can be applied to obtain bounds for this quantity. We note that bounds for I7y-r (d,!‘) dj”)) may be of interest in connection with inequalities for the permanent of doubly stochastic matrices [3]. 2. PRELIMINARIES Let A = (aij) be an 12x n nonnegative matrix; that is, aij > 0 for i,j = 1,***, n.
    [Show full text]
  • Stochastic Adjacency Matrix M
    CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu Web pages are not equally “important” www.joe-schmoe.com vs. www.stanford.edu We already know: Since there is large diversity in the connectivity of the vs. webgraph we can rank the pages by the link structure 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 We will cover the following Link Analysis approaches to computing importances of nodes in a graph: . Page Rank . Hubs and Authorities (HITS) . Topic-Specific (Personalized) Page Rank . Web Spam Detection Algorithms 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 3 Idea: Links as votes . Page is more important if it has more links . In-coming links? Out-going links? Think of in-links as votes: . www.stanford.edu has 23,400 inlinks . www.joe-schmoe.com has 1 inlink Are all in-links are equal? . Links from important pages count more . Recursive question! 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 4 Each link’s vote is proportional to the importance of its source page If page p with importance x has n out-links, each link gets x/n votes Page p’s own importance is the sum of the votes on its in-links p 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 5 A “vote” from an important The web in 1839 page is worth more y/2 A page is important if it is y pointed to by other important a/2 pages y/2 Define a “rank” r for node j m j a m r a/2 i Flow equations: rj = ∑ ry = ry /2 + ra /2 i→ j dout (i) ra = ry /2 + rm rm = ra /2 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 6 Flow equations: 3 equations, 3 unknowns, ry = ry /2 + ra /2 no constants ra = ry /2 + rm .
    [Show full text]
  • An Inequality for Doubly Stochastic Matrices*
    JOURNAL OF RESEARCH of the National Bureau of Standards-B. Mathematical Sciences Vol. 80B, No.4, October-December 1976 An Inequality for Doubly Stochastic Matrices* Charles R. Johnson** and R. Bruce Kellogg** Institute for Basic Standards, National Bureau of Standards, Washington, D.C. 20234 (June 3D, 1976) Interrelated inequalities involving doubly stochastic matrices are presented. For example, if B is an n by n doubly stochasti c matrix, x any nonnegative vector and y = Bx, the n XIX,· •• ,x" :0:::; YIY" •• y ... Also, if A is an n by n nonnegotive matrix and D and E are positive diagonal matrices such that B = DAE is doubly stochasti c, the n det DE ;:::: p(A) ... , where p (A) is the Perron· Frobenius eigenvalue of A. The relationship between these two inequalities is exhibited. Key words: Diagonal scaling; doubly stochasti c matrix; P erron·Frobenius eigenvalue. n An n by n entry·wise nonnegative matrix B = (b i;) is called row (column) stochastic if l bi ; = 1 ;= 1 for all i = 1,. ',n (~l bij = 1 for all j = 1,' . ',n ). If B is simultaneously row and column stochastic then B is said to be doubly stochastic. We shall denote the Perron·Frobenius (maximal) eigenvalue of an arbitrary n by n entry·wise nonnegative matrix A by p (A). Of course, if A is stochastic, p (A) = 1. It is known precisely which n by n nonnegative matrices may be diagonally scaled by positive diagonal matrices D, E so that (1) B=DAE is doubly stochastic. If there is such a pair D, E, we shall say that A has property (").
    [Show full text]
  • Majorization Results for Zeros of Orthogonal Polynomials
    Majorization results for zeros of orthogonal polynomials ∗ Walter Van Assche KU Leuven September 4, 2018 Abstract We show that the zeros of consecutive orthogonal polynomials pn and pn−1 are linearly connected by a doubly stochastic matrix for which the entries are explicitly computed in terms of Christoffel numbers. We give similar results for the zeros of pn (1) and the associated polynomial pn−1 and for the zeros of the polynomial obtained by deleting the kth row and column (1 ≤ k ≤ n) in the corresponding Jacobi matrix. 1 Introduction 1.1 Majorization Recently S.M. Malamud [3, 4] and R. Pereira [6] have given a nice extension of the Gauss- Lucas theorem that the zeros of the derivative p′ of a polynomial p lie in the convex hull of the roots of p. They both used the theory of majorization of sequences to show that if x =(x1, x2,...,xn) are the zeros of a polynomial p of degree n, and y =(y1,...,yn−1) are the zeros of its derivative p′, then there exists a doubly stochastic (n − 1) × n matrix S such that y = Sx ([4, Thm. 4.7], [6, Thm. 5.4]). A rectangular (n − 1) × n matrix S =(si,j) is said to be doubly stochastic if si,j ≥ 0 and arXiv:1607.03251v1 [math.CA] 12 Jul 2016 n n− 1 n − 1 s =1, s = . i,j i,j n j=1 i=1 X X In this paper we will use the notion of majorization to rephrase some known results for the zeros of orthogonal polynomials on the real line.
    [Show full text]