Clustering by Low-Rank Doubly Stochastic Matrix Decomposition

Zhirong Yang
Department of Information and Computer Science, Aalto University, 00076, Finland

Erkki Oja
Department of Information and Computer Science, Aalto University, 00076, Finland

Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).

Abstract

Clustering analysis by nonnegative low-rank approximations has achieved remarkable progress in the past decade. However, most approximation approaches in this direction are still restricted to matrix factorization. We propose a new low-rank learning method to improve the clustering performance, which is beyond matrix factorization. The approximation is based on a two-step bipartite random walk through virtual cluster nodes, where the approximation is formed by only cluster assigning probabilities. Minimizing the approximation error measured by Kullback-Leibler divergence is equivalent to maximizing the likelihood of a discriminative model, which endows our method with a solid probabilistic interpretation. The optimization is implemented by a relaxed Majorization-Minimization algorithm that is advantageous in finding good local minima. Furthermore, we point out that the regularized algorithm with Dirichlet prior only serves as initialization. Experimental results show that the new method has strong performance in clustering purity for various datasets, especially for large-scale manifold data.

1. Introduction

Cluster analysis assigns a set of objects into groups so that the objects in the same cluster are more similar to each other than to those in other clusters. Optimization of most clustering objectives is NP-hard and relaxation to "soft" clustering is often required. A nonnegativity constraint, together with various low-rank matrix approximation objectives, has widely been used for the relaxation purpose in the past decade.

The most popular nonnegative low-rank approximation method is Nonnegative Matrix Factorization (NMF). It finds a matrix that approximates the similarities and can be factorized into several nonnegative low-rank matrices. NMF was originally applied to vectorial data, where Ding et al. (2010) have shown that NMF is equivalent to the classical k-means method. Later NMF was applied to the (weighted) graph given by the pairwise similarities. For example, Ding et al. (2008) presented Nonnegative Spectral Cuts by using a multiplicative algorithm; Arora et al. (2011) proposed Left Stochastic Decomposition that approximates a similarity matrix based on Euclidean distance and a left-stochastic matrix. Another stream in the same direction is topic modeling. Hofmann (1999) gave a generative model in Probabilistic Latent Semantic Indexing (PLSI) for counting data, which is essentially equivalent to NMF using Kullback-Leibler (KL) divergence and Tri-factorizations. Bayesian treatment of PLSI by using Dirichlet prior was later introduced by Blei et al. (2001). Symmetric PLSI with the same Bayesian treatment is called Interaction Component Model (ICM) (Sinkkonen et al., 2008).

Despite remarkable progress, the above relaxation approaches are still not fully satisfactory in all of the following requirements that affect the clustering performance using nonnegative low-rank approximation: (1) approximation error measure that takes into account sparse similarities, (2) decomposition form of the approximating matrix, where the decomposing matrices should contain just enough parameters for clustering but not more, and (3) normalization of the approximating matrix, which ensures relatively balanced clusters and equal contribution of each data sample. Lacking one or more of these dimensions can severely affect clustering performance.
In this paper we present a new nonnegative low-rank approximation method for clustering, which satisfies all of the above three requirements. First, because datasets often lie in curved manifolds such that only similarities in a small neighborhood are reliable, we adopt KL-divergence to handle the resulting sparsity. Second, different from PLSI, we enforce an equal contribution of every data sample and then directly construct the decomposition over the probabilities from samples to clusters. Third, these probabilities form the only decomposing matrix to be learned in our approach and directly give the answer for probabilistic clustering. Furthermore, our decomposition method leads to a doubly-stochastic approximating matrix, which was shown to be desired for balanced graph cuts (Zass & Shashua, 2006). We name our new method DCD because it is based on Data-Cluster-Data random walks.

In order to solve the DCD learning objective, we propose a novel relaxed Majorization-Minimization algorithm to handle the new matrix decomposition type. Our relaxation strategy works robustly in finding satisfactory local optimizers under the stochasticity constraint. Furthermore, we argue that complexity control such as Bayesian priors only provides initialization for the new algorithm. This eliminates the problem of hyperparameter selection in the prior.

Empirical comparison with NMF and other graph-based clustering approaches demonstrates that our method can achieve the best or nearly the best clustering purity in all tasks. For some datasets, the new method significantly improves the state-of-the-art.

After this introductory part, we present the new method in Section 2, including its learning objective, probabilistic model, optimization and initialization techniques. In Section 3, we point out the connections and differences between our method and other recent related work. Experimental settings and results are given in Section 4. Finally we conclude the paper and discuss some future work in Section 5.

2. Clustering by DCD

Suppose the similarities between $n$ data samples are precomputed and given in a nonnegative symmetric matrix $A$. This matrix can be seen as (weighted) affinity of an undirected similarity graph where each node corresponds to a data sample (data node). A clustering analysis algorithm takes such input and divides the data nodes into $r$ disjoint subsets. In probabilistic clustering analysis, we want to find $P(k|i)$, the probability of assigning the $i$th sample to the $k$th cluster, where $i = 1, \dots, n$ and $k = 1, \dots, r$. In the following, $i$, $j$ and $v$ stand for data sample (node) indices while $k$ and $l$ stand for cluster indices.

2.1. Learning objective

Some of our work was inspired by the AnchorGraph (Liu et al., 2010) which was used in large approximative graph construction based on a two-step random walk between data nodes through a set of anchor nodes. Note that AnchorGraph is not a clustering method.

If we augment the input similarity graph by $r$ cluster nodes, the cluster assigning probabilities can be seen as single-step random walk probabilities from data nodes to the augmented cluster nodes. Without preference to any particular samples, we impose uniform prior $P(i) = 1/n$ over the data nodes. By this prior, the reversed random walk probabilities can then be calculated by the Bayes formula

$$P(i|k) = \frac{P(k|i)\,P(i)}{\sum_v P(k|v)\,P(v)} = \frac{P(k|i)}{\sum_v P(k|v)}. \tag{1}$$

Consider next the probability of two-step random walks from the $i$th data node to the $j$th data node via all cluster nodes (DCD random walk):

$$P(i|j) = \sum_k P(i|k)\,P(k|j) = \sum_k \frac{P(k|i)\,P(k|j)}{\sum_v P(k|v)}. \tag{2}$$

This probability defines another similarity between two data nodes, $\widehat{A}_{ij} = P(i|j)$, with respect to cluster nodes. Note that this matrix has rank at most equal to $r$. The learning target is now to find a good approximation between the input similarities and the DCD random walk probabilities:

$$A \approx \widehat{A}. \tag{3}$$

AnchorGraph does not provide any error measure for the above approximation. A conventional choice in NMF is the squared Euclidean distance, which employs the underlying assumption that the noise is additive and Gaussian.

In real-world clustering tasks for multivariate datasets, data points often lie in a curved manifold. Consequently, similarities based on Euclidean distances are reliable only in a small neighborhood. Such locality causes high sparsity in the input similarity matrix. Sparsity also commonly exists for real-world network data. Because of the sparsity, Euclidean distance is improper for the approximation in Eq. (3), because additive Gaussian noise should lead to a dense observed graph. In contrast, (generalized) Kullback-Leibler divergence is more suitable for the approximation. The underlying Poisson noise characterizes rare occurrences that are present in our sparse input. We can now formulate our learning objective as the following optimization problem:

$$\min_{W \ge 0} \; D_{\mathrm{KL}}(A \,\|\, \widehat{A}) = \sum_{ij} \left( A_{ij} \log \frac{A_{ij}}{\widehat{A}_{ij}} - A_{ij} + \widehat{A}_{ij} \right) \tag{4}$$

$$\text{s.t.} \quad \sum_k W_{ik} = 1, \quad i = 1, \dots, n, \tag{5}$$

where we write $W_{ik} = P(k|i)$ for convenience and thus

$$\widehat{A}_{ij} = \sum_k \frac{W_{ik}\,W_{jk}}{\sum_v W_{vk}}. \tag{6}$$

Note that $\widehat{A}$ is symmetric as it is easy to verify that $P(i|j) = P(j|i)$.

[...]

Although it is possible to construct a multi-level graphical model similar to the Dirichlet process topic model, we emphasize that the smallest approximation error (or perplexity) is our final goal. Dirichlet prior is used only in order to ease the optimization. Therefore we do not employ more complex generative models; see Section 2.4 for more discussion.

2.3. Optimization

The optimization problem with Dirichlet prior on $W$ is equivalent to minimizing

$$\mathcal{J}(W) = -\sum_{ij} A_{ij} \log \widehat{A}_{ij} - (\alpha - 1) \sum_{ik} \log W_{ik}. \tag{9}$$

There are two ways to handle the constraint Eq. (5). First, one can develop the multiplicative algorithm by the procedure proposed by Yang & Oja (2011) by neglecting the stochasticity constraint, and then normalize the rows of W after each update. However, the optimization by this way easily gets stuck in poor local [...]
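To make the quantities in Eqs. (1)-(6) and (9) concrete, the following is a minimal sketch in plain Python. It only evaluates the DCD matrix and the two objectives for a given W; it does not implement the relaxed Majorization-Minimization algorithm, which is not described in this excerpt. The function names, the toy similarity matrix A, and the toy assignment matrix W are all hypothetical choices made for illustration.

```python
import math

def normalize_rows(W):
    """Rescale every row of W to sum to 1 (the stochasticity constraint, Eq. 5)."""
    return [[w / sum(row) for w in row] for row in W]

def dcd_matrix(W):
    """Two-step Data-Cluster-Data walk (Eq. 6):
    A_hat[i][j] = sum_k W[i][k] * W[j][k] / sum_v W[v][k]."""
    n, r = len(W), len(W[0])
    col = [sum(W[v][k] for v in range(n)) for k in range(r)]
    return [[sum(W[i][k] * W[j][k] / col[k] for k in range(r))
             for j in range(n)] for i in range(n)]

def generalized_kl(A, A_hat):
    """Generalized KL divergence (Eq. 4). Entries with A[i][j] == 0
    contribute only A_hat[i][j], using the convention 0 * log 0 = 0."""
    total = 0.0
    for row_a, row_h in zip(A, A_hat):
        for a, h in zip(row_a, row_h):
            if a > 0:
                total += a * math.log(a / h) - a
            total += h
    return total

def regularized_objective(A, W, alpha):
    """J(W) from Eq. (9): negative log-fit plus the Dirichlet prior term."""
    A_hat = dcd_matrix(W)
    fit = -sum(a * math.log(h)
               for row_a, row_h in zip(A, A_hat)
               for a, h in zip(row_a, row_h) if a > 0)
    prior = -(alpha - 1.0) * sum(math.log(w) for row in W for w in row)
    return fit + prior

# Hypothetical toy input: a sparse symmetric similarity graph with two pairs.
A = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]

# Hypothetical positive cluster-assignment scores, row-normalized so that
# W[i][k] plays the role of P(k|i).
W = normalize_rows([[3.0, 1.0], [2.0, 1.0], [1.0, 4.0], [1.0, 3.0]])

A_hat = dcd_matrix(W)
row_sums = [sum(row) for row in A_hat]
print("row sums of A_hat:", row_sums)
print("generalized KL:", generalized_kl(A, A_hat))
print("J(W) with alpha = 2:", regularized_objective(A, W, 2.0))
```

Because each row of W sums to one, summing Eq. (6) over j gives the row sum of W again, so every row of A_hat sums to one; together with the symmetry noted after Eq. (6), this is the doubly stochastic property the method is named for. With alpha = 1 the prior term in Eq. (9) vanishes and only the fit term remains.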