Learning on Hypergraphs: Spectral Theory and Clustering


Pan Li, Olgica Milenkovic
Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
March 12, 2019

Learning on Graphs

Graphs are indispensable mathematical data models capturing pairwise interactions: k-NN networks, social networks, publication networks. Important learning problems on graphs include clustering (community detection), semi-supervised/active learning, and representation learning (graph embedding).

Beyond Pairwise Relations

A graph models pairwise relations. Recent work has shown that high-order relations can be significantly more informative. Examples include:
- understanding the organization of networks (Benson, Gleich, and Leskovec '16);
- determining the topological connectivity between data points (Zhou, Huang, Schölkopf '07);
- functional units in social and biological networks, e.g., high-order network motifs (Benson '16). (Figure: a food-web motif over microfauna, pelagic fishes, crabs & benthic fishes, and macroinvertebrates.)
- meta-graphs and meta-paths in heterogeneous information networks (Zhou, Yu, Han '11).

Graphs with high-order relations can be modeled as hypergraphs (formally defined later). Algorithmic methods for analyzing high-order relations and the associated learning problems are still under development.

Review of Graph Clustering: Notation and Terminology

Graph clustering task: cluster the vertices that are "densely" connected by edges.

Graph Partitioning and Conductance

A (weighted) graph $G = (V, E, w)$: for $e \in E$, $w_e$ is its weight. Partition $V$ into two sets, $V = S \cup \bar{S}$.

Boundary of a set $S$: $\partial S \triangleq \{ e \in E \mid e \cap S \neq \emptyset, \; e \cap \bar{S} \neq \emptyset \}$. An edge $e$ is cut by $(S, \bar{S})$ if $e \in \partial S$.

Total cut cost:
$$\mathrm{Cut}(S) = \sum_{e \in \partial S} w_e.$$

Volume of a set $S$:
$$\mathrm{Vol}(S) = \sum_{v \in S} d_v = \sum_{v \in S} \sum_{e : v \in e} w_e.$$

Conductance of a set:
$$\phi(S) = \frac{\mathrm{Cut}(S)}{\min\{\mathrm{Vol}(S), \mathrm{Vol}(\bar{S})\}}.$$

Spectral Clustering

It is well known that sets with small conductance correspond to high-quality clusters, e.g., community detection in real networks [YL2012] and image segmentation [SM2002]. The objective is
$$\min_S \phi(S), \qquad \phi(S) \triangleq \frac{\mathrm{Cut}(S)}{\min\{\mathrm{Vol}(S), \mathrm{Vol}(\bar{S})\}}.$$
The algorithmic method of choice: spectral clustering.

Spectral Graph Partitioning

Input: the adjacency matrix $A$ and the diagonal degree matrix $D$.
Step 1: Compute the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$.
Step 2: Compute the eigenvector $u = (u_1, u_2, \dots, u_n)^T$ corresponding to the second smallest eigenvalue of $L$.
Step 3: Partition the set of vertices according to $D^{-1/2} u$. (A code sketch of these three steps appears at the end of this section.)

Performance Guarantees for Spectral Clustering

Let $\hat{\phi}$ denote the conductance obtained via spectral graph partitioning and $\phi^*$ the optimal conductance (the Cheeger constant). Then
$$\hat{\phi} \leq 2 \sqrt{\phi^*} \qquad \text{[C97]},$$
a direct consequence of Cheeger's inequality: if $\lambda$ is the second smallest eigenvalue of $L$, then
$$\frac{(\phi^*)^2}{4} \leq \frac{\lambda}{2} \leq \phi^*.$$
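A minimal NumPy sketch of the three-step procedure above, combined with the standard sweep cut over $D^{-1/2}u$ that extracts a low-conductance set. The function name `spectral_bipartition` is our own placeholder, and the sketch assumes a connected graph with positive degrees:

```python
import numpy as np

def spectral_bipartition(A):
    """Spectral graph partitioning with the normalized Laplacian.

    A: symmetric (n x n) weighted adjacency matrix of a connected graph.
    Returns (S, phi): a boolean membership vector for the sweep-cut set S
    of smallest conductance, and that conductance.
    """
    d = A.sum(axis=1)                           # weighted degrees
    D_isqrt = np.diag(1.0 / np.sqrt(d))         # D^{-1/2}, assumes d > 0
    L = np.eye(len(d)) - D_isqrt @ A @ D_isqrt  # normalized Laplacian

    # Step 2: eigenvector of the second smallest eigenvalue
    # (np.linalg.eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(L)
    scores = D_isqrt @ eigvecs[:, 1]            # Step 3: order by D^{-1/2} u

    # Sweep cut: try every prefix of the sorted scores, keep the best.
    order = np.argsort(scores)
    vol_total = d.sum()
    best_phi, best_S = np.inf, None
    for k in range(1, len(order)):
        S = np.zeros(len(d), dtype=bool)
        S[order[:k]] = True
        cut = A[S][:, ~S].sum()                 # weight crossing (S, S-bar)
        phi = cut / min(d[S].sum(), vol_total - d[S].sum())
        if phi < best_phi:
            best_phi, best_S = phi, S
    return best_S, best_phi
```

By Cheeger's inequality, the conductance returned by this sweep satisfies the guarantee $\hat{\phi} \leq 2\sqrt{\phi^*}$ quoted above.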
From Graphs to Hypergraphs: Hypergraphs and Inhomogeneous Hypergraphs

Formal definition: a hypergraph is an ordered pair $G = (V, E)$, where $V$ is the vertex set and $E$ comprises hyperedges $e \subseteq V$ with $|e| \geq 2$. (Figure: a 3-uniform hypergraph, $|e| = 3$, on ten vertices.)

Each hyperedge $e$ is associated with a weight function
$$w_e : 2^e \to \mathbb{R}_{\geq 0}, \qquad w_e(\emptyset) = 0, \qquad w_e(S) = w_e(e \setminus S).$$
Here $w_e(S)$ describes the "contribution" of the set $S$ to the high-order relation $e$. If $w_e(S) = \mathrm{const.}$ for all $S \notin \{\emptyset, e\}$, we obtain standard hypergraphs; otherwise the hyperedges are inhomogeneous. (Figure: a homogeneous hyperedge with a single weight $w_e$ versus an inhomogeneous hyperedge with weights $w_e(A)$, $w_e(B)$, $w_e(C)$; metabolic networks are an example.)

Inhomogeneous Hypergraph Partitioning: Two Clusters

Cost of a cut (inhomogeneous case):
$$\mathrm{Cut}(S) = \sum_{e \in \partial S} w_e(S \cap e).$$

Volume of a set $S$:
$$\mathrm{Vol}(S) = \sum_{v \in S} d_v,$$
where $d : V \to \mathbb{R}_{\geq 0}$ is the "degree" $d_v = \sum_{e : v \in e} \|w_e\|_1$.

Objective of inhomogeneous hypergraph partitioning: minimize the conductance
$$\min_S \phi(S), \qquad \phi(S) \triangleq \frac{\mathrm{Cut}(S)}{\min\{\mathrm{Vol}(S), \mathrm{Vol}(\bar{S})\}}.$$

New Algorithms for Inhomogeneous Hypergraph Partitioning

Major technical challenge: there is no matrix form for the Laplacian(s) of (inhomogeneous) hypergraphs. Two variants of spectral clustering:
1. "Matrix approximation" of the Laplacian (projection) followed by standard graph spectral clustering. Pros: efficient algorithms. Cons: large distortion.
2. Nonlinear Laplacian spectrum approximation. Pros: small distortion. Cons: potentially large complexity.

Projection-Based Methods

The algorithm essentially follows a three-step framework:
Step 1: Project each hyperedge onto a weighted clique.
Step 2: Merge the "projected cliques" into one graph.
Step 3: Perform classical spectral graph partitioning based on the normalized Laplacian.

An example of projection-based methods for constant-weight hypergraphs [ZHS'07, BGL'16] (a code sketch follows at the end of this section):
- projection: $w^{(e)}_{v\tilde{v}} = w_e / |e|$;
- merging: $w_{v\tilde{v}} \triangleq \sum_{e \in E} w^{(e)}_{v\tilde{v}}$;
- spectral graph partitioning on the merged graph. (Figure: projection and merging on a small example hypergraph.)

Distortion Caused by Projection

Consider a constant-weight hyperedge of size $|e|$: the weights in the projected clique are uniform. A cut isolating a single vertex crosses $|e| - 1$ clique edges, whereas a balanced cut crosses roughly $|e|^2/4$, so two cuts that should cost the same under $w_e$ differ by a factor of order $|e|$. Projection may therefore induce distortion even for constant-weight hyperedges.

Projection for Inhomogeneous Hyperedges

Key question: how does one project inhomogeneous hyperedges onto cliques with optimal cost approximation? For each hyperedge $e$, solve the linear program (sketched in code at the end of this section)
$$\min \beta^{(e)} \quad \text{s.t.} \quad w_e(S) \leq \sum_{v \in S, \tilde{v} \in e \setminus S} w^{(e)}_{v\tilde{v}} \leq \beta^{(e)} w_e(S)$$
for all $S \in 2^e$ for which $w_e(S)$ is defined, where $\{w^{(e)}_{v\tilde{v}}\}_{v,\tilde{v} \in e}$ stand for the projected edge weights. With $O(2^{|e|})$ constraints, the LP may be complex, and the problem may be infeasible.

Problems: 1) There may be no feasible $\beta^{(e)}$ value for some hyperedges $e$. 2) One may have $w_{v\tilde{v}} < 0$ for some $v, \tilde{v}$. Empirically, the clipping $w_{v\tilde{v}} \leftarrow (w_{v\tilde{v}})_+$ appears to work well, but no general analysis is available.

Theoretical Performance Guarantees

$\beta^{(e)}$ is the approximation ratio when projecting $e$ onto a clique.

Theorem. If there exist feasible constants $\beta^{(e)}$ for all hyperedges $e$ and $w_{v\tilde{v}} \geq 0$ for all $\{v, \tilde{v}\}$, then the conductance $\hat{\phi}$ obtained by the projection-based method satisfies
$$\hat{\phi} \leq 2 \beta^* \sqrt{\phi^*},$$
where $\phi^*$ is the Cheeger constant and $\beta^* = \max_{e \in E} \beta^{(e)}$.
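A minimal sketch of the three-step projection pipeline for constant-weight hyperedges; `project_and_merge` is a hypothetical helper name, and Step 3 reuses the `spectral_bipartition` sketch given earlier:

```python
import numpy as np
from itertools import combinations

def project_and_merge(n, hyperedges, edge_weights):
    """Clique expansion for constant-weight hyperedges [ZHS'07, BGL'16].

    n: number of vertices (labeled 0..n-1);
    hyperedges: list of vertex tuples; edge_weights: constant w_e per edge.
    Returns the merged (n x n) adjacency matrix of the projected graph.
    """
    A = np.zeros((n, n))
    for e, w_e in zip(hyperedges, edge_weights):
        for v, u in combinations(e, 2):
            # Step 1: uniform clique weights w_e / |e|;
            # Step 2: merging is simply the sum over hyperedges.
            A[v, u] += w_e / len(e)
            A[u, v] += w_e / len(e)
    return A

# Step 3: classical spectral partitioning of the merged graph, e.g.
#   S, phi = spectral_bipartition(project_and_merge(n, hyperedges, weights))
```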
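For inhomogeneous hyperedges, the per-hyperedge LP above can be written down directly. Below is a sketch using `scipy.optimize.linprog`, where the variables are the pair weights $w^{(e)}_{v\tilde{v}}$ and $\beta^{(e)}$; the helper name and encoding are ours. As noted above, the optimal pair weights may come out negative, with clipping at zero as the empirical fix:

```python
import numpy as np
from itertools import chain, combinations
from scipy.optimize import linprog

def clique_projection_lp(e, w_e):
    """Minimize beta s.t.  w_e(S) <= sum of pair weights crossing (S, e\\S)
                           <= beta * w_e(S)  for every proper subset S.

    e: tuple of vertices; w_e: dict {frozenset(S): w_e(S)}.
    Returns ({pair: weight}, beta), or None if the LP is infeasible.
    """
    pairs = list(combinations(e, 2))
    subsets = chain.from_iterable(combinations(e, k) for k in range(1, len(e)))

    A_ub, b_ub = [], []
    for S in map(set, subsets):
        cross = [1.0 if (v in S) != (u in S) else 0.0 for v, u in pairs]
        ws = w_e[frozenset(S)]
        A_ub.append(cross + [-ws]); b_ub.append(0.0)                # <= beta*w_e(S)
        A_ub.append([-c for c in cross] + [0.0]); b_ub.append(-ws)  # >= w_e(S)

    cost = [0.0] * len(pairs) + [1.0]                    # objective: beta
    bounds = [(None, None)] * len(pairs) + [(1.0, None)]
    res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
    if not res.success:
        return None                                      # infeasible, as warned above
    return dict(zip(pairs, res.x[:-1])), res.x[-1]
```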
Submodular Weights

What are sufficient conditions for a set-partition function $w_e(S)$ to be approximable by a graph-cut function? Submodular costs $w_e(S)$.

Definition. A function $w_e : 2^e \to \mathbb{R}_{\geq 0}$ satisfying
$$w_e(S_1) + w_e(S_2) \geq w_e(S_1 \cap S_2) + w_e(S_1 \cup S_2) \quad \text{for all } S_1, S_2 \in 2^e$$
is referred to as submodular.

Performance Guarantees under Submodular Costs

Theorem. A symmetric submodular function $w_e(\cdot)$ with $w_e(\emptyset) = 0$ admits a constant-factor graph-cut approximation with nonnegative weights $\{w^{(e)}_{vv'}\}_{v, v' \in e}$.

Specific Questions

Is there a simple algorithm that avoids solving the optimization problem? Constrain $w^{(e)}_{v\tilde{v}} = f_{v\tilde{v}}(w_e)$, where $f_{v\tilde{v}}(\cdot)$ is linear. Finding the optimal $\beta$ becomes a min-max problem:
$$\min_{\{f_{v\tilde{v}}\}_{v,\tilde{v} \in e}} \; \max_{\text{submodular } w_e} \; \beta \quad \text{s.t.} \quad w_e(S) \leq \sum_{v \in S, \tilde{v} \in e \setminus S} f_{v\tilde{v}}(w_e) \leq \beta\, w_e(S) \quad \text{for } S \in 2^e.$$
Here $f_{v\tilde{v}}(\cdot)$ does not depend on $w_e$, but may depend on $|e|$.

Theoretical Performance Guarantees

Theorem. The min-max optimal solution (linear, $2 \leq |e| \leq 7$) satisfies
$$w^{*(e)}_{v\tilde{v}} = \sum_{S \in 2^e \setminus \{\emptyset, e\}} \left[ \frac{w_e(S)}{2|S|(|e| - |S|)} \mathbf{1}_{\{|\{v,\tilde{v}\} \cap S| = 1\}} - \frac{w_e(S)}{2(|S| + 1)(|e| - |S| - 1)} \mathbf{1}_{\{|\{v,\tilde{v}\} \cap S| = 0\}} - \frac{w_e(S)}{2(|S| - 1)(|e| - |S| + 1)} \mathbf{1}_{\{|\{v,\tilde{v}\} \cap S| = 2\}} \right],$$
with constants $\beta^{(e)}$:

  |e|:   2   3   4    5   6   7
  beta:  1   1   3/2  2   4   6

Conjecture: the claim holds for all $|e|$. (Proof?)

Applications of Inhomogeneous Hypergraph Clustering

Applications include food-web hierarchical clustering, category learning in rankings, and subspace segmentation.

Application 1: Food-Web Hierarchical Clustering

Goal: recover the hierarchical communities in the Florida Bay food web: producers, primary consumers, ..., high-level consumers. (Figure: 2-D layout of the Florida Bay food web.)

Niche intuition: if two species share the same food source, they are in the same niche; likewise, if two species share the same predators, they are in the same niche.

Motif of interest: a directed four-species motif on $\{v_1, v_2, v_3, v_4\}$ with inhomogeneous weights
$$w_e(\{v_i\}) = 1 \; (i = 1, 2, 3, 4), \qquad w_e(\{v_1, v_3\}) = w_e(\{v_1, v_4\}) = 2, \qquad w_e(\{v_1, v_2\}) = 0.$$
(A code sketch instantiating the min-max projection on this motif follows below.)

The clustering result recovers the trophic hierarchy (producers; primary consumers; secondary consumers: invertebrates and forage fishes; predatory fishes & birds; top-level predators), with only 5 links pointing in the reverse direction.
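A sketch of the closed-form min-max projection from the theorem above, applied to the food-web motif; the helper names are ours, and the weight dictionary extends the motif's stated values symmetrically via $w_e(S) = w_e(e \setminus S)$:

```python
from itertools import chain, combinations

def minmax_projection(e, w_e):
    """Closed-form min-max optimal clique weights for one hyperedge.

    e: tuple of vertices; w_e: dict {frozenset(S): w_e(S)} over all proper
    nonempty subsets, symmetric in the sense w_e(S) = w_e(e \\ S).
    Returns {(v, u): projected weight w*_{vu}}.
    """
    n = len(e)
    subsets = list(chain.from_iterable(combinations(e, k) for k in range(1, n)))
    w = {}
    for v, u in combinations(e, 2):
        total = 0.0
        for S in subsets:
            s, k = len(S), len({v, u} & set(S))
            ws = w_e[frozenset(S)]
            if k == 1:                 # pair is cut by S
                total += ws / (2 * s * (n - s))
            elif k == 0:               # pair lies outside S
                total -= ws / (2 * (s + 1) * (n - s - 1))
            else:                      # pair lies inside S
                total -= ws / (2 * (s - 1) * (n - s + 1))
        w[(v, u)] = total
    return w

# Food-web motif (|e| = 4): singletons cost 1, w_e({v1,v3}) = w_e({v1,v4}) = 2,
# w_e({v1,v2}) = 0; remaining subsets follow from the symmetry of w_e.
e = (1, 2, 3, 4)
w_e = {frozenset({v}): 1.0 for v in e}
w_e.update({frozenset(s): c for s, c in
            [({1, 3}, 2.0), ({1, 4}, 2.0), ({1, 2}, 0.0)]})
for S, val in list(w_e.items()):       # enforce w_e(S) = w_e(e \ S)
    w_e[frozenset(e) - S] = val
print(minmax_projection(e, w_e))
```

As a sanity check, for a constant-weight hyperedge with $|e| = 3$ and $w_e(S) = 1$, this formula yields weight $1/2$ per clique edge, so every cut of the clique costs exactly 1, matching $\beta = 1$ in the table above.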
Recommended publications
  • Three Conjectures in Extremal Spectral Graph Theory
Three conjectures in extremal spectral graph theory Michael Tait and Josh Tobin June 6, 2016 Abstract We prove three conjectures regarding the maximization of spectral invariants over certain families of graphs. Our most difficult result is that the join of P2 and Pn−2 is the unique graph of maximum spectral radius over all planar graphs. This was conjectured by Boots and Royle in 1991 and independently by Cao and Vince in 1993. Similarly, we prove a conjecture of Cvetković and Rowlinson from 1990 stating that the unique outerplanar graph of maximum spectral radius is the join of a vertex and Pn−1. Finally, we prove a conjecture of Aouchiche et al. from 2008 stating that a pineapple graph is the unique connected graph maximizing the spectral radius minus the average degree. To prove our theorems, we use the leading eigenvector of a purported extremal graph to deduce structural properties about that graph. Using this setup, we give short proofs of several old results: Mantel's theorem, Stanley's edge bound and extensions, the Kővári–Sós–Turán theorem applied to ex(n, K_{2,t}), and a partial solution to an old problem of Erdős on making a triangle-free graph bipartite. 1 Introduction Questions in extremal graph theory ask to maximize or minimize a graph invariant over a fixed family of graphs. Perhaps the most well-studied problems in this area are Turán-type problems, which ask to maximize the number of edges in a graph which does not contain fixed forbidden subgraphs. Over a century old, a quintessential example of this kind of result is Mantel's theorem, which states that K_{⌈n/2⌉,⌊n/2⌋} is the unique graph maximizing the number of edges over all triangle-free graphs.
  • Manifold Learning
MANIFOLD LEARNING Kavé Salamatian, LISTIC. Learning Machine Learning: develop algorithms to automatically extract "patterns" or "regularities" from data (generalization) Typical tasks Clustering: Find groups of similar points Dimensionality reduction: Project points in a lower dimensional space while preserving structure Semi-supervised: Given labelled and unlabelled points, build a labelling function Supervised: Given labelled points, build a labelling function All these tasks are not well-defined Clustering Identify the two groups Dimensionality Reduction Objects in R4096? But only 3 parameters: 2 angles and 1 for illumination. They span a 3-dimensional sub-manifold of R4096! Semi-Supervised Learning Assign labels to black points Supervised learning Build a function that predicts the label of all points in the space Formal definitions Clustering: Given build a function Dimensionality reduction: Given build a function Semi-supervised: Given and with m << n, build a function Supervised: Given , build Geometric learning Consider and exploit the geometric structure of the data Clustering: two "close" points should be in the same cluster Dimensionality reduction: "closeness" should be preserved Semi-supervised/supervised: two "close" points should have the same label Examples: k-means clustering, k-nearest neighbors. Manifold? A manifold is a topological space that is locally Euclidean: around every point, there is a neighborhood that is topologically isomorphic to the unit ball Regularization Adopt a functional viewpoint
  • Segment-Based Clustering of Hyperspectral Images Using Tree-Based Data Partitioning Structures
algorithms Article Segment-Based Clustering of Hyperspectral Images Using Tree-Based Data Partitioning Structures Mohamed Ismail and Milica Orlandi´c* Department of Electronic Systems, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway; [email protected] * Correspondence: [email protected] Received: 2 October 2020; Accepted: 8 December 2020; Published: 10 December 2020 Abstract: Hyperspectral image classification has been increasingly used in the field of remote sensing. In this study, a new clustering framework for large-scale hyperspectral image (HSI) classification is proposed. The proposed four-step classification scheme explores how to effectively use the global spectral information and local spatial structure of hyperspectral data for HSI classification. Initially, multidimensional Watershed is used for pre-segmentation. Region-based hierarchical hyperspectral image segmentation is based on the construction of Binary partition trees (BPT). Each segmented region is modeled using first-order parametric modelling, which is then followed by a region merging stage using HSI regional spectral properties in order to obtain a BPT representation. The tree is then pruned to obtain a more compact representation. In addition, principal component analysis (PCA) is utilized for HSI feature extraction, so that the extracted features are further incorporated into the BPT. Finally, an efficient variant of the k-means clustering algorithm, called the filtering algorithm, is deployed on the created BPT structure, producing the final cluster map. The proposed method is tested over eight publicly available hyperspectral scenes with ground truth data and it is further compared with other clustering frameworks. The extensive experimental analysis demonstrates the efficacy of the proposed method.
  • Image Segmentation Based on Multiscale Fast Spectral Clustering Chongyang Zhang, Guofeng Zhu, Minxin Chen, Hong Chen, Chenjian Wu
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. XX, NO. X, XX 2018 1 Image Segmentation Based on Multiscale Fast Spectral Clustering Chongyang Zhang, Guofeng Zhu, Minxin Chen, Hong Chen, Chenjian Wu Abstract—In recent years, spectral clustering has become one of the most popular clustering algorithms for image segmentation. However, it has restricted applicability to large-scale images due to its high computational complexity. In this paper, we first propose a novel algorithm called Fast Spectral Clustering based on quad-tree decomposition. The algorithm focuses on the spectral clustering at superpixel level and its computational complexity is O(n log n) + O(m) + O(m^{3/2}); its memory cost is O(m), where n and m are the numbers of pixels and superpixels of an image. Then we propose Multiscale Fast Spectral Clustering by improving Fast Spectral Clustering, which is based on the hierarchical structure of the quad-tree. The computational complexity of Multiscale Fast Spectral Clustering is O(n log n). In recent years, researchers have proposed various approaches to large-scale image segmentation. The approaches are based on the three following strategies: constructing a sparse similarity matrix, using Nyström approximation, and using representative points. The following approaches are based on constructing a sparse similarity matrix. In 2000, Shi and Malik [18] constructed the similarity matrix of the image by using the k-nearest neighbor sparse strategy to reduce the complexity of constructing the similarity matrix to O(n) and to reduce the complexity of eigen-decomposing the Laplacian matrix to O(n^{3/2}) by using the Lanczos algorithm.
  • Contemporary Spectral Graph Theory
CONTEMPORARY SPECTRAL GRAPH THEORY GUEST EDITORS Maurizio Brunetti, University of Naples Federico II, Italy, [email protected] Raffaella Mulas, The Alan Turing Institute, UK, [email protected] Lucas Rusnak, Texas State University, USA, [email protected] Zoran Stanić, University of Belgrade, Serbia, [email protected] DESCRIPTION Spectral graph theory is a striking theory that studies the properties of a graph in relationship to the characteristic polynomial, eigenvalues and eigenvectors of matrices associated with the graph, such as the adjacency matrix, the Laplacian matrix, the signless Laplacian matrix or the distance matrix. This theory emerged in the 1950s and 1960s. Over the decades it has considered not only simple undirected graphs but also their generalizations such as multigraphs, directed graphs, weighted graphs or hypergraphs. In the recent years, spectra of signed graphs and gain graphs have attracted a great deal of attention. Besides graph theoretic research, another major source is the research in a wide branch of applications in chemistry, physics, biology, electrical engineering, social sciences, computer sciences, information theory and other disciplines. This thematic special issue is devoted to the publication of original contributions focusing on the spectra of graphs or their generalizations, along with the most recent applications. Contributions to the Special Issue may address (but are not limited to) the following aspects: relations between spectra of graphs (or their generalizations) and their structural properties in the most general shape; bounds for particular eigenvalues; spectra of particular types of graph; relations with block designs; particular eigenvalues of graphs; cospectrality of graphs; graph products and their eigenvalues; graphs with a comparatively small number of eigenvalues; graphs with particular spectral properties; applications; characteristic polynomials.
  • Multiplex Conductance and Gossip Based Information Spreading in Multiplex Networks
1 Multiplex Conductance and Gossip Based Information Spreading in Multiplex Networks Yufan Huang, Student Member, IEEE, and Huaiyu Dai, Fellow, IEEE Abstract—In this network era, not only people are connected, different networks are also coupled through various interconnections. This kind of network of networks, or multilayer networks, has attracted research interest recently, and many beneficial features have been discovered. However, quantitative study of information spreading in such networks is essentially lacking. Despite some existing results in single networks, the layer heterogeneity and complicated interconnections among the layers make the study of information spreading in this type of networks challenging. In this work, we study the information spreading time in multiplex networks, adopting the gossip (random-walk) based information spreading model. A new metric called multiplex conductance is defined based on the multiplex network structure and used to quantify the information spreading time in a general multiplex network in the idealized setting. Multiplex conductance is then evaluated for some interesting multiplex networks to facilitate understanding in this new area. Finally, the tradeoff between the information spreading efficiency improvement and the layer cost is examined to explain the user's social behavior and motivate effective multiplex network designs. Index Terms—Information spreading, multiplex networks, gossip algorithm, multiplex conductance 1 INTRODUCTION In the election year, one of the most important tasks for presidential candidates is to disseminate their words and opinions to voters in a fast and efficient manner. The underlying research problem on information spreading ... Arguably, this somewhat simplified version of multilayer networks already captures many interesting multi-scale and multi-component features, and serves as a good starting point for our intended study.
  • Fast Approximate Spectral Clustering
Fast Approximate Spectral Clustering Donghui Yan (University of California, Berkeley, CA 94720), Ling Huang (Intel Research Lab, Berkeley, CA 94704), Michael I. Jordan (University of California, Berkeley, CA 94720) [email protected] [email protected] [email protected] ABSTRACT Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but which has limited applicability to large-scale problems due to its computational complexity of O(n^3) in general, with n the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the mis-clustering rate. We develop two concrete instances of our general framework, one based on local k-means clustering. A variety of methods have been developed over the past several decades to solve clustering problems [15, 19]. A relatively recent area of focus has been spectral clustering, a class of methods based on eigendecompositions of affinity, dissimilarity or kernel matrices [20, 28, 31]. Whereas many clustering methods are strongly tied to Euclidean geometry, making explicit or implicit assumptions that clusters form convex regions in Euclidean space, spectral methods are more flexible, capturing a wider range of geometries. They often yield superior empirical performance when compared to competing algorithms such as k-means, and they have been successfully deployed in numerous applications in areas such as computer vision, bioinformatics, and robotics.
  • Spectral Geometry for Structural Pattern Recognition
Spectral Geometry for Structural Pattern Recognition HEWAYDA EL GHAWALBY Ph.D. Thesis This thesis is submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy. Department of Computer Science United Kingdom May 2011 Abstract Graphs are used pervasively in computer science as representations of data with a network or relational structure, where the graph structure provides a flexible representation such that there is no fixed dimensionality for objects. However, the analysis of data in this form has proved an elusive problem; for instance, it suffers from a lack of robustness to structural noise. One way to circumvent this problem is to embed the nodes of a graph in a vector space and to study the properties of the point distribution that results from the embedding. This is a problem that arises in a number of areas including manifold learning theory and graph-drawing. In this thesis, our first contribution is to investigate the heat kernel embedding as a route to computing geometric characterisations of graphs. The reason for turning to the heat kernel is that it encapsulates information concerning the distribution of path lengths and hence node affinities on the graph. The heat kernel of the graph is found by exponentiating the Laplacian eigensystem over time. The matrix of embedding co-ordinates for the nodes of the graph is obtained by performing a Young-Householder decomposition on the heat kernel. Once the embedding of its nodes is to hand we proceed to characterise a graph in a geometric manner. With the embeddings to hand, we establish a graph characterization based on differential geometry by computing sets of curvatures associated with the graph nodes, edges and triangular faces.
  • CS 598: Spectral Graph Theory. Lecture 10
CS 598: Spectral Graph Theory. Lecture 13 Expander Codes Alexandra Kolla Today Graph approximations, part 2. Bipartite expanders as approximations of the bipartite complete graph Quasi-random properties of bipartite expanders, expander mixing lemma Building expander codes Encoding, minimum distance, decoding Expander Graph Refresher We defined expander graphs to be d-regular graphs whose adjacency matrix eigenvalues satisfy |α_i| ≤ εd for i > 1, and some small ε. We saw that such graphs are very good approximations of the complete graph. Expanders as Approximations of the Complete Graph Refresher Let G be a d-regular graph whose adjacency eigenvalues satisfy |α_i| ≤ εd. As its Laplacian eigenvalues satisfy λ_i = d − α_i, all non-zero eigenvalues are between (1 − ε)d and (1 + ε)d. Let H = (d/n)K_n, so x^T L_H x = d x^T x. We showed that G is an ε-approximation of H: (1 − ε) x^T L_H x ≼ x^T L_G x ≼ (1 + ε) x^T L_H x. Bipartite Expander Graphs Define them as d-regular good approximations of (some multiple of) the complete bipartite graph H = (d/n)K_{n,n}: (1 − ε) x^T L_H x ≼ x^T L_G x ≼ (1 + ε) x^T L_H x, i.e., (1 − ε)H ≼ G ≼ (1 + ε)H. (Figure: G versus K_{n,n} on vertex sets U, V.) The Spectrum of Bipartite Expanders The eigenvalues of the Laplacian of H = (d/n)K_{n,n} are λ_1 = 0, λ_{2n} = 2d, and λ_i = d for 0 < i < 2n. The approximation condition says we want a d-regular graph G with eigenvalues λ_1 = 0, λ_{2n} = 2d, and λ_i ∈ ((1 − ε)d, (1 + ε)d) for 0 < i < 2n. Construction of Bipartite Expanders Definition (Double Cover).
  • An Algorithmic Introduction to Clustering
AN ALGORITHMIC INTRODUCTION TO CLUSTERING A PREPRINT Bernardo Gonzalez Department of Computer Science and Engineering UCSC [email protected] June 11, 2020 The purpose of this document is to provide an easy introductory guide to clustering algorithms. Basic knowledge and exposure to probability (random variables, conditional probability, Bayes' theorem, independence, Gaussian distribution), matrix calculus (matrix and vector derivatives), linear algebra and analysis of algorithms (specifically, time complexity analysis) is assumed. Starting with Gaussian Mixture Models (GMM), this guide will visit different algorithms like the well-known k-means, DBSCAN and Spectral Clustering (SC) algorithms, in a connected and hopefully understandable way for the reader. The first three sections (Introduction, GMM and k-means) are based on [1]. The fourth section (SC) is based on [2] and [5]. The fifth section (DBSCAN) is based on [6]. The sixth section (Mean Shift) is, to the best of the author's knowledge, original work. The seventh section is dedicated to conclusions and future work. Traditionally, these five algorithms are considered completely unrelated and they are considered members of different families of clustering algorithms: • Model-based algorithms: the data are viewed as coming from a mixture of probability distributions, each of which represents a different cluster. A Gaussian Mixture Model is considered a member of this family • Centroid-based algorithms: any data point in a cluster is represented by the central vector of that cluster, which need not be a part of the dataset taken. k-means is considered a member of this family • Graph-based algorithms: the data is represented using a graph, and the clustering procedure leverages graph theory tools to create a partition of this graph.
  • Spectral Clustering: a Tutorial for the 2010’S
Spectral Clustering: a Tutorial for the 2010's Marina Meila* Department of Statistics, University of Washington September 22, 2016 Abstract Spectral clustering is a family of methods to find K clusters using the eigenvectors of a matrix. Typically, this matrix is derived from a set of pairwise similarities Sij between the points to be clustered. This task is called similarity based clustering, graph clustering, or clustering of dyadic data. One remarkable advantage of spectral clustering is its ability to cluster "points" which are not necessarily vectors, and to use for this a "similarity", which is less restrictive than a distance. A second advantage of spectral clustering is its flexibility; it can find clusters of arbitrary shapes, under realistic separations. This chapter introduces the similarity based clustering paradigm, describes the algorithms used, and sets the foundations for understanding these algorithms. Practical aspects, such as obtaining the similarities, are also discussed. * This tutorial appeared in "Handbook of Cluster Analysis" by Christian Hennig, Marina Meila, Fionn Murtagh and Roberto Rocci (eds.), CRC Press, 2015. Contents 1 Similarity based clustering. Definitions and criteria 1.1 What is similarity based clustering? 1.2 Similarity based clustering and cuts in graphs 1.3 The Laplacian and other matrices of spectral clustering 1.4 Four bird's eye views of spectral clustering 2 Spectral clustering algorithms 3 Understanding the spectral clustering algorithms 3.1 Random walk/Markov chain view 3.2 Spectral clustering as finding a small balanced cut in G 3.3 Spectral clustering as finding smooth embeddings
  • Notes on Elementary Spectral Graph Theory Applications to Graph Clustering Using Normalized Cuts
University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science 1-1-2013 Notes on Elementary Spectral Graph Theory Applications to Graph Clustering Using Normalized Cuts Jean H. Gallier University of Pennsylvania, [email protected] Follow this and additional works at: https://repository.upenn.edu/cis_reports Part of the Computer Engineering Commons, and the Computer Sciences Commons Recommended Citation Jean H. Gallier, "Notes on Elementary Spectral Graph Theory Applications to Graph Clustering Using Normalized Cuts", January 2013. University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-13-09. This paper is posted at ScholarlyCommons. https://repository.upenn.edu/cis_reports/986 For more information, please contact [email protected]. Notes on Elementary Spectral Graph Theory Applications to Graph Clustering Using Normalized Cuts Abstract These are notes on the method of normalized graph cuts and its applications to graph clustering. I provide a fairly thorough treatment of this deeply original method due to Shi and Malik, including complete proofs. I include the necessary background on graphs and graph Laplacians. I then explain in detail how the eigenvectors of the graph Laplacian can be used to draw a graph. This is an attractive application of graph Laplacians. The main thrust of this paper is the method of normalized cuts. I give a detailed account for K = 2 clusters, and also for K > 2 clusters, based on the work of Yu and Shi. Three points that do not appear to have been clearly articulated before are elaborated: 1. The solutions of the main optimization problem should be viewed as tuples in the K-fold Cartesian product of projective space RP^{N-1}.