Visualizing Graphs and Clusters As Maps
Emden R. Gansner∗
AT&T Labs - Research, 180 Park Ave, Florham Park, NJ 07932

Yifan Hu†
AT&T Labs - Research, 180 Park Ave, Florham Park, NJ 07932

Stephen G. Kobourov‡
University of Arizona, 1040 E 4th Street, Tucson, AZ 85721

ABSTRACT

Information visualization is essential in making sense out of large data sets. Often, high-dimensional data are visualized as a collection of points in 2-dimensional space through dimensionality reduction techniques. However, these traditional methods often fail to capture the underlying structural information, clustering, and neighborhoods. In this paper, we describe GMap, a practical algorithmic framework for visualizing relational data with geographic-like maps. We illustrate the effectiveness of this approach with examples from several domains.

Keywords: Information Visualization; Clustering; Graph Drawing; Graph Coloring; Maps; Set Visualization.

INTRODUCTION

With the growth of the Internet and scientific and technological advances, the world has seen an explosion in the generation and collection of data. This data is often relational, high-dimensional, or both. With such a wealth of data comes the need to understand it, analyze it, and use it. Visualization can play a key role in all of these steps, but especially so in the exploratory phase.

Many useful techniques for visualizing abstract data sets have been considered in the past. Graphs made of nodes and links (or edges) are often used to capture the relationships between objects, and graph drawing allows us to visualize such relationships. Typically nodes are represented by points in two- or three-dimensional space, and edges are represented by lines between the corresponding vertices. For an example of a relational data set, consider the Amazon.com graph, where books are nodes and there is an edge between two books if people who have bought one of the books also buy the other.

High-dimensional data sets, on the other hand, are often visualized as point clouds in two- or three-dimensional space. An example is the listening pattern of users of the last.fm website. Each user can be represented by a long vector. The dimension of the vector is the same as the number of musicians available at the website, and the value of each element is proportional to the number of times a user listens to the corresponding musician. In a point cloud visualization, each point is a user, and two points are close to each other if the two users have similar taste in music.

Unfortunately, these standard approaches of node-link diagrams and point cloud representations often require considerable effort to comprehend. If we desire to make the data accessible to people outside the areas of computer science and statistics, providing more compelling drawings is an important task. This is certainly true for the ordinary user who might be curious about the recommendations made by Amazon.com, but is also true for the biologist who wishes to understand the visualization of a biological experiment that was created by a statistician.

When large data sets are drawn as point clouds or node-link graphs, it is sometimes possible to observe a visual similarity to geographic maps. For example, several small connected components next to a much larger component suggest several small islands next to a big continent. We took the next step in visualizing data directly as a geographic map. A map representation is familiar and intuitive; most people are very familiar with maps, and nicely drawn maps often provide hours of enjoyable exploration. Many people take advantage of this familiarity with maps and manually create map-like representations that portray relations among abstract concepts. We were interested in automating the process which begins with the data and ends with a drawing of a map. In this process there are two main competing goals. First, we would like to be faithful to the data, by maintaining the inherent structure and relationships, without adding or implying non-existent structure. Second, we would like the final result to look like a map, using standard cartographic conventions. Our attempt to address these two goals led to a framework for a map-based data representation that we call GMap.

GMAP

The GMap framework allows us to generate map-like representations from an abstract data set. Specifically, given a high-dimensional data set or a graph with edge weights (e.g., the similarity between books as determined by purchase behavior at Amazon.com), it produces a drawing with a map-like look, with countries that enclose similar objects, outer boundaries that follow the outline of the vertex set, and inner boundaries that have the twists and turns found in real maps. A typical example is in Figure 1 and shows just under 1000 books.¹ Our maps can also have lakes, islands, and peninsulas, similar to those found in real geographic maps.

GMap is a framework in the true sense of the word, rather than a specific algorithm. It consists of four main steps; the first two can be achieved by a variety of existing algorithms. For the last two steps we propose new algorithms.

In the first step we take as input a graph or high-dimensional data set, and embed it into the plane. The statistics and scientific modeling communities have extensively explored this problem and provide many ways of doing this. Possible embedding algorithms include principal component analysis, multidimensional scaling (MDS), force-directed algorithms, or non-linear dimensionality reductions such as Locally Linear Embedding and Isomap.

The second step takes this collection of points in the plane and aggregates them into clusters.
Here, it is important to match the clustering algorithm to the embedding algorithm. For example, a geometric clustering algorithm such as k-means may be suitable for an embedding derived from MDS, as the latter tends to place similar points in the same geometric region with good separation between clusters. On the other hand, with an embedding derived from a

∗ e-mail: [email protected]
† e-mail: [email protected]
‡ e-mail: [email protected]. Work done while this author was at AT&T Labs.
¹ Most of the images in this article are available in an interactive form at http://www.research.att.com/~yifanhu/MAPS/imap.html.

[Figure 1: map of books, labeled with book titles]
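The first two steps described above (embed the data into the plane, then cluster the embedded points) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes principal component analysis for the embedding and k-means for the clustering, two of the options the text names, and it uses a small made-up table of listening counts in the spirit of the last.fm example. The names `listens`, `pca_2d`, and `kmeans` are hypothetical.

```python
import math

# Hypothetical toy data: one row per listener, one column per musician;
# each value stands for how often that listener played that musician.
listens = [
    [9, 8, 0, 1],
    [8, 9, 1, 0],
    [9, 9, 0, 0],
    [0, 1, 9, 8],
    [1, 0, 8, 9],
    [0, 0, 9, 9],
]

def pca_2d(rows, iters=200):
    """Step 1 (assumed variant): project rows onto their top two principal
    components, found by power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Unnormalised scatter matrix; scaling does not change the directions.
    scatter = [[sum(x[a] * x[b] for x in centered) for b in range(d)]
               for a in range(d)]
    components = []
    for _ in range(2):
        v = [1.0 / (j + 1) for j in range(d)]  # fixed start vector
        for _ in range(iters):
            w = [sum(scatter[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0:
                break
            v = [t / norm for t in w]
        # Rayleigh quotient gives the eigenvalue estimate for this direction.
        lam = sum(v[a] * sum(scatter[a][b] * v[b] for b in range(d))
                  for a in range(d))
        components.append(v)
        # Deflate so the next pass finds the next-strongest direction.
        scatter = [[scatter[a][b] - lam * v[a] * v[b] for b in range(d)]
                   for a in range(d)]
    return [[sum(x[j] * c[j] for j in range(d)) for c in components]
            for x in centered]

def kmeans(points, k, iters=50):
    """Step 2 (assumed variant): plain k-means on the embedded points."""
    # Farthest-point initialisation keeps the sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda i: math.dist(p, centers[i]))
                  for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return labels

points = pca_2d(listens)    # one 2-D point per listener
labels = kmeans(points, 2)  # one cluster label per listener
print(labels)
```

In this toy setting, listeners with similar counts end up near each other in the plane and share a cluster label. A geometric method such as k-means is used here only because the embedding is itself geometric; other embeddings may call for a different clustering method, as the text notes.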