
Visualisation and Analysis of the Internet Movie Database∗

Adel Ahmed†, School of IT, University of Sydney; NICTA, Australia
Vladimir Batagelj‡, Discrete and Computational Mathematics, University of Ljubljana, Slovenia
Xiaoyan Fu§, NICTA, Australia
Seok-Hee Hong¶, School of IT, University of Sydney; NICTA, Australia
Damian Merrick‖, School of IT, University of Sydney; NICTA, Australia
Andrej Mrvar∗∗, Social Science Informatics, University of Ljubljana, Slovenia

∗This paper is based on the winning entry of the Graph Drawing Competition 2005 [7] and an invited presentation at the Sunbelt Viszard Session [9].
†e-mail: [email protected]
‡e-mail: [email protected]
§e-mail: [email protected]
¶e-mail: [email protected]
‖e-mail: [email protected]
∗∗e-mail: [email protected]

Asia-Pacific Symposium on Visualisation 2007, 5-7 February, Sydney, NSW, Australia
1-4244-0809-1/07/$20.00 © 2007 IEEE

ABSTRACT
In this paper, we present a case study for the visualisation and analysis of large and complex temporal multivariate networks derived from the Internet Movie DataBase (IMDB). Our approach is to integrate network analysis methods with visualisation in order to address scalability and complexity issues. In particular, we defined new analysis methods, the (p,q)-core and the 4-ring weight, to identify important dense subgraphs and short cycles in the huge bipartite graphs. We applied island analysis to a specific time slice in order to identify important and meaningful subgraphs. Further, a temporal Kevin Bacon graph and a temporal two-mode network are extracted in order to provide insight into the evolution of the network.

Keywords: Large and Complex Networks, Case Study, Visualisation, Network Analysis, IMDB.

Index Terms: H.5.2 [Information Interfaces and Presentation]: User Interfaces—Algorithms; I.3.6 [Computer Graphics]: Methodology and Techniques—

1 INTRODUCTION
Recent technological advances have led to the production of a lot of data, and consequently to many large and complex network models across a number of domains. Examples include:
• Webgraphs: the entities are web pages and the relationships are hyperlinks. These graphs are huge: the whole graph consists of billions of nodes.
• Social networks: these include telephone call graphs (used to trace terrorists), money movement networks (used to detect money laundering), and citation or collaboration networks. Their size ranges from medium to very large.
• Biological networks: protein-protein interaction (PPI) networks, metabolic pathways, gene regulatory networks and phylogenetic networks are used by biologists to analyse and engineer biochemical materials. In general, they are smaller, with thousands of nodes. However, the relationships in these networks are very complex.

Understanding these networks is a key enabler for many applications. Good analysis methods are needed for these networks, and some are available. However, such methods are not useful unless the results are effectively communicated to humans. Visualisation can be an effective tool for understanding such networks. Good visualisation reveals the hidden structure of a network and amplifies human understanding, thus leading to new insights, new findings and possible predictions for the future.

We can identify the following challenging research issues for the analysis and visualisation of large and complex networks:
• Scalability: webgraphs, or the telephone call graphs gathered by AT&T, have billions of nodes. In some cases it is impossible to visualise the whole graph, or even to load it into main memory. Hence, the design of new analysis and visualisation methods for huge networks is a key research challenge in fields ranging from databases to computer graphics.
• Complexity: relationships between actors in a social network, for example, can have a multitude of attributes (observed behaviour can be confirmed or unconfirmed; relationships can be directed or undirected, and weighted by probabilities). Biological networks are also quite complex in nature; for example, metabolic pathways have only a few thousand nodes, but their relationships and interactions are very complex. The data may be given by nature, but some parts of it may be unknown to scientists. The design of analysis and visualisation methods that resolve these complexity issues is the second research challenge.
• Network dynamics: real-world networks are always changing over time. Some networks, such as webgraphs, evolve relatively slowly. In other cases, such as telephone call networks, the data is a very fast-streamed graph. Effective and efficient modelling, analysis and visualisation of dynamic networks are challenging research topics.

One approach to these challenges is to integrate analysis with visualisation and interaction. Analysis tools for networks are not useful without visualisation, and visualisation tools are not useful unless they are linked to analysis. Further, interaction is necessary to extract more details or insights from the visualisation.

In this paper, we present a case study of our approach to integrating analysis, visualisation and interaction, using large and complex temporal multivariate networks derived from the IMDB (Internet Movie Data Base). The IMDB is a huge and very rich data set with many attributes, and it has become a challenging data set for visualisation researchers [7, 9]. For example, a multi-scale approach for the visualisation of small-world networks was applied to data sets from the IMDB [3]. A visualisation approach for dynamic affiliation networks, in which events are characterised by a set of descriptors, was presented in [6]. A radial ripple metaphor was devised to display the passing of time and

Figure 1: Arcs with multiplicity at least 8

Table 1: (p, q : n1, n2) for IMDB

  p     q :   n1    n2 |  p   q :   n1    n2 |  p   q :  n1   n2
  1  1590 : 1590     1 | 22  24 : 1854  1153 | 43  14 :  29   83
  2   516 :  788     3 | 23  23 :   47    56 | 44  14 :  29   83
  3   212 : 1705    18 | 24  23 :   34    39 | 45  13 :  30   95
  4   151 : 4330   154 | 25  22 :   42    53 | 46  13 :  29   94
  5   131 : 4282   209 | 26  22 :   31    38 | 47  12 :  29  101
  6   115 : 3635   223 | 27  22 :   31    38 | 48  12 :  28  100
  7   101 : 3224   244 | 28  20 :   36    53 | 49  12 :  26   95
  8    88 : 2860   263 | 29  20 :   35    52 | 50  11 :  27  111
  9    77 : 3467   393 | 30  19 :   35    59 | 51  11 :  26  110
 10    69 : 3150   428 | 31  19 :   35    59 | 52  11 :  16   79
 11    63 : 2442   382 | 32  19 :   34    57 | 53  10 :  35  162
 12    56 : 2479   454 | 33  18 :   34    62 | 54  10 :  35  162
 13    50 : 3330   716 | 34  18 :   34    62 | 55  10 :  34  162
 14    46 : 2460   596 | 35  18 :   33    61 | 56  10 :  34  162
 15    42 : 2663   739 | 36  17 :   33    65 | 57   9 :  35  187
 16    39 : 2173   678 | 37  16 :   33    75 | 58   9 :  33  180
 17    35 : 2791   995 | 38  16 :   30    73 | 59   9 :  33  180
 18    32 : 2684  1080 | 39  16 :   29    70 | 60   9 :  32  178
 19    30 : 2395  1063 | 40  15 :   29    77 | 61   9 :  31  177
 20    28 : 2216  1087 | 41  15 :   28    76 | 62   9 :  31  177
 21    26 : 1988  1087 | 42  15 :   28    76 | 63   8 :  31  202

conveys relations among the different constituents through an appropriate layout. Note that this method is suited to an egocentric perspective.

As the first step of our approach, we integrate network analysis methods [5, 10] with visualisation. In particular, we defined new analysis methods, the (p,q)-core and the 4-ring weight, to identify important dense subgraphs and short cycles in the huge bipartite graphs. We applied island analysis to a specific time slice in order to identify important and meaningful subgraphs of the large and complex network. Further, a temporal Kevin Bacon graph and a temporal two-mode network are extracted and visualised in order to provide insight into the evolution of the IMDB data set.

This paper is organised as follows. In the next section, we present a simple analysis of the IMDB data set. In Section 3, we present the integration of network analysis methods with visualisation for large bipartite graphs, including (p,q)-cores, 4-rings and islands. Section 4 presents a visual analysis based on the Kevin Bacon number. Section 5 presents a galaxy-metaphor visualisation of a temporal two-mode actor-movie network, and a visual analysis of the two-mode network with company attributes. Section 6 concludes.

2 BASIC CHARACTERISTICS OF IMDB
The source of the original data is the Internet Movie Database. We transformed the contest data into a temporal network with some additional vectors and partitions describing the properties of vertices. The IMDB network is bipartite (two-mode) and has 1324748 = 428440 + 896308 vertices and 3792390 arcs. 9927 of the arcs in the network are multiple (parallel) arcs. The nature of these multiple arcs can be seen in Figure 1, where all arcs with multiplicity of at least 8 are displayed. In the analysis that follows, we treat multiple arcs as single arcs. The IMDB network consists of 132714 weak components.

3 VISUALISATION AND ANALYSIS OF LARGE BIPARTITE NETWORKS
There are few direct, specialised methods for analysing bipartite networks, especially large ones. Because of the size of the IMDB network, the standard reduction of the entire network to one or the other derived 1-mode network was not an option. This motivated us to design and implement two new methods for the analysis of bipartite networks:
• a bipartite version of cores – (p,q)-cores
• 4-ring weights on lines

3.1 (p,q)-core Analysis
The subset of vertices $C \subseteq V$ is a (p,q)-core in a bipartite (2-mode) network $\mathcal{N} = (V_1, V_2; L)$, $V = V_1 \cup V_2$, if and only if
a. in the induced subnetwork $K = (C_1, C_2; L(C))$, with $C_1 = C \cap V_1$ and $C_2 = C \cap V_2$, it holds that $\forall v \in C_1: \deg_K(v) \ge p$ and $\forall v \in C_2: \deg_K(v) \ge q$;
b. $C$ is the maximal subset of $V$ satisfying condition a.

The basic properties of bipartite cores are:
• $C(0,0) = V$
• $K(p,q)$ is not always connected
• $(p_1 \le p_2) \wedge (q_1 \le q_2) \Rightarrow C(p_2,q_2) \subseteq C(p_1,q_1)$

Using (p,q)-cores, we can identify important dense structures in large and complex networks. We designed a very efficient O(m) algorithm to find (p,q)-cores, and implemented it in Pajek.

Since there are many (p,q)-cores, we must answer the question of how to select the interesting ones among them. To help the user in these decisions, we implemented a table of core characteristics: $n_1 = |C_1(p,q)|$, $n_2 = |C_2(p,q)|$ and $k$, the number of components in $K(p,q)$ (see Tables 1 and 2). We look for (p,q)-cores where
• $n_1 + n_2 \le$ a selected threshold, and
• there are big jumps from $C(p-1,q)$ and $C(p,q-1)$ to $C(p,q)$.
For example, we selected the (247,2)-core and the (27,22)-core. From the labels we can see that the corresponding topics are wrestling and pornography; see Figures 2 and 3.

3.2 4-ring Analysis
A k-ring is a simple closed chain of length k. Using k-rings, we can define a weight on lines: $w_k(e)$ is the number of different k-rings containing the line $e \in E$. In a complete graph $K_r$ with $r \ge k \ge 3$, every line lies on $w_k(K_r) = (r-2)!/(r-k)!$ different k-rings, so the lines belonging to cliques have large weights. Therefore, these weights can be used to identify the dense parts of a network. For example, all r-cliques of a network belong to the $(r-2)$-edge cut for the weight $w_3$, i.e. the subnetwork of lines with $w_3(e) \ge r-2$.
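The two bipartite measures of Section 3 can be illustrated with a short Python sketch. It is not the Pajek implementation, and all function names are illustrative. It assumes the network is given as (movie, actor) pairs with distinct identifiers for the two modes, and it merges parallel lines into single ones as in Section 2. The (p,q)-core is found by repeatedly deleting vertices whose degree falls below their threshold; each line is examined a constant number of times, so the cost is O(n + m). The 4-ring weight of a line (u, v) is counted directly as the number of paths v - x - y - u that close a ring.

```python
from collections import deque


def bipartite_from_lines(lines):
    """Build adjacency sets from (movie, actor) pairs.

    Parallel lines collapse into one because neighbours are stored in sets.
    Movies form V1 (threshold p) and actors form V2 (threshold q); movie and
    actor identifiers are assumed to be distinct.
    """
    adj, side = {}, {}
    for movie, actor in lines:
        adj.setdefault(movie, set()).add(actor)
        adj.setdefault(actor, set()).add(movie)
        side[movie], side[actor] = 1, 2
    return adj, side


def pq_core(adj, side, p, q):
    """Return C(p,q): vertices of V1 keep >= p neighbours, those of V2 >= q."""
    need = {v: (p if side[v] == 1 else q) for v in adj}
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = {v for v in adj if deg[v] < need[v]}
    queue = deque(removed)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in removed:
                continue
            deg[u] -= 1                  # u loses its line to the deleted v
            if deg[u] < need[u]:
                removed.add(u)
                queue.append(u)
    return set(adj) - removed


def four_ring_weights(adj):
    """Return w4(e) for every line e = (u, v), each line reported once.

    A 4-ring through (u, v) is u - v - x - y - u, with x a neighbour of v
    other than u and y a common neighbour of u and x other than v.
    """
    w = {}
    for u in adj:
        for v in adj[u]:
            if (v, u) in w:              # already counted from the other end
                continue
            count = 0
            for x in adj[v]:
                if x != u:
                    # v is always a common neighbour of u and x; exclude it
                    count += len(adj[x] & adj[u]) - 1
            w[(u, v)] = count
    return w
```

On a toy network in which three movies share the same three actors (a $K_{3,3}$) and a fourth movie links one of them to an extra actor, the (2,2)-core is the $K_{3,3}$, and every line of the $K_{3,3}$ gets $w_4 = (3-1)(3-1) = 4$, while the two pendant lines get 0.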

Figure 2: (247,2)-core

Figure 3: (27,22)-core

The 3-ring weights were already available [8]. However, there are no 3-rings in the IMDB network, since a bipartite network contains no cycles of odd length. The densest substructures are complete bipartite subgraphs $K_{p,q}$, which contain many 4-rings. This motivated us to design a method to compute 4-ring weights, which we implemented in Pajek.

Figure 4: Charlie Brown

To identify interesting substructures, we applied the simple islands procedure to the weight $w_4$. Computing the $w_4$ weights takes around three minutes on a 1400 MHz computer with 1 GB of RAM, and determining the islands takes 13 seconds. We obtained 12465 simple line islands on 56086 vertices. Here is their size distribution:

Table 2: (p, q : n1, n2) for IMDB

Size  Freq | Size  Freq | Size  Freq | Size  Freq
   2  5512 |   20    19 |   38     4 |   59     2
   3  1978 |   21    18 |   39     3 |   61     1
   4  1639 |   22    15 |   40     2 |   64     1
   5   968 |   23     9 |   42     2 |   67     1
   6   666 |   24    13 |   43     3 |   70     1
   7   394 |   25    12 |   45     3 |   73     1
   8   257 |   26     6 |   46     4 |   76     1
   9   209 |   27     6 |   47     5 |   82     1
  10   148 |   28     5 |   48     1 |   86     1
  11   118 |   29     6 |   49     2 |  106     1
  12    87 |   30     3 |   50     2 |  122     1
  13    55 |   31     6 |   51     1 |  135     1
  14    62 |   32     5 |   52     2 |  144     1
  15    46 |   33     3 |   53     1 |  163     1
  16    39 |   34     1 |   54     2 |  269     1
  17    27 |   35     5 |   55     1 |  301     1
  18    28 |   36     4 |   57     1 |  332     2
  19    29 |   37     7 |   58     1 |  673     1

There are 94 islands of size at least 30, and only 10 of size over 100. The largest island corresponds to wrestling. Each island represents a special topic; we visualised only some of them, for example in Figures 4, 5, 6, 7 and 8.

Figure 5: Mower, Jack and Phelps, Lee

Figure 6: Adult

Figure 7: Shawqi, Farid and El-Meliguy, Mahmoud

Figure 8: Polizeiruf 110 and Starkes Team

3.3 Time slices and Island Analysis
By extracting a time slice from the complete network, we can identify the main groups in selected time periods. Islands can identify important subgraphs of large networks based on the values of attributes [4].

To illustrate this, we extracted the time slice 1935-1950. There are 223 simple islands [4] for $w_4$ on 1774 vertices. For example, we selected island 6, ’Dona Macabra’; see Figure 9.

4 TEMPORAL CO-STARRING NETWORK: KEVIN-BACON NETWORK
We extracted a small, important subset of the actors in the IMDB network and constructed from it a dynamic visualisation of a 1-mode network showing the co-appearance of actors in films.

To define a sufficiently small important subgraph, we first considered only nodes in the network with a Kevin Bacon number of 1. The Kevin Bacon number of an actor is a similar concept to the
Erdős number of a mathematician: it is the length of the shortest path in the movie-star collaboration network from the actor to Kevin Bacon.

The data set was divided into time slices of one decade (e.g. 1920s, 1930s, etc.), and in each decade the set of actors was reduced to only those who had co-starred in at least 5 films with another actor with a Kevin Bacon number of 1. The sizes of the graphs for each of these time slices are given in Table 3.

The 1-mode co-starring networks of these reduced sets of actors were constructed for each decade, and a three-dimensional layout was generated for each using the scale-free network layout [2] in GEOMI [1]. Nodes in the force-directed layout were restricted to lie on one of three concentric spheres, depending on the degree of the node [2]. The colour of each node also indicates its degree. The size of each node depends on the number of movies in which the corresponding actor starred in that particular decade. Similarly, the width of an edge represents the number of co-appearances of the two actors in that decade.

To illustrate the evolution of the co-starring network effectively, we display smooth animations between the layouts of subsequent decades. Each animation is broken into several stages shown one after the other, in order to help the viewer preserve the mental map. First, nodes and edges not present in the second layout are faded out. Nodes present in both layouts are then moved to their new positions in the second layout. Nodes new to the second layout burst out from the centre and come to rest in their calculated positions, and finally new edges are faded in to show the new collaborations in the second decade. The animation can be downloaded from http://www.it.usyd.edu.au/~dmerrick/gd05contest/gd05-final.avi
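The Kevin Bacon number and the per-decade reduction can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the paper: the input is a hypothetical `cast` mapping (movie to set of actors) plus a `year` mapping, Kevin Bacon numbers are found by breadth-first search over the actor-movie graph, and "co-starred in at least 5 films with another actor with a Kevin Bacon number of 1" is read as: some other KB-1 actor shares at least 5 of the decade's films with this actor. All co-appearance edges between retained actors are kept, with their counts as edge widths; the text above does not say whether weaker edges were dropped.

```python
from collections import defaultdict, deque
from itertools import combinations


def bacon_numbers(cast, source):
    """Breadth-first search on the actor-movie graph.

    cast maps each movie to the set of actors in it. Two actors are one step
    apart if they share a movie, so each movie needs to be expanded only once.
    """
    films_of = defaultdict(set)
    for movie, actors in cast.items():
        for actor in actors:
            films_of[actor].add(movie)
    dist = {source: 0}
    expanded = set()
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for movie in films_of[actor]:
            if movie in expanded:
                continue
            expanded.add(movie)
            for other in cast[movie]:
                if other not in dist:
                    dist[other] = dist[actor] + 1
                    queue.append(other)
    return dist


def decade_network(cast, year, kb, start, min_films=5):
    """Co-starring network of KB-1 actors for movies released in [start, start+10).

    Returns (nodes, edges); edges maps an actor pair to its number of shared films.
    """
    kb1 = {a for a, d in kb.items() if d == 1}
    shared = defaultdict(int)
    for movie, actors in cast.items():
        if start <= year[movie] < start + 10:
            for a, b in combinations(sorted(actors & kb1), 2):
                shared[(a, b)] += 1
    nodes = {a for pair, n in shared.items() if n >= min_films for a in pair}
    edges = {(a, b): n for (a, b), n in shared.items() if a in nodes and b in nodes}
    return nodes, edges
```

Expanding each movie only once keeps the search linear in the number of actor-movie lines, instead of materialising the much denser 1-mode co-starring graph first.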

Figure 9: Dona Macabra

Figure 10: The co-starring actors visualisation (1960s)

KB1                              V          E
Initial                    1324748    3792390
all decades, no filtering     2742     336060
1910s, ≥ 5 films                16         18
1920s, ≥ 5 films                 4          2
1930s, ≥ 5 films                25         53
1940s, ≥ 5 films                17         17
1950s, ≥ 5 films                19         18
1960s, ≥ 5 films                16         35
1970s, ≥ 5 films                79        411
1980s, ≥ 5 films                59         73
1990s, ≥ 5 films               207        425
2000s, ≥ 5 films               124        208

Table 3: Graph sizes per decade of co-starring network

This process was continued for all decade slices from 1911 through to 2004, and the result can be seen in the downloadable animation. Figures 10, 11, 12, 13, 14 show snapshots of the anima- tion from the 1960s through to the early 2000s. The visualisation revealed a number of interesting facts. One un- expected finding was the substantial number of actors with a Kevin Figure 11: The co-starring actors visualisation (1970s) Bacon number of 1 in the early years of the twentieth century, some of whom could clearly not have co-starred in a film with Kevin Ba- con. This revealed some problems in the collection of the movie The visualisation of the 1980s (Figure 12) highlights some par- data set. The years of some movies had been recorded incorrectly, ticularly close-knit groups of actors. Comedy stars Chevy Chase, while edges to other movies that possessed the same name as a Dan Akroyd and Bill Murray appear due to roles in Satuday Night movie of a prior decade were all recorded as belonging to the earlier Live, Caddy Shack and Spies Like Us. Also present are Jim Cum- movie. mings, Jack Angel and Rob Paulson, who have quite high degrees In the 1960s (Figure 10), the visualisation shows a clique involv- due to their involvement as voice actors in many short cartoons and ing the US president John F. Kennedy. This is due to the assassina- episodes. tion of Kennedy in 1963, and the subsequent barrage of documen- These groups continue into the 1990s, where the groups of actors taries that were produced detailing the event. The other actors in the become much larger and more highly connected (Figure 13). More clique (Jacqueline Kennedy, John and Nellie Connally, etc.) were well-established modern actors like Whoopi Goldberg, Tom Hanks all present at the assassination. They are present in this data set and Dennis Hopper become particularly prominent in this decade. 
since the movie JFK, starring Kevin Bacon, included real archive Finally, in the 2000s, we see some particularly interesting and footage of the assassination. The Kennedys continue through to unexpected phenomena (Figure 14). First, music stars such as Brit- later decades in the visualisation, illustrating the vast number of ney Spears, Beyonce´ Knowles and Sheryl Crow appear with very documentary films developed that were based on this event. high degree and connectedness, due to their participation in numer- The 1970s, shown in Figure 11, sees the first large connected ous music award shows. Secondly, on the other side of the visu- group of Hollywood actors that continue as big names to this day. alisation, popular actor Arnold Schwarzenegger links politicians to James Earl Jones, Robert Redford, Steve Martin and John Travolta the movie stars and musicians in the rest of the co-starring network. all appear in this group. This was primarily due to Schwarzenegger’s entry into politics, in

21 Figure 12: The co-starring actors visualisation (1980s) Figure 14: The co-starring actors visualisation (2000s)

becoming the governor of the US state of California. Following this event, he featured in several political documentaries in which Bill Clinton also appeared. Bill Clinton, in turn, is linked through documentaries and archival footage to other famous politicians, such as Ronald Reagan, Richard Nixon and John F. Kennedy.

Figure 13: The co-starring actors visualisation (1990s)

5 A GALAXY OF MOVIE STARS OF TEMPORAL ACTOR-MOVIE NETWORK

This section describes a galaxy of movie stars derived from the temporal actor-movie network. It is presented both as an animation (to give an overview) and as a visualisation of the network at a specific time slice (to show the details).

First we consider a “galaxy of stars” metaphor for the movie-actor network. The main idea is to map the “movie stars” into a movie (i.e. an animation) of a galaxy of stars that displays actor-movie interactions.

Representing as much information as possible without introducing overwhelming visual complexity has always been a challenge when visualising large data sets. To reduce visual complexity, we define important subgraphs. Specifically, we define the “stars” of the IMDB as follows:
• every star actor must have been in more than 12 movies over the whole time period
• every star movie must have more than 12 actors
• each star actor must have played in between three and six movies in each year

We again use a bipartite (2-mode) network model with two types of nodes: actor nodes and movie nodes. Actor nodes are displayed as stars in the night sky, and edges are displayed as faint lines joining up “constellations” of actors (see Figure 15). Edges with bends are drawn between actor and movie nodes, but the movie nodes themselves are hidden. In this manner, collaboration between actors can easily be seen. The picture thus not only reduces the visual complexity (especially for edges), but also represents actor-movie and actor-actor interactions at the same time.

To produce an overview of the temporal network dynamics, we computed a layout for each year from 1907 to 2004 and produced an animation. A two-dimensional force-directed layout was generated for each year’s subgraph using GEOMI [1]. The animation transitions between consecutive layouts in a similar manner to the animation of the co-starring actors network in the previous section. The animation is available from http://www.it.usyd.edu.au/∼dmerrick/gd05contest/gd05-final.avi

Once the animation has given us an overview of the temporal network, we focus on the details of specific years to observe interesting patterns in particular time periods.

Figure 16 shows part of the layout for the year 1918. The three actors shown co-starred in five movies together but did not appear in any other movies. Only one of these movies includes actors from outside the group. This kind of pattern is usually found in the early years.

Figures 17 and 18 show a different pattern. Both are captured from the layout for the year 1983. In Figure 17, nineteen actors co-starred in a masterpiece. In Figure 18, the same group of people starred in a series of movies together, whilst also appearing in other movies with actors from outside the group. Compared with the early-years pattern in Figure 16, Figure 17 may offer some insight into the trends of the movie industry.
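The three star-selection rules defined earlier in this section, and the actor links that become visible once movie nodes are hidden, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the pipeline used to build the animation. The input format of (actor, movie, year) credit triples and the function names `select_stars`, `yearly_star_subgraph` and `constellation_edges` are hypothetical. The per-year rule is read as applying to every year in which an actor has at least one credit; the three-to-six bounds are treated as inclusive, and “more than 12” as a strict inequality.

```python
from collections import defaultdict
from collections.abc import Iterable
from itertools import combinations

# Hypothetical input format: one (actor, movie, year) triple per credit.
Credit = tuple[str, str, int]


def select_stars(credits: Iterable[Credit], min_total: int = 12,
                 min_cast: int = 12, per_year: tuple[int, int] = (3, 6)):
    """Apply the three star rules; returns (star_actors, star_movies)."""
    movies_of = defaultdict(set)  # actor -> movies over the whole period
    cast_of = defaultdict(set)    # movie -> actors in its cast
    yearly = defaultdict(lambda: defaultdict(set))  # actor -> year -> movies
    for actor, movie, year in credits:
        movies_of[actor].add(movie)
        cast_of[movie].add(actor)
        yearly[actor][year].add(movie)

    lo, hi = per_year
    star_actors = {
        actor for actor, movies in movies_of.items()
        # Rule 1: more than min_total movies over the whole period.
        if len(movies) > min_total
        # Rule 3 (assumed reading): lo..hi movies in every year with a credit.
        and all(lo <= len(ms) <= hi for ms in yearly[actor].values())
    }
    # Rule 2: more than min_cast actors in the movie's cast.
    star_movies = {m for m, cast in cast_of.items() if len(cast) > min_cast}
    return star_actors, star_movies


def yearly_star_subgraph(credits: Iterable[Credit], star_actors: set[str],
                         star_movies: set[str],
                         year: int) -> set[tuple[str, str]]:
    """Actor-movie edges of one year's frame, keeping star nodes only (assumed)."""
    return {(a, m) for a, m, y in credits
            if y == year and a in star_actors and m in star_movies}


def constellation_edges(
        bipartite_edges: Iterable[tuple[str, str]]) -> set[tuple[str, str]]:
    """Actor pairs linked through a shared (hidden) movie node."""
    cast = defaultdict(set)
    for actor, movie in bipartite_edges:
        cast[movie].add(actor)
    pairs = set()
    for actors in cast.values():
        # Sorting gives each unordered pair a single canonical orientation.
        pairs.update(combinations(sorted(actors), 2))
    return pairs
```

For example, passing the edges returned by `yearly_star_subgraph` for one year to `constellation_edges` yields the co-starring pairs that the hidden movie nodes connect in that year's frame. In the visualisation described above, the drawn edges still run from actors to the invisible movie nodes with bends. This projection therefore only approximates what the viewer perceives.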

Figure 17: Many actors co-starring in one movie.

Figure 15: A frame from the galaxy of stars animation

Figure 18: The same group of people in several movies.

Figure 16: Actor collaboration pattern in early years.

Further insights can be discovered when company attributes are combined in the visualisation, as Figures 19 to 22 show. There are two clusters in 1985. To assist with analysis, we display the movie nodes with their labels. The two clusters are normal movies and adult movies.

Figures 19 to 22 show some patterns in the evolution. Before the 1990s, these two types of movies were clearly separated, meaning that they were produced by different companies with different actors. That is, the two groups seldom collaborated. However, the two groups then started to merge into one big group, as actors started to move between different companies for collaboration. For example, in the year 1994 (Figure 22) it is difficult to separate the two groups in the picture. This may indicate a change in the movie industry, as well as in the social network of actors. This visualisation can be a useful supplement to formal analysis methods.

6 CONCLUSION

Integration of good analysis methods with proper visualisation methods is an effective approach to gaining insight into large and complex networks. Our next step is to further integrate various analysis methods with visualisation on different data sets. A formal evaluation of the insights and knowledge derived then needs to be carried out. Ultimately, appropriate interaction methods need to be integrated in order to complete our visual analysis framework for large and complex networks.

REFERENCES

[1] A. Ahmed, T. Dwyer, M. Forster, X. Fu, J. Ho, S. Hong, D. Koschützki, C. Murray, N. Nikolov, A. Tarassov, R. Taib and K. Xu, GEOMI: GEometry for Maximum Insight, Proc. of Graph Drawing 2006, pp. 468-479, 2006.
[2] A. Ahmed, T. Dwyer, S. Hong, C. Murray, L. Song and Y. Wu, Visualisation and Analysis of Large and Complex Scale-free Networks, Proc. of EuroVis 2005, pp. 18, 2005.
[3] D. Auber, Y. Chiricota, F. Jourdan and G. Melançon, Multiscale Visualization of Small World Networks, Proc. of InfoVis, pp. 75-81, 2003.
[4] V. Batagelj, Analysis of large networks - Islands, Dagstuhl seminar 03361: Algorithmic Aspects of Large and Complex Networks, 2003.
[5] U. Brandes and T. Erlebach, Network Analysis: Methodological Foundations, Springer, 2005.
[6] U. Brandes, M. Hoefer and C. Pich, Affiliation Dynamics with an Application to Movie-Actor Biographies, Proc. of EuroVis 2006, pp. 179-186, 2006.
[7] Graph Drawing 2005 Competition, http://gd2005.org/
[8] Pajek, http://vlado.fmf.uni-lj.si/pub/networks/pajek/
[9] Sunbelt XXVI 2006 Viszard Session.
[10] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.

Figure 19: Layout of 1985
Figure 20: Layout of 1988
Figure 21: Layout of 1991
Figure 22: Layout of 1994
