Some Approaches to the Analysis and Visualization of the Internet Movie Database

Vladimir Batagelj and Andrej Mrvar
University of Ljubljana, Slovenia

Adel Ahmed, Xiaoyan Fu, Seok-Hee Hong and Damian Merrick
National ICT Australia, Sydney, Australia

September 8, 2005

The source of the original data is the Internet Movie Database. We transformed the contest data into a Pajek temporal network with some additional vectors and partitions describing the properties of vertices.

imdb.net   - IMDB network in Pajek format
imdbL.net  - IMDB network with long names
imdb.clu   - type partition of vertices
imdb.nam   - long names
imdb.vec   - year
large.net  - largest weak component with long names
large.vec  - years for large
largeT.clu - type partition for large
largeB.clu - bipartition for large

The file imdb.clu contains the following classes:

 0 Actor          11 Crime
 1 Drama          12 Sci-Fi
 2 Short          13 Horror
 3 Documentary    14 War
 4 Comedy         15 Fantasy
 5 Western        16 Romance
 6 Family         17 Adventure
 7 Mystery        18 Animation
 8 Thriller       19 Action
 9 Adult          20 Musical
10 Music          21 Film-Noir
                  99 Unknown

The Pajek data files are available at Pajek's data sets page.

1 Basic characteristics of IMDB

The IMDB network is bipartite (2-mode) and has 1324748 = 428440 + 896308 vertices and 3792390 arcs. Of these arcs, 9927 are repeated copies of an already existing arc (multiple, or parallel, arcs): the 3792390 arcs join only 3782463 distinct pairs of vertices. Here is the distribution of arc multiplicities.

multiplicity   frequency
------------------------
      1          3775126
      2             6178
      3              588
      4              267
      5              128
      6               66
      7               45
      8               18
      9               23
     10                6
     11                5
     12                3
     13                2
     15                1
     16                1
     17                2
     22                1
     32                1
     35                1
     43                1
------------------------

The nature of the multiple arcs can be seen in Figure 1, where all arcs with multiplicity at least 8 are displayed. In the analyses that follow, we decided to treat multiple arcs as single.

The IMDB network consists of 132714 weak components. Here is the distribution of their sizes.
   Size    Freq        Size    Freq
--------------------------------------
      1  124829          21       9
      2    3557          22       3
      3    1526          23       6
      4     922          24       4
      5     615          25       5
      6     424          26       2
      7     219          27       1
      8     139          28       1
      9     107          29       1
     10      80          31       4
     11      67          32       1
     12      43          33       1
     13      28          35       1
     14      31          37       2
     15      15          40       1
     16      19          42       1
     17      10          45       2
     18      16          50       1
     19      12          58       1
     20       6          73       1
                    1169725       1
--------------------------------------

[Figure 1: Arcs with multiplicity at least 8]

3 Identifying interesting parts of bipartite networks

There are few direct specialized methods for analyzing bipartite (2-mode) networks, especially large ones. Also, because of the size of the IMDB network, the standard reduction of the entire network to one or the other derived 1-mode network was not an option.
The only special method available in Pajek was the adapted version of hubs and authorities, which did not produce very interesting results. We started to think about some new methods. Last August we developed and implemented in Pajek two new methods for the analysis of bipartite networks:

• a bipartite version of cores: (p, q)-cores
• 4-rings weights on lines

For details see Dagstuhl seminar 05361 / Batagelj.

(p, q)-cores

A subset of vertices C ⊆ V is a (p, q)-core in a bipartite (2-mode) network N = (V1, V2; L), V = V1 ∪ V2, iff

a. in the induced subnetwork K = (C1, C2; L(C)), C1 = C ∩ V1, C2 = C ∩ V2, it holds that ∀v ∈ C1 : deg_K(v) ≥ p and ∀v ∈ C2 : deg_K(v) ≥ q;
b. C is the maximal subset of V satisfying condition a.

The basic properties of bipartite cores are:

• C(0, 0) = V
• K(p, q) is not always connected
• (p1 ≤ p2) ∧ (q1 ≤ q2) ⇒ C(p2, q2) ⊆ C(p1, q1), that is, raising either threshold can only shrink the core

There exists a very efficient O(m) algorithm to determine (p, q)-cores, where m is the number of lines.

Since there are many (p, q)-cores, we must decide how to select the interesting ones among them. To help the user in these decisions, we implemented in Pajek a table of the cores' characteristics: n1 = |C1(p, q)|, n2 = |C2(p, q)| and k, the number of components in K(p, q). We look for (p, q)-cores for which

• n1 + n2 ≤ a selected threshold
• there is a big jump in size from C(p − 1, q) and C(p, q − 1) to C(p, q).

We selected the (247,2)-core, the (27,22)-core and the (2,516)-core. From the labels we can see that the corresponding topics are wrestling and pornography.
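The linear bound for (p, q)-cores is plausible because a core can be found by repeatedly deleting vertices whose degree falls below the threshold of their side, and every line is then examined only a constant number of times. The following Python sketch illustrates that peeling idea. It is not Pajek's implementation: the function name pq_core, the edge-list input and the toy data are assumptions made for illustration, parallel lines are merged, and vertices without any line are simply not represented.

```python
from collections import deque


def pq_core(edges, p, q):
    """Return (C1, C2), the two vertex sets of the (p, q)-core.

    edges: iterable of (u, v) pairs with u in V1 and v in V2.
    Hypothetical helper for illustration only; parallel lines are
    merged and isolated vertices are not represented.
    """
    nbrs1, nbrs2 = {}, {}
    for u, v in set(edges):
        nbrs1.setdefault(u, set()).add(v)
        nbrs2.setdefault(v, set()).add(u)

    deg1 = {u: len(ns) for u, ns in nbrs1.items()}
    deg2 = {v: len(ns) for v, ns in nbrs2.items()}

    # Start with every vertex that already violates its threshold.
    queue = deque((1, u) for u, d in deg1.items() if d < p)
    queue.extend((2, v) for v, d in deg2.items() if d < q)
    removed1, removed2 = set(), set()

    while queue:
        side, x = queue.popleft()
        if side == 1:
            removed1.add(x)
            for v in nbrs1[x]:
                deg2[v] -= 1
                # Enqueue v once: at the moment it first drops below q.
                if deg2[v] == q - 1:
                    queue.append((2, v))
        else:
            removed2.add(x)
            for u in nbrs2[x]:
                deg1[u] -= 1
                # Enqueue u once: at the moment it first drops below p.
                if deg1[u] == p - 1:
                    queue.append((1, u))

    return set(nbrs1) - removed1, set(nbrs2) - removed2


# Toy network: three vertices in V1 (a*) and three in V2 (m*).
toy_edges = [("a1", "m1"), ("a1", "m2"), ("a2", "m1"),
             ("a2", "m2"), ("a3", "m2"), ("a3", "m3")]
c1, c2 = pq_core(toy_edges, 2, 2)
print(sorted(c1), sorted(c2))  # ['a1', 'a2'] ['m1', 'm2']
```

In the toy network m3 has only one line, so it is deleted first; this leaves a3 with a single line, so a3 is deleted as well, and the remaining four vertices all satisfy both thresholds. Each vertex enters the queue at most once, which is what keeps the work proportional to the number of lines.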
Table 1: (p, q : n1, n2) for IMDB

  p    q:   n1   n2 |   p   q:   n1   n2 |   p   q:  n1  n2
--------------------------------------------------------------
  1 1590: 1590    1 |  22  24: 1854 1153 |  43  14:  29  83
  2  516:  788    3 |  23  23:   47   56 |  44  14:  29  83
  3  212: 1705   18 |  24  23:   34   39 |  45  13:  30  95
  4  151: 4330  154 |  25  22:   42   53 |  46  13:  29  94
  5  131: 4282  209 |  26  22:   31   38 |  47  12:  29 101
  6  115: 3635  223 |  27  22:   31   38 |  48  12:  28 100
  7  101: 3224  244 |  28  20:   36   53 |  49  12:  26  95
  8   88: 2860  263 |  29  20:   35   52 |  50  11:  27 111
  9   77: 3467  393 |  30  19:   35   59 |  51  11:  26 110
 10   69: 3150  428 |  31  19:   35   59 |  52  11:  16  79
 11   63: 2442  382 |  32  19:   34   57 |  53  10:  35 162
 12   56: 2479  454 |  33  18:   34   62 |  54  10:  35 162
 13   50: 3330  716 |  34  18:   34   62 |  55  10:  34 162
 14   46: 2460  596 |  35  18:   33   61 |  56  10:  34 162
 15   42: 2663  739 |  36  17:   33   65 |  57   9:  35 187
 16   39: 2173  678 |  37  16:   33   75 |  58   9:  33 180
 17   35: 2791  995 |  38  16:   30   73 |  59   9:  33 180
 18   32: 2684 1080 |  39  16:   29   70 |  60   9:  32 178
 19   30: 2395 1063 |  40  15:   29   77 |  61   9:  31 177
 20   28: 2216 1087 |  41  15:   28   76 |  62   9:  31 177
 21   26: 1988 1087 |  42  15:   28   76 |  63   8:  31 202
--------------------------------------------------------------

[Figure: network drawing; the vertex labels are names of wrestlers and the events Survivor Series and Royal Rumble]