Network Visualization: Gephi and Cytoscape

Caf'E.phe, February 2016
Pablo Ruiz Fabo, LATTICE

Network visualization
• Requires relational data [ http://cvcedhlab.hypotheses.org/125 ]

Network analysis
• Some terminology: [ http://cvcedhlab.hypotheses.org/106 ]

Network analysis
• Network: composed of nodes, linked by edges
• Nodes represent actors in our domain
  – People, characters, concepts, places, …
• Edges encode the relation between the nodes
  – Interacting with someone, citing someone’s work, occurring in the same paragraph, …
• Edges can be weighted: the weight encodes the importance of the link
  – E.g. how many times did this link occur?
• Edges can be directed or undirected:
  – [Being a correspondent] vs. [being the sender vs. being the addressee of a letter]
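The terminology above can be made concrete with a minimal Python sketch. The letter data and the variable names (`letters`, `directed`, `undirected`) are invented for illustration only; they are not part of the workshop materials.

```python
from collections import Counter

# Hypothetical letters as (sender, addressee) pairs; the names are invented.
letters = [
    ("Ana", "Ben"),
    ("Ana", "Ben"),
    ("Ben", "Ana"),
    ("Ana", "Cleo"),
]

# Directed, weighted edges: direction distinguishes sender from addressee,
# and the weight counts how many times the link occurred.
directed = Counter(letters)

# Undirected, weighted edges: "being a correspondent", whoever wrote.
# Sorting each pair makes (Ana, Ben) and (Ben, Ana) the same edge.
undirected = Counter(tuple(sorted(pair)) for pair in letters)

# Nodes are the actors that appear at either end of an edge.
nodes = sorted({name for pair in letters for name in pair})

print(nodes)
print(dict(directed))
print(dict(undirected))
```

The same list of letters yields two different networks depending on whether direction is kept, which is why this choice has to be made before building the edges file.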

Objectives
• Create a co-occurrence network visualization with Gephi and Cytoscape, for two corpora:
  – History corpus on the American crisis of 2008
    • A CSV file representing the network’s edges was used
  – Philosophy corpus: Jeremy Bentham’s manuscripts
    • For Gephi, a GEXF file representing the network was used
    • For Cytoscape, a GraphML file representing the network was used (it can also be used for Gephi)
• Export a navigable network so that it can be visualized outside these tools
[ Some example files to import or create networks with, and example exported networks, are available at apps.lattice.cnrs.fr/nav/cafephe11 ]

2008 Crisis Corpus: PoliInformatics

Smith et al. (2014) [12]

Bentham Corpus
Jeremy Bentham: Philosopher, social reformer (1748-1832, London)
Transcribe Bentham (Causer & Terras, 2014) [13]
• UCL (London)
• Unpublished manuscripts transcribed by volunteers (crowdsourcing)
• 30,000 pages
Image: blogs.ucl.ac.uk/transcribe-bentham/

Gephi version
• This presentation covers Gephi 0.9, which came out in December 2015, and which works with Java 8 or 7
• Most training materials on Gephi are about version 0.8.2 (worked with Java 7, NOT 8)
• Small UI changes between 0.8.2 and 0.9

Cytoscape version

• Cytoscape 3.3.0, works with Java 8, NOT 7

Import Edges table (1)
• Start Gephi and go to Data Laboratory. You may need to close the Projects popup. Do File / New Project

• Click on Import Spreadsheet and search in the materials for a file whose name ends with “edges.csv”. Import it as an Edges table

Import Edges table (2)

1. In the Import Edge Table dialogue, Weight and Create missing nodes must be checked

2. Once the table is imported, create labels by copying ID with the “Copy data to another column” tab in the bottom row
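For readers who want to produce their own edges file, here is a minimal sketch of how a co-occurrence edge table could be built. It assumes the `Source,Target,Weight` column names mentioned later in these slides; the entity lists and function names are invented for illustration, and this is not how the workshop files were actually generated.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence_edges(paragraphs):
    """Count how often each pair of entities occurs in the same paragraph."""
    counts = Counter()
    for entities in paragraphs:
        # set() avoids counting an entity twice within one paragraph;
        # sorted() gives each undirected pair a single canonical order.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


def write_edges_csv(counts, handle):
    """Write the edges under a Source,Target,Weight header."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(counts.items()):
        writer.writerow([source, target, weight])


# Invented example: entities mentioned in three paragraphs.
paragraphs = [
    ["Congress", "Treasury", "Lehman"],
    ["Treasury", "Lehman"],
    ["Congress", "Treasury"],
]
counts = cooccurrence_edges(paragraphs)
buffer = io.StringIO()
write_edges_csv(counts, buffer)
print(buffer.getvalue())
```

To write an actual file instead of an in-memory buffer, pass a handle opened with `open("my_edges.csv", "w", newline="")` (a hypothetical file name).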


Initial Network
• Click on the Overview tab to see the initial, non-spatialized network

Saving and exporting a project
• It is advisable to both save and export a project

To save a project, just click on Save. It will be saved as a project file with the .gephi extension (a kind of zip file)

Also export the network as a graph file for safety

Network Layout (1)
• Run the Force Atlas layout, with these settings:
  1. Choose the Layout
  2. Specify Settings

Determines how far apart nodes will be, thus affecting the readability of the network (how wide it will spread)

Helps avoid label overlap (but there are other means for this too)

In force-based layouts (like Force Atlas or Force Atlas 2), linked nodes attract each other and unlinked nodes are pushed further apart. See [3] and [8].
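The principle behind force-based layouts can be illustrated with a toy sketch. This is not Gephi's Force Atlas implementation: it is a simplified spring-electric iteration in which every pair of nodes repels and every linked pair attracts. The constants, the circular starting positions and the function names are all assumptions chosen for the example.

```python
import math


def force_layout(nodes, edges, iterations=200, k=1.0):
    """Toy force-directed layout: all pairs repel, linked pairs attract."""
    # Start on a circle so that the result is reproducible.
    pos = {
        n: [math.cos(2 * math.pi * i / len(nodes)),
            math.sin(2 * math.pi * i / len(nodes))]
        for i, n in enumerate(nodes)
    }
    for step in range(iterations):
        # The maximum move per step shrinks so that the layout settles.
        temperature = 0.1 * (1 - step / iterations)
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, strength k^2 / distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attraction along each edge, strength distance^2 / k.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1])
            if length > 0:
                move = min(length, temperature)
                pos[n][0] += disp[n][0] / length * move
                pos[n][1] += disp[n][1] / length * move
    return pos


def distance(pos, a, b):
    """Euclidean distance between two laid-out nodes."""
    return math.dist(pos[a], pos[b])


# A triangle (a, b, c) and a separate linked pair (d, e).
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
layout = force_layout(nodes, edges)
print("linked a-b:  ", round(distance(layout, "a", "b"), 2))
print("unlinked a-d:", round(distance(layout, "a", "d"), 2))
```

In this sketch the linked pair settles near the distance `k`, where attraction and repulsion balance, while the two unconnected groups drift apart. Real layouts such as Force Atlas add further forces (for example gravity) that this toy omits.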

Network Layout (2)
• Once the network stabilizes, you can stop Force Atlas.
• The initial layout will look similar to the one below
• The zoom slider can be used to see more or less of the network

Toggle bottom pane here

Zoom


Node and Edge Appearance
• In Gephi 0.9, unlike in 0.8.2, there are two modes for node and edge appearance, Unique and Attribute-based
• Colour
• Size
• Label colour
• Label size

Attributes correspond to properties of nodes and edges, reflecting their role in the network as per different metrics

Node Size
• Different types of metrics can be encoded in the node size. Here, we use a node’s Degree (how many nodes it is connected to)
In the Appearance tab, choose the Nodes and Attribute buttons, and then:
- Degree in the dropdown menu
- The CIRCLES icon for node size in the button bar, hit Apply
After applying the ranking, node size will reflect the ranking criterion. In this case, more strongly connected nodes will be bigger
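As a rough illustration of what ranking node size by degree amounts to, the sketch below computes each node's degree from an edge list and rescales it linearly to a size range. The minimum and maximum sizes, the edge list and the function names are assumptions for the example, not Gephi's defaults or internals.

```python
def degrees(edges):
    """Degree of each node: how many distinct nodes it is connected to."""
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)
    return {node: len(others) for node, others in neighbours.items()}


def size_by_rank(values, min_size=10.0, max_size=50.0):
    """Linearly rescale the values to the [min_size, max_size] range."""
    low, high = min(values.values()), max(values.values())
    span = high - low
    return {
        node: min_size if span == 0
        else min_size + (value - low) / span * (max_size - min_size)
        for node, value in values.items()
    }


# Invented undirected edges.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
degree = degrees(edges)
sizes = size_by_rank(degree)
print(degree)
print(sizes)
```

The best connected node gets the maximum size and the least connected one the minimum, with the others in between.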

For information on other ranking criteria, see [4]

Node Labels (1)
• Other Node Label settings can be accessed from the bottom panel, which can be toggled here

• If at any point node labels overlap, this can be fixed by running the Label Adjust layout

Node Labels (2)
• Label Sizes are defined with the leftmost button
  - In scaled mode, all labels bear the same size, scaled for readability
  - In fixed mode, all labels bear the size specified in the font dropdown (Dialog bold 32 in the example)
  - In node size mode, label size matches node size
  - Run Label Adjust Layout in case of label overlap

• Label Colour is defined with the rightmost button

Community Detection: Modularity
• The modularity tool can be run to detect communities, i.e. groups of nodes that are more strongly connected among themselves than they are to other groups of nodes [9].
  1. In the Statistics pane on the right, look for Modularity and hit Run
  2. Go to the Partition tab on the left, select Modularity Class from the dropdown, and hit Apply. The colors can be changed by clicking and right-clicking inside the colored square, or with the Palette button
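To give an idea of what "more strongly connected among themselves" means numerically, the sketch below computes the modularity score Q of a given partition, which is the quantity that the method in [9] seeks to maximize. It only scores a partition that is handed to it; it does not detect communities, and the example graph and names are invented.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, weighted graph.

    edges: iterable of (source, target, weight) tuples
    community: dict mapping each node to a community label
    """
    total = 0.0                   # m: total edge weight in the graph
    inside = defaultdict(float)   # weight of edges inside each community
    degree = defaultdict(float)   # summed weighted degree per community
    for source, target, weight in edges:
        total += weight
        degree[community[source]] += weight
        degree[community[target]] += weight
        if community[source] == community[target]:
            inside[community[source]] += weight
    # Q = sum over communities of (inside / m) - (degree / 2m)^2
    return sum(
        inside[c] / total - (degree[c] / (2 * total)) ** 2
        for c in degree
    )


# Two triangles joined by a single bridge edge (c, d), all weights 1.
edges = [
    ("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
    ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
    ("c", "d", 1),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(round(q_two, 4), round(q_one, 4))
```

Splitting the graph into its two triangles scores higher than putting every node in one community, which matches the intuition that the triangles are the natural groups here.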


Preview Pane
Preview after applying a node size criterion and community detection. Settings are default, unless specified on the screenshot.

Show Labels was activated

Edge Thickness was reduced to 0.2 to avoid overly thick edges on highly connected nodes

Hit Refresh after any changes to the Settings or to reset an unreadable preview pane

Filters
• The network can be filtered according to many criteria (see [6]). Here, we filter out nodes that have fewer than six connections, to get rid of generally less relevant nodes and edges

Expand the Topology dropdown
- Double click on Degree Range
- Move the slider at the bottom up to the desired minimum degree
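The effect of a minimum-degree filter can be sketched in a few lines. This is an illustration of the idea, not Gephi's filter code: it makes a single pass, keeping nodes whose degree in the original network reaches the threshold and the edges that link two kept nodes. The example data and function name are invented.

```python
def filter_by_min_degree(edges, min_degree):
    """Keep nodes with at least min_degree neighbours, and the edges among them."""
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)
    # Degrees are measured once, on the unfiltered network.
    kept_nodes = {n for n, others in neighbours.items() if len(others) >= min_degree}
    kept_edges = [
        (source, target)
        for source, target in edges
        if source in kept_nodes and target in kept_nodes
    ]
    return kept_nodes, kept_edges


# Invented example with a low threshold; the slides use a minimum of six.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
kept_nodes, kept_edges = filter_by_min_degree(edges, min_degree=2)
print(sorted(kept_nodes))
print(kept_edges)
```

Because the degrees are measured before filtering, some kept nodes may end up with fewer visible connections than the threshold once their low-degree neighbours are gone.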

Exporting visualization as PDF or image
• In the Preview pane, there’s a button to export the visualization (bottom left)

Export visualization as an interactive website: sigma.js exporter (1)
• Gephi has several plugins that allow exporting the network in an interactive website format.
  – The website allows zooming in and out
  – In some cases, the user can selectively focus on a part of the network and run searches for nodes
• We’ll be using the sigma.js exporter plugin [10], which has all of the functions above. Depending on your browser, the exported site may need to be served from a web server (Apache, XAMPP, Wamp, EasyPHP etc.)
• Other plugins allowing some of the above functions:
  – Seadragon plugin
  – Google Maps Exporter

Network as a website (2): sigma.js

• We need to do three things:
  – Install the sigma.js exporter plugin
  – Export the network as a sigma.js site
  – Make the site available from a web server
• To install the plugin:
  – Go to Tools/Plugins, and select Sigma Exporter in the Available Plugins tab (once installed, it will move to the Installed tab)

Network as website (3): Exporting
1. Export the network from File/Export and Sigma.js template

2. Fill in the dialogue: give the path to the folder to export the site to, and the legend to be displayed for the site’s data

Network as website (4): Web Server
• We need to take the exported site from the previous step and put it in a web server. Note: some browsers (e.g. Firefox) allow seeing the networks just by opening the index.html file, with no need for a local web server
• If you don’t have a web server installed, a possibility is to install XAMPP https://www.apachefriends.org
  – Windows:
    • https://blog.udemy.com/xampp-tutorial/
    • https://www.apachefriends.org/faq_windows.html
  – Linux: https://www.apachefriends.org/faq_linux.html
  – Mac: https://www.apachefriends.org/faq_osx.html
• Once you have the web server, to see the network, point a browser to http://localhost/XXX , where XXX corresponds to the name of your sigma.js network (by default the name is network when Gephi exports it).

Network as website (5): Config
If edges on the exported network are too thin and node labels are not visible, look for config.json inside the folder where the sigma.js site was exported (network by default)

- Increase minEdgeSize and maxEdgeSize for thicker edges
- Decrease labelThreshold to see more labels
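These settings can be edited by hand in a text editor. For repeated exports, a small script may be convenient; the sketch below is one possible approach. Since the exact nesting of config.json is not described in these slides, the helper searches for the three key names at any depth rather than assuming a path; the sample structure and the new values are invented stand-ins.

```python
import json


def set_keys(node, updates):
    """Set every key named in `updates`, wherever it is nested. Returns the count."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key in updates:
                node[key] = updates[key]
                changed += 1
            else:
                changed += set_keys(value, updates)
    elif isinstance(node, list):
        for item in node:
            changed += set_keys(item, updates)
    return changed


def update_config_file(path, updates):
    """Apply `updates` to a JSON file in place (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    changed = set_keys(config, updates)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
    return changed


# Invented stand-in for an exported config; the real nesting may differ.
config = json.loads("""
{"sigma": {"graphProperties": {"minEdgeSize": 0.2, "maxEdgeSize": 0.5},
           "drawingProperties": {"labelThreshold": 10}}}
""")
# Illustrative values: thicker edges, and a lower threshold for more labels.
updates = {"minEdgeSize": 1.0, "maxEdgeSize": 3.0, "labelThreshold": 5}
changed = set_keys(config, updates)
print(changed)
print(json.dumps(config, indent=2))
```

Keeping a copy of the original config.json before running `update_config_file` on it makes it easy to revert if the result is not what was expected.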

Import the network or edges file (Cytoscape)
The example involves the network for the Bentham corpus. Other GraphML networks are available in the materials and can be manipulated similarly. An edges file (CSV) can also be imported the same way (but click ‘advanced’).

Layout
The AllegroLayout plugin (force-based) was used; install it with Apps / App Manager.
Default options were chosen: the “Spring-electric” option. If you need to modify the layout settings, read about their intended effect in the tooltips

Apps / App Manager

Layout (another example)
If you need a clearer layout, the Scale option will spread the network. If the edges have a weight attribute, it can be used from the Edge Weighting tab.
The following example follows a GraphML import of the American crisis corpus, and the scale was modified. (The screenshot also reflects later modifications to the network appearance, see the following slides).

Node attributes from node table
• We imported a ready-made network in GraphML; we can read the attributes off the import.
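To see where such attributes come from, it can help to look inside a GraphML file. The sketch below parses a tiny, invented GraphML document with Python's standard library and lists each node's attributes. The sample content and function name are made up for the example; the files in the materials carry their own attribute names.

```python
import xml.etree.ElementTree as ET

# GraphML elements live in this XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Invented two-node sample; real files are much larger.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="label" attr.type="string"/>
  <key id="d1" for="node" attr.name="size" attr.type="double"/>
  <graph id="G" edgedefault="undirected">
    <node id="1"><data key="d0">liberty</data><data key="d1">12.0</data></node>
    <node id="2"><data key="d0">law</data><data key="d1">30.0</data></node>
    <edge source="1" target="2"/>
  </graph>
</graphml>
"""


def node_attributes(graphml_text):
    """Return {node id: {attribute name: value as text}} for a GraphML string."""
    root = ET.fromstring(graphml_text)
    # <key> elements map short ids such as "d0" to attribute names.
    names = {
        key.get("id"): key.get("attr.name")
        for key in root.findall("g:key", NS)
        if key.get("for") == "node"
    }
    table = {}
    for node in root.findall("g:graph/g:node", NS):
        table[node.get("id")] = {
            names[data.get("key")]: data.text
            for data in node.findall("g:data", NS)
            if data.get("key") in names
        }
    return table


table = node_attributes(SAMPLE)
for node_id, attributes in table.items():
    print(node_id, attributes)
```

All values come back as text; converting them according to the declared `attr.type` is left out to keep the sketch short.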

Attributes
Similar to Gephi’s Unique / Attribute buttons:
• First column (Def.) defines a unique value
• Second column (Map.) defines values based on an attribute
• Final column (Byp.) allows defining exceptions
In the example:
- Fill color (default is a blue hue) reflects communities (based on column cluster_universal_index in the node table)
- Size (default 35) is based on the size column of the imported nodes

Attribute value “mapping”
• Discrete: a discrete set of categories
• Continuous: continuous values; the minimum and maximum can be set
• Passthrough: values read off the import file directly
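The three mapping types can be thought of as three small functions from an attribute value to a visual property. The sketch below is a conceptual illustration, not Cytoscape's API: the function names, the colour codes, the value ranges and the sample node are all invented.

```python
def passthrough(value):
    """Passthrough: use the attribute value as it is (e.g. for labels)."""
    return value


def discrete(value, table, default=None):
    """Discrete: look each category up in a table, with a default otherwise."""
    return table.get(value, default)


def continuous(value, low, high, out_low, out_high):
    """Continuous: interpolate linearly between a set minimum and maximum."""
    if high == low:
        return out_low
    fraction = (value - low) / (high - low)
    fraction = min(1.0, max(0.0, fraction))  # clamp values outside the range
    return out_low + fraction * (out_high - out_low)


# Invented node row, using the attribute names seen in the example network.
node = {"label": "liberty", "cluster_universal_index": 2, "size": 15.0}
palette = {1: "#e41a1c", 2: "#377eb8"}  # hypothetical colour per community

style = {
    "label": passthrough(node["label"]),
    "fill": discrete(node["cluster_universal_index"], palette, default="#89d0f5"),
    "size": continuous(node["size"], 0.0, 30.0, 20.0, 80.0),
}
print(style)
```

A default value plays the role of the Def. column: it applies whenever the mapping has nothing to say about a node.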

Node color for communities
Node color is set according to the community id of the node in the imported node table (in this case the id was called cluster_universal_index, but other names may appear).
Note: the original network was created with Cortext Manager (manager.cortext.net), and communities were created with the Louvain method [9]

Other appearance options (1)
After adding node color for communities, the node label was read off the label field of the nodes in the imported GraphML network (otherwise the label would be the node’s numeric id). Label Font Size was set to 90

Other appearance options (2)
Using the character co-occurrence example in Les Misérables provided in [2b].
Node size was made dependent on the “size” attribute of the nodes in the GraphML file. The background color was changed from the Network tab (at the bottom of the control panel). Edge color was changed with the Edge tab.

Importing edge table and analysis
• If we are importing just the edges (Source,Target,Weight), we won’t have all the attributes like node size, communities etc. So the first thing to do after import is to run an analysis:

Analyzing the network: partitioning
• Several possibilities:

Filtering the network

From the Select tab in the control panel (on top), enter a selection criterion and create a new network with the result. Selected nodes are highlighted in yellow.

Visualizing the analyzed network
After running the analysis, a partitioning with the Community cluster (GLay) app was performed. Node color is based on that (__glayCluster attribute). Node size was made dependent on node degree (i.e. how many connections it has)

Exporting the network
• Like in Gephi, the network can be exported as an image, as a graph file, or as a website.

Other
The grid view helps look at different regions of the network (or selected vs. other nodes) at once

Interpretation problems
• Hubs vs. Authorities:
  – nlp.stanford.edu/IR-book/html/htmledition/hubs-and-authorities-1.html
• Force Atlas Layout and Force Atlas with Attraction Distribution:
  – The latter pushes hubs to the periphery, giving a different view of the same network, see [11]
• Hubs vs. “Sinks” (e.g. air traffic)
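The hub/authority distinction can be made concrete with a short sketch of the iterative scoring described in the reference linked above: a good hub points to good authorities, and a good authority is pointed to by good hubs. The tiny directed graph, the node names and the iteration count are invented for illustration.

```python
import math


def hits(edges, iterations=50):
    """Hub and authority scores for a directed graph of (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = {n: 0.0 for n in nodes}
        for source, target in edges:
            auth[target] += hub[source]
        # A good hub points to good authorities.
        hub = {n: 0.0 for n in nodes}
        for source, target in edges:
            hub[source] += auth[target]
        # Normalise so that the scores stay bounded across iterations.
        auth_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {n: v / auth_norm for n, v in auth.items()}
        hub = {n: v / hub_norm for n, v in hub.items()}
    return hub, auth


# Invented example: two pages that link out, two pages that are linked to.
edges = [("portal", "paper1"), ("portal", "paper2"), ("blog", "paper1")]
hub, auth = hits(edges)
print("best hub:      ", max(hub, key=hub.get))
print("best authority:", max(auth, key=auth.get))
```

A node with many links is therefore not automatically "important" in both senses: here `portal` scores as a hub but has no authority at all, which is the kind of distinction to keep in mind when reading node sizes in a visualization.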

Hubs vs. Authorities (1)

Rieder, B. (2010) [11]

Hubs vs. Authorities (2)

Rieder, B. (2010) [11]

Hubs vs. Authorities (3)
• Force Atlas and Force Atlas with Attraction Distribution:
  – The latter pushes hubs to the periphery, giving a different view of the same network, see [11]
  – Look at Barry Wellman in the preceding graphs

Hubs vs. “Sinks” (1)

Hubs vs. “Sinks” (2)
• Las Vegas is not a central element in the network. People fly to Las Vegas and back to their departure city, not through Las Vegas.

References: Gephi Tutorials
The format of the reference list is: Description: URL [description of the dataset if applicable]

[1] General Tutorial: https://gephi.github.io/users/quick-start/ [Character co-occurrences in Hugo’s Les Misérables]
[2a] Deeper: By Martin Grandjean http://www.martingrandjean.ch/gephi-introduction/ [many datasets]
[2b] Deeper: By Clément Levallois http://www.clementlevallois.net/gephi.html [several datasets]
[3] Importing edge tables from CSV: http://www.literaturegeek.com/2013/09/09/dataintogephi/ [Character interactions in Joyce’s Ulysses]
[4] Network Layouts: https://gephi.github.io/users/tutorial-layouts/ [Les Misérables, Airlines dataset, Internet Core Routers datasets]
[5] Metrics: http://www.clementlevallois.net/gephi/tuto/en/gephi_advanced%20functions_en.pdf
[6] Formatting the Networks: https://gephi.github.io/users/tutorial-visualization/ [Airlines dataset]
[7] Filters: http://blog.ouseful.info/2010/04/23/getting-started-with-gephi-network-visualisation-app-%E2%80%93-my-facebook-network-part-ii-basic-filters/ [Facebook]

References: Cytoscape

References: Other
[7] Bastian, M., Heymann, S., Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media. http://gephi.org/publications/gephi-bastian-feb09.pdf
[8] Jacomy, M., Venturini, T., Heymann, S., Bastian, M. (2014). ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0098679
[9] Blondel, V. D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment. http://arxiv.org/pdf/0803.0476.pdf
[10] Sigma JS exporter, created by the Oxford Internet Institute: http://blogs.oii.ox.ac.uk/vis/
[11] Rieder, B. (2010). One network and four algorithms. http://thepoliticsofsystems.net/2010/10/one-network-and-four-algorithms/
[12] Smith, N. A., Cardie, C., Washington, A. L., Wilkerson, J. D. (2014). Overview of the 2014 NLP Unshared Task in PoliInformatics. Proceedings of the ACL Workshop on Language Technologies and Computational Social Science, 5-7.
[13] Causer, T., Terras, M. (2014). Crowdsourcing Bentham: Beyond the traditional boundaries of academic history. International Journal of Humanities and Arts Computing, vol. 8(1), pp. 46-64.

Thank you!
