Lab 6 Data Communication Using Gephi and Tableau January 19, 2015


Overview of Lab 6: We will first visualize graph data with Gephi, a tool with a strong foundation in algorithmic research. We will then familiarize ourselves with Tableau, a professional data analysis and visualization package, through four exercises. You will find that Tableau is quick to learn.

Gephi is open source software for graph and network visualization. Its input data can be in tabular form, a CSV file, or an XML file. Gephi has a strong algorithmic foundation, and the details of the algorithms and research behind each graph processing step are readily available as part of the tool. Gephi is good for finding clusters and communities.
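To make the CSV input concrete, here is a minimal Python sketch of a node table and an edge table. It assumes the column headers Id/Label for nodes and Source/Target/Type for edges, which is the layout Gephi's spreadsheet import commonly recognizes; the toy data and the helper names (nodes_csv, edges_csv, dangling_edges) are hypothetical and are not the lab's nodes_dh11.csv file.

```python
import csv
import io

# Hypothetical toy data; the lab's real files are not reproduced here.
NODES = [("n1", "Alice"), ("n2", "Bob"), ("n3", "Carol")]
EDGES = [("n1", "n2"), ("n2", "n3"), ("n1", "n3")]


def nodes_csv(nodes):
    """Return the node table as CSV text with an Id,Label header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Id", "Label"])
    writer.writerows(nodes)
    return buf.getvalue()


def edges_csv(edges, edge_type="Directed"):
    """Return the edge table as CSV text with a Source,Target,Type header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type"])
    for source, target in edges:
        writer.writerow([source, target, edge_type])
    return buf.getvalue()


def dangling_edges(nodes_text, edges_text):
    """List edges whose Source or Target does not appear in the node table."""
    ids = {row["Id"] for row in csv.DictReader(io.StringIO(nodes_text))}
    return [
        (row["Source"], row["Target"])
        for row in csv.DictReader(io.StringIO(edges_text))
        if row["Source"] not in ids or row["Target"] not in ids
    ]


if __name__ == "__main__":
    print(nodes_csv(NODES))
    print(edges_csv(EDGES))
    print("dangling edges:", dangling_edges(nodes_csv(NODES), edges_csv(EDGES)))
```

Checking for dangling edges before importing is only a convenience: it catches edge rows that refer to node ids missing from the node table.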

1. Start Gephi and study the various parts of the Gephi development environment. At the top is the project and file management top line menu. Just below that are three important tabs: Overview, Data Laboratory and Preview.

2. The Data Laboratory tab is for creating and importing node and edge data.

3. The Overview tab is for configuring and working with the data.

4. The Preview tab is for visualizing the final product, the graph or network. From Preview the graph can be exported in a publishable format (PDF, PNG).

5. On the left of the screenshot shown above are the graph manipulation commands; on the right are the commands for computing statistics, filtering, modularity detection, etc.

6. We explore the features of Gephi using two examples: (i) a first one with a large number of nodes, to get started with Gephi, and (ii) a well-known data set of the cast of Les Miserables, in which we can visualize the importance and influence of the various characters.
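One simple measure of a node's importance is its degree, the number of edges touching it; Exercise 2 ranks the characters by degree and later filters out low-degree nodes. The Python sketch below illustrates the idea on a made-up edge list. The node names and the helper names (degrees, filter_by_min_degree) are hypothetical, and the filter is only an approximation of a degree-range filter, not Gephi's implementation.

```python
from collections import Counter


def degrees(edges):
    """Count how many edge endpoints touch each node."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return counts


def filter_by_min_degree(edges, minimum):
    """Keep only edges whose two endpoints both have degree >= minimum."""
    deg = degrees(edges)
    return [(s, t) for s, t in edges if deg[s] >= minimum and deg[t] >= minimum]


if __name__ == "__main__":
    # Hypothetical toy network: one hub connected to three other nodes.
    toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
    print(degrees(toy_edges).most_common())    # nodes ranked by degree
    print(filter_by_min_degree(toy_edges, 2))  # the low-degree node "c" drops out
```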

Exercise 1: The goal of this exercise is to understand the basics of importing data, obtaining a graph, and executing a simple clustering of the data.

1. Open Gephi -> New Project -> Data Laboratory -> click on Nodes -> Import Spreadsheet

Make sure “nodes table” is selected in the dialog box that opens -> choose the nodes_dh11.csv file

Repeat the same for the edges. Make sure “edges table” is selected in the dialog box that opens. The details are shown below:

Import Nodes Data

Click on the Nodes tab here, then click on Import Spreadsheet

With Nodes table selected here, select the Nodes data file of .csv format by browsing from here

Once you have selected the file, a preview appears as shown above.

If the data looks correct, click -> Next. When asked whether the data types of the imported columns are fine, click -> Finish.

Import Edges Data

Select Edges, and then click on Import Spreadsheet

Select the Edges table option here, then browse to and load the edges file from here. An edge table preview appears in a grid, as shown below.

Preview the Imported Data

Now we will see how the imported data looks as a graph. Click on the Overview tab to switch from the Data Laboratory tab.

Below is how the imported data looks in Overview. It looks much like a hairball. Click on the Context window in the upper right-hand bar; it reports:

• 621 Nodes

• 733 Edges

• Directed Graph

Force-Atlas Layout

This layout makes connected nodes attract each other and pushes unconnected nodes apart, creating clusters of connections. Go to the Layout section on the upper left side.
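The attract-and-repel idea can be illustrated with a small, generic force-directed layout in Python. This is a sketch under stated assumptions and not Gephi's Force Atlas implementation: every pair of nodes repels with an inverse-square force, each edge acts as a linear spring, and the function name and all parameter values (force_layout, repulsion, attraction, step_size, max_move) are hypothetical choices for illustration.

```python
import math


def force_layout(nodes, edges, repulsion=1.0, attraction=0.1,
                 steps=200, step_size=0.05, max_move=0.5):
    """Return {node: (x, y)} after a simple force-directed relaxation."""
    # Deterministic start: spread the nodes evenly on a unit circle.
    count = len(nodes)
    pos = {
        node: [math.cos(2 * math.pi * i / count), math.sin(2 * math.pi * i / count)]
        for i, node in enumerate(nodes)
    }
    for _ in range(steps):
        force = {node: [0.0, 0.0] for node in nodes}
        # Every pair of nodes repels, more strongly when close together.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = repulsion / (dist * dist)
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Each edge pulls its two endpoints together like a spring.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += attraction * dx
            force[a][1] += attraction * dy
            force[b][0] -= attraction * dx
            force[b][1] -= attraction * dy
        # Move each node along its net force, capping the step for stability.
        for node in nodes:
            fx, fy = force[node]
            length = math.hypot(fx, fy)
            if length > 0:
                move = min(length * step_size, max_move)
                pos[node][0] += fx / length * move
                pos[node][1] += fy / length * move
    return {node: tuple(xy) for node, xy in pos.items()}


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


if __name__ == "__main__":
    # Hypothetical toy graph: a and b are connected, c is isolated.
    layout = force_layout(["a", "b", "c"], [("a", "b")])
    print("a-b:", distance(layout["a"], layout["b"]))
    print("a-c:", distance(layout["a"], layout["c"]))
```

In this toy graph the connected pair settles at a short distance while the isolated node keeps drifting away, which is the same qualitative behavior the layout shows on the lab data.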

Once you have clicked on the Layout pane, click the dropdown and select Force-Atlas.

• Set the “Repulsion strength” to 1000 to strongly repel the disconnected nodes from each other

• Set the “Attraction strength” to 10000 to strongly bring the connected nodes closer to each other

All the connected nodes have condensed together and the disconnected ones are far apart. The layout run will stabilize automatically. Your visual may differ from the one shown above. Stop the layout run after that. We have now segregated the clusters from each other. We will stop here and analyze a complete example with known data.

Exercise 2: In this exercise we will work with a known data set about the cast of Les Miserables. The data is presented as an XML file with node and edge tags. Nodes represent the characters in Les Miserables; edges represent the relationships between them and the strength of those relationships. The file is in the Gephi XML format (.gexf).

1. Gephi -> New Project -> Open file -> LesMiserables.gexf. You will see the import report below.

2. Click OK, then click Overview. You will see a crowded graph of connected nodes and edges.

3. Now on the left panel, choose Force Atlas from the drop-down box, set the repulsion to 10000, and click Run to run the layout.

4. Stop the run after it stabilizes automatically.

5. You see a network that is a little clearer than the first one. We will process it further.

6. Locate the Ranking module on the left panel. Choose Degree as the rank and click Apply to see the results.

7. Move the mouse over the gradient component and double click on the triangle to configure color.

8. Click on the small table symbol at the bottom left of the Ranking panel and click Apply to see the ranking of the various nodes. You can observe that Valjean has the highest degree of connectivity.

9. Click the triangle for size on the panel and click Apply. You will see the nodes sized according to their relative strengths.

10. Click on Adjust by Size in the bottom panel and Run it for some time to clean the graph further.

11. Click on T (Show Node Labels) at the bottom left of the display Window. Adjust the font and size etc. We will manipulate the size of the labels using the buttons at the bottom so that only important ones are visible.

12. Community detection: Now we will examine the Statistics and metrics available on the right. Select Modularity and click Run.

13. In the Partition panel on the left, select Modularity class as the parameter, and click refresh and Apply. You will see the communities colored.

14. Using this community information, decisions can be made and strategies designed.

15. We will filter out nodes with low connectivity using the Filter feature. Select the Filter tab on the right panel, select Topology, then the Degree Range parameter; drag the selection to the Queries window at the bottom and set it to 2. Watch the outlier nodes disappear.

16. Press Preview to view the graph. You can set various parameters to change the visualization, such as curved edges and a black background. The graph can be exported as SVG/PDF/PNG for use in your presentations and reports. The result is shown below.

Tableau Exercises: We will learn the Tableau layout and then work on four exercises. For the various features of Tableau, study the screenshot given below. We will create this worksheet to understand the Tableau layout.

Exercise 3:

1. Start Tableau -> Connect to Data -> study the various sources that Tableau supports. Click Excel Worksheet -> import Worldbankdata.xlsx -> study the various tables/data available. You can drag any data sheet into the Tableau workspace. We will drag the Country data and the Region data and do an inner join on Country.

2. Drag the Data By Country into the workspace and click the Go to Worksheet icon in the middle.

3. Examine the various regions of the Tableau worksheet interface, starting at the top left: (i) the list of data sources, (ii) the dimensions and measures available in the selected data source, (iii) the Columns and Rows shelves (headers and axes), (iv) the filters for the visualization, (v) the Marks card for controlling the visualization using color, size, label, tooltip text, and shape, (vi) the canvas where the visualization is displayed, (vii) the sheet and dashboard tabs at the bottom, (viii) the session tab indicating the session (connect to data, etc.), and (ix) the Show Me window in the right panel, showing all the possible chart types to choose from.

4. Creating the worksheet shown above involves a few clicks and drag-and-drops on the appropriate tabs/buttons.

5. We want to plot GDP per capita for the countries and then color and size them. Hold the Ctrl key and click GDP per capita in Measures and Country Name in Dimensions; then select Horizontal bars from the Show Me panel.

6. Drag Region name to the Color mark button and see the chart get colorized.

7. See the 12 nulls at the bottom right corner; click on it and select Filter Data.

8. Drag Sum GDP into Label mark button so that GDP can be displayed along with the bars.

9. Drag Date into Filter panel and filter the visualization for the year 2010.

10. Present the plot using Presentation tab above the plot in the line below the top line menu.

11. See the interactions possible with the presentation chart.

12. Return to workspace by clicking on the presentation symbol at the bottom right corner.

13. Save the worksheet for future review.

Exercise 4: Multiple variables: In the above exercise we introduced the various features of Tableau. Data often exhibits more than two dimensions, and Tableau handles this very elegantly. In this exercise we will analyze data on the top 100 point scorers in the NHL and examine what we can understand from this data and the conclusions we can draw from this analysis. This exercise is based on an exercise discussed in reference [1] by B. Jones. We will use scatter plots as the main instrument for visualization.

1. Tableau -> Connect to Data -> Excel worksheet -> NHLTop100.xlsx

2. Study the data. Check whether any of your favorite players are on the list. Study the attributes: points (P), games played (GP), assists (A), etc. Most of the “team played for” information is null.

3. Navigate to the worksheet; we will create a basic scatter plot.

4. Hold the Ctrl key and click Player, G, and A; then click on the scatter plot in the “Show Me” panel. You will see that Tableau has placed SUM(G) on the Columns shelf, SUM(A) on the Rows shelf, and Player on “Detail” in the Marks card. Make sure you understand this selection: since Tableau does a lot of things automatically, you have to make sure its choices are acceptable to you.

5. Now we will analyze the plot in a class discussion.

6. We have compared two variables, Goals and Assists. In the next step we will add two more variables. This is very easily done: drag P (points) onto the Size shelf and GP (games played) onto the Color shelf. Observe and understand the changes that happen.

7. Next change the “automatic” selection at the top of the Marks to “Circles”.

8. Next change the color palette from “green sequential” to anything attractive, such as “Orange-white-Blue-Diverging”, by clicking the Color shelf. Also make the border black.

9. Mouse-over the circles and check out if your favorite player is present in the plot.

10. We will now label the circles. It is as easy as dragging the Player variable (or Dimension) to the Label shelf. A major difference between Gephi and Tableau is that Tableau displays as many labels as possible without creating a messy view. You can left click on the Label shelf and check the allow-overlap check box to see how messy it is when all the labels show.

11. It is possible to modify the tooltip to limit the information or add more information, such as the team played for. Try this by left-clicking on the Tooltip and editing it.

12. You can also add annotations to the circles. We will do this for Wayne Gretzky. Unclick the other labels to view only the annotation you made.

13. We will now add a filter to explore the data further. We are interested in the Position (Center (C), L, R, and Defense (D)). We will add a radio-button style filter to the worksheet. We will add two more filters, for +/- and PIM (penalties in minutes). Right click on these and select “Show Quick Filter”. You will see three filters appear on the right side of the worksheet.

14. We will rename sheet1 to GAGPG by right clicking on sheet1 and renaming it.

15. Edit the axes by right clicking on each axis and making it fixed.

16. Then right click on the sheet to duplicate it four times. We will use the filters and carry out some exploratory data analysis (EDA). Rename the sheets by right clicking on them and renaming them as (i) Centers, (ii) Defensemen, (iii) RightWingers, (iv) LeftWingers.

17. Visualize each of these by selecting the matching value (C, L, D, or R) in the position filter.

18. You will be able to view all the plots on the same plane using the presentation symbol for multiple sheets at the top right corner. Class/team discussion on the resulting exploratory plots.

19. Adding background images: use the top line menu item Map -> Background Images -> filename. In the Options menu that appears you can set the aspect ratio and other choices. I have added an image “thegreatone.jpg” for you to use as background.

20. Calculated field: We want to compute points per game (PPG), assists per game (APG), and goals per game (GPG) and use this derived data to create a stacked bar plot.

21. Right click on the Measures area and create a Calculated field.

22. Change Data type of GP to “decimal” or Float by right clicking on it and choosing to change the data type. This change is needed for Calculated Fields mentioned above.

23. For GPG enter SUM([G])/SUM([GP]) in the formula box. For APG enter SUM([A])/SUM([GP]) in the formula box and click OK. For PPG enter SUM([P])/SUM([GP]), click OK. Now you will see all three calculated fields in the Measures area on the left of the worksheet.

24. Now we will do several drag-and-drops: drag Player to the Rows shelf, Measure Values to the Columns shelf, and Measure Names to the Color shelf.

25. Remove everything except the newly calculated Assists per Game and Goals per Game from the Marks card/shelf.

26. Click on the blue Player pill on the Rows shelf and select Sort, in descending order by the Points per Game field. Change colors if you prefer. We will discuss the plot that appears, as shown below.

27. Next we want to explore regression and trend lines, which are very useful for predicting future demand as well as revealing any correlations.

28. Now try out the ease of Tableau yourself: draw a scatter plot of goals (G) vs. shots (on goal), with position (POS) as color and player as tooltip. Click on the worksheet and choose Trend Lines -> Show Trend Lines. Choose the “Force y-intercept to zero” option, since no shots means no goals.

29. Study the linear regression model and the p-value (which indicates how significant the fitted trend is) of the trend lines.

30. We will end this comprehensive exercise with a quadrant chart, which is very useful for identifying the performance of players (or the effectiveness of certain business initiatives).

31. Right click on each axis and select Add Reference Line. Right click on each quadrant and choose Annotate -> Area, then type in the characteristic of the quadrant: {high production, high accuracy, low production, low accuracy}. Adjust the boxes that appear as shown below.

32. Save the Tableau exercise for future use, discussions, and the creation of Dashboards and a Story. It will be saved with the extension .twb (Tableau workbook).
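The calculated fields from step 23 and the zero-intercept trend line from step 28 can also be checked by hand outside Tableau. The Python sketch below assumes a single player's row for the per-game ratios and uses the standard least-squares slope for a line forced through the origin, m = sum(x*y) / sum(x*x). The function names and all numbers are hypothetical illustrations, not real NHL statistics and not Tableau's internal implementation.

```python
def per_game(goals, assists, points, games_played):
    """Mirror the three calculated fields (G, A, P divided by GP) for one player."""
    return {
        "GPG": goals / games_played,
        "APG": assists / games_played,
        "PPG": points / games_played,
    }


def zero_intercept_slope(xs, ys):
    """Least-squares slope m of the line y = m * x, forced through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


if __name__ == "__main__":
    # Made-up player: 50 goals and 100 assists (150 points) in 75 games.
    print(per_game(goals=50, assists=100, points=150, games_played=75))

    # Made-up shots (x) and goals (y) for three players.
    shots = [10, 20, 30]
    goals = [1, 2, 3]
    print("goals per shot:", zero_intercept_slope(shots, goals))
```

Forcing the intercept to zero leaves the slope as the only fitted parameter, so it can be read directly as goals per shot.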

Exercise 5: Tableau dashboard. We will create a dashboard with the NHL 100 data. A dashboard comprises one or more sheets. Unlike PowerPoint, it offers superb interaction with the data and charts presented.

1. With the same workbook as created above, click on Dashboard in the top line menu -> New Dashboard -> rename it NHL100.

2. If you have closed the workbook, open it using File -> Open -> xyz.twb, where xyz is the name of the workbook that you saved in Exercise 4.

3. Important caution: Make sure all the worksheets have their axes fixed by clicking on each axis and selecting fixed axis.

4. After you create the new Dashboard, you will see the available sheets on the left panel. Drag and drop the sheets you want onto the Dashboard. We will compose a dashboard with these four sheets: GAGPG, Trendlines, QuadChart, and StackPlot, as shown below. You can adjust the layout as per your needs and also go back and change anything on a worksheet. Save the changes; they will automatically be updated on the dashboard. (You may need an artistic designer to design/optimize the layout.)

5. Click on the presentation and see the visual and the interaction that a Tableau dashboard offers. You can also add background images to the dashboard that promote your brand.

Exercise 6: Creating a Story for presentation. A “Story” in Tableau is like a sequence of slides (a slide deck) compiled from dashboards and worksheets created earlier. We will work with a Tableau workbook whose worksheets and dashboard are ready for use in a presentation. You can think of a Tableau Story as a presentation.

1. Open SheetDashBoardTableauStory.twb

2. We will go through a “Story” that is already created and available for you to review, to understand the concept of a Tableau Story.

3. Create your own Story book: Story -> New Story

4. Right click on the Story Title -> Edit Title -> “World GDP/Population Show” or some such title

5. Observe the worksheets and the dashboard on the left panel; these form the raw material or “points” for your story.

6. You can see the content of these worksheets and the dashboard by clicking on the tabs representing them at the bottom.

7. Now drag and drop a worksheet, say, “Population” on the workspace of the new story you created. Add a caption to this worksheet in the box just above the workspace.

8. Create a New Blank Point, and drag and drop the dashboard into the workspace; add a caption GDP.

9. You can then make “another point” by adding the worksheet PopulationAge into the workspace. Add a suitable caption.

10. As a last point in the story add worksheet GDPYear into the Story.

11. Now click through the story in presentation mode and narrate your “story”. Make your point. Observe the ease with which you can navigate through the presentation, which has many features such as tooltips and selection of a particular item. You can also add filters to make it a truly interactive data analytics discussion.

12. Save the Story for future use and presentation.

Tableau is all about the creative assembly of data points. No programming is necessary. With some artistic creativity and domain knowledge of the data being analyzed, Tableau can help in preparing impressive and convincing data-driven presentations. Use it as a tool for your everyday discussions during team meetings.
