Plotting Tools for Networks, Part I
April 15, 2015
This article was contributed by Lee Phillips

In the first two installments in this series on plotting tools (which covered gnuplot and matplotlib), we introduced tools for creating plots and graphs, and used the terms interchangeably to refer to the typical scientific plot relating one set of quantities to another. In this article we use the term "graph" in its mathematical, graph-theory context, meaning a set of nodes connected by edges. There is a strong family resemblance among graph-theory graphs, flowcharts, and network diagrams, so much so that some of the same tools can be coerced into creating all of them. We will now survey several mature free-software systems for building these types of visualizations. At least one of these tools will likely be useful if you are ever in need of an automated way to diagram source-code interdependencies, make an organizational chart, visualize a computer network, or organize a sports tournament. We will start with a graphical charting tool and a flexible graphing system that can easily be called by other programs.

Flowcharting with Dia

A flowchart is a diagram of a process, algorithm, workflow, or something similar. Flowcharts for different fields often employ a specialized graphical language of symbols that represent entities common to the field. For example, a circuit diagram is a type of flowchart that uses special symbols for diodes, resistors, and other circuit elements. There are flowchart languages for logic circuits, chemical engineering, software design, and much more. Dia, a free (in all senses) diagram editor for Linux and other systems, comes with symbol libraries encompassing all of these examples, plus many others, both common and exotic. And, if that's not sufficient, the program allows you to make your own symbols.

Dia is a GUI program that uses the GTK+ libraries.
You use it somewhat like Inkscape or other drawing programs. However, to make effective use of the program you should remember that you are not creating a drawing, but, rather, defining a set of relationships between entities. These relationships are represented by lines and curves (perhaps with arrowheads or labels), and the entities take the forms of the various symbolic shapes we mentioned above, often with their own text labels.

The trick to defining these relationships through the graphical interface is to make the connections in the right way. Since it takes a while to extract these techniques from the documentation, we'll outline the steps here.

After selecting the desired shape from the panel and dragging it out on the canvas to the approximate size you think it should be, immediately press Return and type the text label for the shape. The label will be properly centered, the shape will grow as required to accommodate it, and the label will be permanently attached to the shape and move with it.

To connect two entities with a line, draw the line between the centers of the entities; you know you've hit the correct spot when the shape glows yellow.

To attach a text label to a line (such as the "yes" and "no" labels in the screenshot) you need to follow a different procedure: with object snapping turned on, create a text entity using the text tool that looks like a "T", then drag it by its handle, connecting it to the line's attachment point. This point is indicated by a small "x" and is at the center of the line. A red glow will signal that you've made the attachment.

If you've defined all your labels, entities, and connections using these techniques, then you'll be able to move the nodes around at will on the canvas until the chart is neat and easy to follow.
The topology of the graph, which carries the actual information in the flowchart, won't change but, by moving things around, you can change a tangle of crossed lines into a neat diagram where the flow is clear.

Dia saves your work in an XML file (a compressed one by default, though there is also an option to save it uncompressed), and can export it into a wide variety of image formats, including vector formats such as SVG.

The program should be available in your package manager. Development is steady but moves at a slow pace, so it's likely you'll get the current version even from a conservative distribution. If you need to, however, you can download sources or binaries from Dia headquarters.

Graphviz: infrastructure for graphs

Dia is a useful and versatile tool for creating and laying out a graph by hand. Sometimes, however, we begin with a (possibly large) set of data that we want to visualize as a network-style graph or flowchart. We may also want to experiment with different types of visualizations or to produce different graph styles that present the same data for different purposes.

Graphviz solves these problems by providing a declarative language, called "dot," that represents nodes and the connections between them as text. The dot language can accommodate a large set of visual and logical attributes of many types of nodes, their relationships, and their interconnections. Nevertheless, it's intuitive, with an easy-to-remember and readable syntax. Here is an almost-minimal example of a dot file that defines a simple graph:

    strict digraph "example" {
      A -> {B C};
      D [shape = box];
      C -> D;
      D -> C [color = blue];
    }

The keyword strict at the beginning means that no redundant edges are allowed; a digraph is a directed graph (meaning that the edges have a direction, often represented by an arrowhead at the end and, perhaps, at the beginning). The second line says that node A is connected with both nodes B and C, in the direction starting from A.
The next line declares a new node called D and defines an attribute that specifies how D should be drawn. Then, we declare that C is connected to D, and that D is connected back to C. This last edge has an attribute specifying its color.

The Graphviz infrastructure also comes with several layout engines that interpret dot files and produce the actual graphs. Some of the engines are for directed graphs, some are for undirected graphs, and some handle both types. The problem of taking a graph specification (with perhaps thousands of nodes) and producing a usable visual representation is not trivial, and is the subject of continuing research. Each of Graphviz's engines has a mathematical theory behind it, and each will generate a different type of graph.

For simple directed graphs such as the one represented in the dot file above, the engine called "dot" is usually best. We invoke it on the command line:

    dot -o simpledot.png -Tpng simpledot.dot

This generates a PNG output file (one of many choices), using simpledot.dot as the graph specification. If we store the code snippet above into this file, we get the output shown here:

[Figure: the example graph rendered by the dot engine]

It's clear how the definitions of nodes and edges have been translated into a picture. If we apply a different layout engine to the same dot file, for example fdp:

    fdp -o simpledot.png -Tpng simpledot.dot

we get the same information, but depicted in a different style:

[Figure: the same graph rendered by the fdp engine]

A brief summary of the various layout engines that come with the system is provided in the dot man page. The dot engine produces a simple, hierarchical layout, whereas fdp, sfdp, and neato all treat the edges as springs. That is, they attempt to arrive at a neat arrangement by starting with a random layout and allowing the system to relax to a minimum energy configuration. The different engines will produce distinct results, as they are all based on different algorithms.
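To give a feel for the "edges as springs" idea, here is a minimal sketch in Python. It is a toy, not the algorithm that any Graphviz engine actually uses: it applies only a Hooke's-law force along each edge and takes small fixed steps downhill in energy, whereas real force-directed layouts typically also push unconnected nodes apart. The function name `spring_layout` and all of its parameters are hypothetical, chosen for illustration, and the example graph is the one from the dot file above with edge directions ignored and the C/D pair counted once.

```python
import math
import random


def spring_layout(nodes, edges, rest_length=1.0, steps=500, step_size=0.05, seed=0):
    """Relax a random 2-D layout in which every edge acts as a spring.

    Returns a dict mapping each node to its [x, y] position.
    """
    rng = random.Random(seed)
    # Start from a random layout inside a small square.
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}

    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            # Hooke's law: a stretched spring pulls its ends together,
            # a compressed one pushes them apart.
            pull = (dist - rest_length) / dist
            force[a][0] += pull * dx
            force[a][1] += pull * dy
            force[b][0] -= pull * dx
            force[b][1] -= pull * dy
        # Move every node a small step along the net force acting on it.
        for n in nodes:
            pos[n][0] += step_size * force[n][0]
            pos[n][1] += step_size * force[n][1]
    return pos


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D"]
    edges = [("A", "B"), ("A", "C"), ("C", "D")]
    for node, (x, y) in spring_layout(nodes, edges).items():
        print(f"{node}: ({x:.3f}, {y:.3f})")
```

Because the starting positions are random, a different seed gives a different but equally relaxed arrangement, which is one reason spring-style engines can produce different pictures of the same graph.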
The use of a language to define graphs means that Graphviz can serve as the graphical engine for other systems or programs; they merely need to format their output in the dot language. There are many examples of this. Snakefood is a program that analyzes Python programs and determines their dependencies. You can point it at a directory of Python files and it will return the dependencies among the files and on external modules, in its own format, which is a collection of Python tuples. This output can be piped to a Snakefood utility that translates it into dot, which can then be processed by any of the Graphviz engines that can handle directed graphs; usually the dot engine is the best choice. Here is the result of applying the process to a directory holding the files for the bsddb package, an interface to the Berkeley DB library:

[Figure: dependency graph of the bsddb package]

The dot file corresponding to this graph is available for further exploration.

You can also use Graphviz without using the dot language directly, by using one of its programming language interfaces. For example, the pygraphviz library for Python allows you to define a graph object, add nodes and edges to it, and create a graph image by calling the draw() method on the object. The Graphviz layout engine is selected with an argument passed to the draw() method.

We've barely scratched the surface of what Graphviz can do. Facilities for subgraphs, record nodes, adding labels in HTML, and more make this a general-purpose powerhouse for any type of automated graph creation. Graphviz is free software (as is Snakefood) and should be available through your package manager.

Stay tuned

In part 2 of this article we'll continue our survey of network graphing tools by looking at two sophisticated libraries.
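As a small illustration of the earlier point that a program only needs to format its output in the dot language, the sketch below turns a list of (source, target) pairs into dot text using nothing but the Python standard library. The helper names `quote` and `to_dot`, the graph name "deps", and the sample dependency pairs are all hypothetical; this is not Snakefood's output format or its converter, just one plausible way to emit a strict digraph. Identifiers are wrapped in double quotes because names such as file names contain dots, which are not allowed in unquoted dot identifiers.

```python
def quote(identifier):
    """Wrap an identifier in double quotes, escaping any embedded quotes."""
    return '"' + str(identifier).replace('"', '\\"') + '"'


def to_dot(edges, name="deps"):
    """Return dot source for a strict directed graph with the given edges."""
    lines = ["strict digraph " + quote(name) + " {"]
    for source, target in edges:
        lines.append("  " + quote(source) + " -> " + quote(target) + ";")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical dependency pairs, invented for this example.
    sample = [("app.py", "os"), ("app.py", "util.py"), ("util.py", "os")]
    print(to_dot(sample), end="")
```

Saving the printed text to a file such as deps.dot and running the dot engine on it, in the same way as for simpledot.dot above, should produce a picture of the dependencies.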