Visualization of Code Flow

DEGREE PROJECT, IN COMPUTER SCIENCE , SECOND LEVEL STOCKHOLM, SWEDEN 2015 Visualization of Code Flow YURI STANGE KTH ROYAL INSTITUTE OF TECHNOLOGY COMPUTER SCIENCE AND COMMUNICATION (CSC) Visualization of Code Flow Visualisering av kodﬂöde YURI STANGE Email at KTH: [email protected] Subject Area: Computer Science Supervisor: Stefan Arnborg Examiner: Stefan Arnborg Abstract Visual representation of Control Flow Graphs (CFG) is a feature available in many tools, such as decompilers. These tools often rely on graph drawing frameworks which implement the Sugiyama hierarchical style graph drawing method, a well known method for drawing directed graphs. The main disadvantage of the Sugiyama framework, is the fact that it does not take into account the nature of the graph to be visualized, specically loops are treated as second class citizens. The question this paper at- tempts to answer is; how can we improve the visual representation of loops in the graph? A method based on the Sugiyama framework was developed and implemented in Qt. It was evaluated by informally inter- viewing test subjects, who were allowed to test the implementation and compare it to the normal Sugiyama. The results show that all test subjects concluded that loops, as well as the overall representation of the graph was improved, although with reservations. The method presented in this paper has problems which need to be adressed, before it can be seen as an optimal solution for drawing Control Flow Graphs. Sammanfattning Visuell representation av ödesscheman (eng. Control Flow Graph, CFG) är en funktion tillgänglig hos många verktyg, bland annat dekom- pilerare. Dessa verktyg använder sig ofta av grafritande ramverk som im- plementerar Sugiyamas metod för uppritning av hierarkiska grafer, vilken är en känd metod för uppritning av riktade grafer. Sugiyamas stora nack- del är att metoden inte tar hänsyn till grafens natur, loopar i synnerhet behandlas som andra klassens medborgare. Frågeställningen hos denna rapport är; Hur kan vi förbättra den visuella representationen av loopar i en graf? En metod som bygger vidare på Sugiyama-ramverket utvecklades och implementerades i Qt. Metoden testades genom att hålla informella kvalitativa intervjuer med testpersoner, vilka ck testa implementeringen och jämföra den med den vanliga Sugiyama-metoden. Resultaten visar att alla testpersonerna stämmer in på att loopar, så väl som den overskåd- liga representionen av grafen förbättrades, dock med vissa reservationer. Metoden som presenteras i denna rapport har vissa problem, vilka bör adresseras innan den kan ses som en optimal lösning för uppritning av ödesscheman. Contents 1 Introduction1 2 Control Flow Graphs1 2.1 Directed graphs............................3 3 Representing code ow in a graphical way4 3.1 Desired Layout Attributes......................4 4 Previous works5 4.1 Graphviz...............................5 4.2 OGDF.................................5 5 Our goal6 5.1 Needed Terminology.........................7 6 Sugiyama Layout8 6.1 Cycle removal.............................8 6.1.1 Depth-rst search approach................. 10 6.2 Hierarchical Layering......................... 10 6.2.1 Longest Path......................... 12 6.2.2 Vertex Promotion algorithm................. 13 6.3 Dummy nodes............................. 13 6.4 Cross reduction............................ 16 6.4.1 Layer-by-layer Sweeping................... 17 6.4.2 Transposing.......................... 19 6.5 Coordinate Assignment....................... 20 6.5.1 Pendulum algorithm with linear segments......... 20 7 Representing the edges 22 7.1 Ports.................................. 22 7.2 Avoiding obstacles.......................... 23 7.2.1 Manhattan convention.................... 23 7.2.2 Edge tracks.......................... 24 7.2.3 Assigning tracks to edges.................. 25 8 Improving the representation of back edges 26 8.1 The main idea............................ 26 8.2 Introducing new nodes........................ 27 8.3 Prerequisites............................. 28 8.4 Starting o.............................. 28 8.5 Alterations to Sugiyama's crossing reduction step......... 29 8.5.1 Downsweeps (receiving layer)................ 30 8.5.2 Upsweeps (receiving layer).................. 30 8.6 Tying up the ends.......................... 31 8.6.1 Aesthetical Hungarian.................... 32 8.7 Alterations to the pendulum algorithm.............. 32 8.7.1 Ibeds............................. 32 8.7.2 Obeds............................. 34 8.8 Changes to the edge track partitioning............... 34 8.8.1 A priority approach..................... 34 8.8.2 Distributing tracks to obeds and ibeds........... 35 9 Implementation 36 9.1 Qt................................... 36 9.2 The Graphics View Framework................... 36 9.2.1 Edges............................. 37 9.2.2 Vertices............................ 38 9.3 About the algorithms........................ 38 10 Evaluation 38 10.1 About the interviews......................... 38 10.2 Results................................. 39 11 Conclusions 40 12 Future Work 43 12.1 Crossings between edge tracks.................... 43 12.2 Positioning of edges relative to their obed nodes.......... 43 12.3 Handling inevitable edge crossings................. 44 12.4 Worst case scenario in the Manhattan layout........... 45 12.5 Better port strategies......................... 46 1 Introduction Visual representation of code ow is a great tool available for programmers in order to understand, optimize, debug or reverse engineer programs. It consists of various steps where a visualization tool could be considered as the most im- portant step in the process, also the hardest to implement. Many factors play a role in the quality of the graphical result, often tied up to notions better dis- cussed in a paper about cognitive perception. Evaluation of such a tool is a fuzzy job where performance is not always best measured in numbers. The meaning of this paper is to take a look at the existing solutions and to explore methods better suited for code ow visualization. The code ow in this paper will be represented as a Control Flow Graph(CFG), which can often be seen in the literature dedicated to compiler optimization. Chapter 2 describes CFGs, how they are represented and their relationship to directed graphs. The requirements set on the layout algorithm are described in Chapter 3, while Chapter 4 presents previous works, their strengths and im- plementations. Chapter 5 introduces the goal of this paper. Chapter 6 describes in detail the Sugiyama framework, which our work is based on. Chapter 7 is about the way we chose to represent edges, which bind vertices together. Our method will be presented in Chapter 8, where details of its implementation can be seen in Chapter 9. The evaluation of our method, as well as our results are presented in Chapter 10. Our conclusions can be found in Chapter 11, while Chapter 12 will be about what we consider to be future work. 2 Control Flow Graphs A Control Flow Graph is a representation of all paths a program may have taken during its execution using a graph notation. In this case a directed graph, which we will describe in the following subsection. In a CFG each node (or vertex) in the graph represents a basic block, this means; a piece of code contiguous in memory without any jumps or jump tar- gets. If the rst instruction in the block is executed, then each subsequent instruction will also be executed (Cooper, Keith D. and Linda Torczon, 2004 cited in Saunders, 2007). Jumps in the code ow are represented as directed edges, where the block being pointed at by the edge is the jump target. An example of possible code, with the corresponding CFG can be see in Figure 1 on the following page. 1 Figure 1: Example of code and its corresponding CFG 2 Figure 2: Directed graph traversed with a DFS (visiting order specied by vertex numbering) 2.1 Directed graphs A directed graph is a set of vertices connected by edges, where the edges have a direction associated to them. If we traverse a directed graph by using a Depth- rst search1(Tarjan, 1972), we can categorize the edges into four types; tree edges, forward edges, cross edges and back edges. It should be noted that the classication of edges is tightly coupled to the order in which the Depth-rst search visited them. For an example see Figure2. • Tree edges describe the relation between a vertex and their direct descendants, they are usually associated with the natural ow of the graph. • Forward edges connect a vertex to some of its descendants, although not their direct ones. • Back edges describe descendant-to-ancestor relationships. They connect vertices to one of their ancestors in the graph. • Cross edges are all the other edges in the graph, they connect unrelated nodes in the partial order. Since CFGs are in fact directed graphs, it is not dicult to drawn parallels between them. Figure 1 on the preceding page demonstrates a if-then clause which gives origin to a cross edge. This paper will focus heavily on back edges, which are usually originated by loops in the code, such as while- and for- clauses. 1Algorithm for traversing a graph, where each branch is explored as far as possible before backtracking. 3 3 Representing code ow in a graphical way Our goal is to represent the CFG in such way that it becomes trivial to understand the code behind. A graph representing the code should be as powerful for a computer scientist, as a chart pie is for a statistician. This is where two

Visualization of Code Flow

Networkx Tutorial

Networkx: Network Analysis with Python

Graph Database Fundamental Services

Gephi Tools for Network Analysis and Visualization

Gephi-Poster-Sunbelt-July10.Pdf

Plantuml Language Reference Guide (Version 1.2021.2)

Travels of Baron Munchausen

A Comparative Analysis of Large-Scale Network Visualization Tools

A Domain-Specific Language for Visualizing Software Dependencies As a Graph Alexandre Bergel, Sergio Maass, Stéphane Ducasse, Tudor Girba

Using Graphviz As a Library (Cgraph Version)

Networkx: Network Analysis with Python

Drawing Graphs with Graphviz