Rendering UML Class Diagrams to Support Layout Design

A thesis submitted to the Kent State University Honors College in partial fulfillment of the requirements for University Honors

by

Paul “P.J.” Leyden

December, 2019

Thesis written by

Paul “P.J.” Leyden

Approved by

______, Advisor

______, Chair, Department of Computer Science

Accepted by

______, Dean, Honors College


TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
Acknowledgements
CHAPTER 1 Introduction
CHAPTER 2 Related Work
2.1 UML
2.2 Comprehension and Layout
2.3 Automatically Reverse Engineering UML Class Diagrams
2.4 srcUML
CHAPTER 3 srcML Infrastructure
3.1 srcML
3.2 SAX and srcSAX
3.3 srcSAXEventDispatcher
CHAPTER 4 The Development of srcUML
4.1 Previous srcUML
4.2 Generating Graphviz
CHAPTER 5 Automatic Rendering of UML Class Diagrams
5.1 Architecture
5.2 Implemented Output Types
5.3 Extending srcUML to Support User Defined Layouts
5.4 Implemented Layout Algorithms
5.5 Explanation of a Layout Algorithm
CHAPTER 6 Conclusions and Future Work
Appendix A
Appendix B
References


LIST OF FIGURES

Figure 1 - Diagram showing the flow of execution in order to generate a UML class diagram.
Figure 2 - Example program for demonstrating srcUML functionality.
Figure 3 - yUML representation of Figure 2. Modified for spacing purposes.
Figure 4 - UML diagram generated by yUML.me
Figure 5 - DOT representation of the car manufacturer program. Modified for space purposes.
Figure 6 - Diagram generated from the DOT output.
Figure 7 - Example of overridden constructor for example DGML outputter.
Figure 8 - Example beginning of the output function for the DGML outputter example.
Figure 9 - Enumeration addition example.
Figure 10 - if-else statement addition example.
Figure 11 - switch statement addition example.
Figure 12 - Example usage of a newly added format.
Figure 13 - Sugiyama layout produced by srcUML.
Figure 14 - Example of a new algorithm implementation in the standard style.
Figure 15 - A three clustered layout generated by srcUML.
Figure 16 - svg_three_outputter's cluster determination implementation.
Figure 17 - svg_three_outputter formal creation of OGDF clusters.
Figure 18 - svg_three_outputter's OGDF layout and print call.
Figure 19 - Full example of the svg_three_outputter resultant UML diagram.


LIST OF TABLES

Table 1 - Shows the objects that are created and populated by the initialization functions provided by srcUML.


Acknowledgements

First, I would like to thank my thesis advisor, Professor Jonathan Maletic. He has had to put up with my antics for three long years and, despite that, he has given me many opportunities to improve and grow as both a computer scientist and a person. I would also like to thank Professor Michael Decker of Bowling Green State University for guiding me along in my research, answering my onslaught of questions, encouraging me to keep going and being a friend. I would of course like to thank those on my defense committee, Professor Mikhail Nesterenko, Professor Alexander Seed and Professor Robin Selinger, for taking the time out of their busy schedules to hear out my defense. Also, I would like to thank Drew Guarnera for always being friendly and open to whatever random questions I could think up and throw at him. I would also like to thank the Honors Thesis Advisor from the Honors College, Lori Michael, for always having time to meet when I had questions or doubts.

Lastly, I would like to thank my friends and especially my family for always having my back and being there for when insanity seemed like a definite option.


CHAPTER 1

Introduction

The process of creating and maintaining software systems is a difficult task, and it is only growing more so as new Application Program Interfaces (APIs), frameworks, and technologies are released. The total pool of technologies is growing rapidly, and developers are less and less likely to know every piece of it. With companies wanting to stay on the leading edge, developers are expected to be capable of learning these new systems quickly. Within this context, studies [1, 2] have shown that Unified Modeling Language (UML) diagrams can be extremely useful.

The Unified Modeling Language (UML) aims to provide system architects, software engineers, and software developers with a way to analyze, visualize, design, and maintain software-based systems [1]. The language standard itself is a specification on how to draw different diagrams that lend themselves to software organization and flow. One diagram in particular, the UML class diagram, is used to illustrate classes, interfaces, and their associations in a static object model [2]. In other words, it presents a simplified, yet no less informative, visual graphic, mapping out a software system.

In the software development process, specifically the Agile Process, there is a stage in which the development team is supposed to update and create new design documents and diagrams. This works well for the initial build; however, over time, as developers leave and new ones come in, design documents can easily be forgotten, leaving them misleading and obsolete. This is why UML diagrams are used less often in the maintenance and evolution [4] of software, as keeping UML diagrams current can be a tedious task. One can manually reverse engineer the UML diagram to help understand the code, but this can be tedious as well. To solve this problem, Decker et al. [3] created srcUML to automatically reverse engineer accurate UML diagrams for just such situations.

srcUML currently outputs a text-based diagram in the yUML [4] format. yUML.me is a web service that takes this yUML formatted diagram and produces a visual representation. This presents three issues. First is that the layout of the diagram is dependent on yUML.me. Second is that one must use this online service to get the actual graphic instead of being able to do it on one’s own machine. Third is that yUML.me is not scalable. These issues present an array of problems. The first issue means that developers and researchers cannot manipulate the layout of the diagram or perform particular research on a layout. The second is that one must have internet access to generate the visual graphic from the yUML. Lastly, as projects get larger and larger, yUML.me becomes less and less efficient at generating the diagrams.

Resolving these issues benefits developers who can manipulate the layout to better suit understanding and comprehension, and researchers who can generate particular layouts for research and experiments.

The goal of this thesis is to resolve these issues by first expanding srcUML and implementing a framework for developing new layout algorithms. Secondly, the Open Graph Drawing Framework (OGDF) [5] will be used to draw the graphical representations of the diagrams so as to allow for local generation and to avoid using the unscalable yUML.me.

Below in Figure 1 is a diagram depicting the flow of execution going from source code to a completed UML class diagram.

Figure 1 - The architecture of the srcUML system.

As shown, two to three tools are required in order to automatically generate a UML class diagram. First, one must use srcML to generate a srcML markup of the original source code. Second, that srcML markup is passed to srcUML, which takes the srcML and creates an internal UML model. This model can then be exported in three different output types. DOT and yUML both require a third tool to generate a visual representation, while SVG can simply be loaded in any browser to render.

This thesis is organized as follows. Chapter 2 gives the background and related work. Chapter 3 goes over the tools that were used to create srcUML. Chapter 4 describes srcUML and the work's initial progress. Chapter 5 goes through the framework and overall contributions of this thesis, along with descriptions of the current output types and layout algorithms available and how to create new ones. Finally, Chapter 6 examines the conclusions and possible future work in the subject.


CHAPTER 2

Related Work

This chapter explains some of the background information about the area, as well as current work in the field, and relates that work to the work of this thesis. Section 2.1 describes UML class diagrams in general. Section 2.2 discusses the concepts of comprehension and layout as they pertain to UML class diagrams. Section 2.3 discusses previous work in the field of reverse engineering UML diagrams. Specifically, it covers Sutton and Maletic’s work [6], including the mappings they created and their implementation of those mappings in pilfer, their tool for reverse engineering UML class diagrams. Finally, Section 2.4 discusses srcUML before the completion of this work and the gaps it left behind.

2.1 UML

UML class diagrams are visual graph representations of software-based systems. In general, every class in a system is represented by a square node that contains information about the class such as its name, members and methods. These class nodes are then connected to each other via lines that represent the relationships between the classes. There are five types of relationships possible: Dependency, Association, Aggregation, Composition and Generalization/Realization.

Dependencies are the most generic form of relationship and generally mean that the classes are related in a more maintainable way. Associations are a “has a” relationship. For instance, the statement that a bank has a customer implies an association between those two entities. Compositions are a type of association in which the entities that make up the compiling entity only exist as long as the compiling entity exists. For instance, a bank has accounts in it. If the bank goes away, the accounts no longer exist. Aggregations are another type of association, similar to a composition with one major difference: the gathered entities will still exist if the compiling entity goes away. For instance, a parking garage has many cars parked in it. If the garage shuts down, the cars still exist. Finally, Generalizations/Realizations represent inheritance. For example, consider a class Animal and a class Bear. Animal generalizes Bear and Bear is a realization of Animal. Pragmatically, Animal is an abstract class and Bear inherits from it.
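As an illustration, the examples above can be written down in C++. This is only a sketch of one common way these relationships appear in code; the class names follow the examples in the text, while the member names and the choice of smart pointers are assumptions made for the illustration and are not the mappings used by srcUML.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Generalization: Bear inherits from the abstract class Animal.
class Animal {
public:
    virtual ~Animal() = default;
    virtual std::string sound() const = 0;
};

class Bear : public Animal {
public:
    std::string sound() const override { return "growl"; }
};

// Composition: a Bank stores its Accounts by value, so the
// Accounts are destroyed together with the Bank.
struct Account {
    double balance = 0.0;
};

class Bank {
public:
    void open_account() { accounts_.emplace_back(); }
    std::size_t account_count() const { return accounts_.size(); }

private:
    std::vector<Account> accounts_;
};

// Aggregation: a Garage only refers to Cars it does not own,
// so the Cars keep existing after the Garage is destroyed.
struct Car {
    std::string plate;
};

class Garage {
public:
    void park(const std::shared_ptr<Car>& car) { cars_.push_back(car); }
    std::size_t car_count() const { return cars_.size(); }

private:
    std::vector<std::weak_ptr<Car>> cars_;
};
```

Storing members by value versus by non-owning reference is what separates the composition from the aggregation in this sketch; other code bases may express the same design differently.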

2.2 Comprehension and Layout

When talking about the usefulness of UML class diagrams, one must also acknowledge that not all class diagrams are equal. Many studies have shown that there is a significance to the layout of a UML diagram in comprehending the system that the diagram is representing.

Sun and Wong present a series of layout criteria based on human perception principles and determine, through the evaluation of two UML modeling tools, that these principles must be important in some facet [7].

Yusuf et al. use eye tracking technologies to assess the comprehension of UML diagrams. Their results show that experts tend to use stereotypes, color, and layout to more efficiently navigate a UML diagram [8].

Sharif and Maletic analyze and experiment with how layout and stereotypes affect comprehension [9, 10]. For reference, stereotyped class diagrams are exactly the same as regular class diagrams with the added component of a class generalization, or stereotype. A stereotype can be generic or very specific depending on how one determines the stereotypes. For instance, a class that holds long-lived data or information will be given the stereotype of Entity. Once a stereotype has been given to each class, one can arrange or lay out the diagram such that stereotypes are completely or semi-clustered.

Sharif and Maletic compared three styles of layout as defined in a previous study by Andriyevska et al. [11]. The first is a typical orthogonal layout, where stereotypes are ignored and the graph is laid out in a hierarchical fashion. The second is a completely clustered layout where each cluster contains all the classes of a particular stereotype. The last is a semi-clustered or multi-clustered layout where both stereotypes and coupling were taken into account. For reference, coupling is the concept of how inter-connected or reliant any two particular classes are on each other. The results of the experiment show that the multi-clustered layout provides a significant improvement in performance accuracy for both UML and design tasks.
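To make the completely clustered layout concrete, the sketch below groups classes by stereotype, which is the first step such a layout would need before placing each group in its own region. The ClassInfo type and the function name are hypothetical and exist only for this illustration; the code is not taken from the studies cited above or from srcUML.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, minimal description of a class in a diagram.
struct ClassInfo {
    std::string name;
    std::string stereotype;
};

// Groups class names by stereotype; each map entry is one cluster.
std::map<std::string, std::vector<std::string>>
cluster_by_stereotype(const std::vector<ClassInfo>& classes) {
    std::map<std::string, std::vector<std::string>> clusters;
    for (const auto& c : classes) {
        clusters[c.stereotype].push_back(c.name);
    }
    return clusters;
}
```

A multi-clustered layout would additionally need a coupling measure between classes, which this sketch deliberately leaves out.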

Taken together, these studies show that layout is an important aspect of comprehension when reading a UML class diagram. These findings are what led to the notion that a framework for implementing new user defined layout algorithms will be a major benefit to the automatic UML class diagram generator.

2.3 Automatically Reverse Engineering UML Class Diagrams

A substantial amount of previous work has been done on automatically extracting design information and reverse engineering UML class diagrams. Korshunova et al. [12] present a tool called CPP2XMI. The tool is capable of creating UML diagrams from C++ code; however, it does not allow for custom layout algorithms and instead generates a layout automatically. Barowski and Cross [13] discuss a way of recovering class dependency information from Java class files. However, previous work by Sutton and Maletic [6] points out that the definition of dependency here is too general and does not apply to more specific UML relationship types. Sutton and Maletic also point out a few other inconsistencies in existing tools. For instance, Microsoft Visio is unable to determine associations, Visual Paradigm creates dependencies where there should be associations, and Rational Rose C++ Modeler only creates aggregate associations. The biggest hindrance is that the mechanisms these tools use for reverse engineering are not publicly known. They do note, however, that these inconsistencies may come from the large semantic gap between C++ and UML. This gap stems from the fact that UML is meant as a tool for laying out ideas or depicting current designs. A UML diagram will only have as much detail as one wants, meaning that something not being shown in a UML diagram does not mean it does not exist in the actual system.


Sutton and Maletic [14] address these semantic differences by defining a mapping between C++ and UML. One example is definitively defining an interface as a class that defines only public, pure virtual methods; defines no member variables; has no constructors or destructors; and, if it inherits from another class, that other class should also be an interface. To realize some of these mappings, the team created a tool called pilfer [6], a reverse engineering tool for UML class diagrams. As pointed out by Decker et al. [3], however, even pilfer had its flaws. It is implemented using an inefficient and non-scalable Document Object Model (DOM) and has never been made publicly available, thus the need for a tool that is both publicly available and scalable.
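The interface definition above can be illustrated with a short C++ example. The class names Shape and Square are made up for the illustration; the sketch only shows one class that satisfies the stated criteria and one that does not, and it is not code from pilfer or srcUML.

```cpp
#include <type_traits>

// Meets the criteria above: only public pure virtual methods,
// no member variables, no user-declared constructors or destructors.
class Shape {
public:
    virtual double area() const = 0;
    virtual double perimeter() const = 0;
};

// Not an interface under the same criteria: it has a member
// variable, a constructor, and concrete method definitions.
class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }
    double perimeter() const override { return 4.0 * side_; }

private:
    double side_;
};

// The compiler can confirm the abstract/concrete split.
static_assert(std::is_abstract<Shape>::value, "Shape is abstract");
static_assert(!std::is_abstract<Square>::value, "Square is concrete");
```

Note that std::is_abstract checks only for pure virtual methods; the remaining criteria (no fields, no constructors) are exactly the part a reverse engineering tool has to check by inspecting the source.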

2.4 srcUML

As mentioned previously srcUML was created by Decker et al. [3]. The original tool generates yUML, a text-based diagram format. The tool aims to address the issues that were present in pilfer. srcUML does this by using a SAX approach for creating the internal model which is a more scalable alternative to the DOM used in pilfer. This is explained in more detail in 3.2 SAX and srcSAX. srcUML is also a public repository on GitHub and is therefore available for review and use.

srcUML, before the work of this thesis, is still subject to the limitations of the yUML online diagram generator [4]. These limitations include low scalability, a single predefined layout algorithm, and the requirement of internet access. These limitations are what this work looks to overcome.


CHAPTER 3

srcML Infrastructure

What follows is a brief explanation of srcML and the related tools that are used by srcUML. Section 3.1 talks about srcML. Section 3.2 talks about SAX and srcSAX. Finally, Section 3.3 details the srcSAXEventDispatcher.

3.1 srcML

srcUML owes its existence to the underlying technology of srcML; without it, the core of what srcUML does would not work. The word srcML refers to two things. The first is an XML format for source code. For reference, XML (eXtensible Markup Language) is a markup language similar to HTML that has helped in the exchange of data over the internet [15]. The second is a lightweight, highly scalable, robust, multi-language parsing tool that converts source code into srcML [16]. The tool itself creates an abstract syntax tree by gathering information and embedding that information as XML tags around the source code. At the moment, it supports C++, C, C#, and Java [16, 17]. The purpose of srcML is to make exploring, analyzing and modifying code incredibly efficient. For our purposes, the work uses the analysis capabilities of srcML in conjunction with srcSAX and srcSAXEventDispatcher to extract information from the source code quickly and thoroughly.

Figure 2 shows a simple example program (top) and the corresponding srcML format (bottom).

Source Code Example

    class Employee{};

    class MyAppWindow{};

    class Register{
    private:
        Employee e;
        MyAppWindow maw;
    };

Corresponding srcML Markup

    class Employee{};

    class MyAppWindow{};

    class Register{ private: Employee \ e; MyAppWindow \ maw; };

Figure 2 - Example program for demonstrating srcUML functionality.

The program in Figure 2 contains three classes: Employee, MyAppWindow, and Register. The Register class has two private member variables, one of each of the other two classes. This example will be used throughout the thesis to demonstrate the different stages of the workflow required to create a UML diagram.

In the srcML format, all the original code, spacing, comments, and other text is preserved, and XML tags are simply put around it to provide meta-knowledge of the programming language's syntax.

3.2 SAX and srcSAX

When working with XML documents, such as the srcML in Figure 2, one will often want to gather information from their tree-like structure. SAX, which stands for Simple API for XML, exists for exactly this purpose. SAX is an event-driven application program interface (API) for gathering and extracting information from XML documents. In other words, SAX is a way for a programmer to set up events that, when they occur, trigger particular functions or methods. srcSAX, in turn, is a SAX implementation built specifically for navigating srcML documents.

The benefit of having a tool like srcSAX is that it provides a simple way to iterate through a srcML document and extract the detailed meta-knowledge stored within. Due to SAX and srcSAX’s event-driven nature, they are highly efficient at gathering such information. While srcUML does use srcSAX, it uses it in an indirect fashion through the srcSAXEventDispatcher, described in the next section.
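The event-driven idea can be sketched with a toy scanner. The code below is not the srcSAX API; the Handler interface, the parse function, and the NameCollector class are hypothetical names introduced for this illustration, and the scanner understands only plain opening tags, closing tags, and text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler interface: one callback per kind of event.
class Handler {
public:
    virtual ~Handler() = default;
    virtual void start_element(const std::string&) {}
    virtual void end_element(const std::string&) {}
    virtual void characters(const std::string&) {}
};

// Toy scanner: streams events to the handler without building a
// tree. No attributes, comments, or entities are supported.
void parse(const std::string& xml, Handler& handler) {
    std::size_t i = 0;
    while (i < xml.size()) {
        if (xml[i] == '<') {
            std::size_t close = xml.find('>', i);
            if (close == std::string::npos) break;  // malformed: stop
            if (xml[i + 1] == '/') {
                handler.end_element(xml.substr(i + 2, close - i - 2));
            } else {
                handler.start_element(xml.substr(i + 1, close - i - 1));
            }
            i = close + 1;
        } else {
            std::size_t next = xml.find('<', i);
            if (next == std::string::npos) next = xml.size();
            handler.characters(xml.substr(i, next - i));
            i = next;
        }
    }
}

// Example handler: records the text found inside <name> elements.
class NameCollector : public Handler {
public:
    void start_element(const std::string& tag) override {
        if (tag == "name") ++depth_;
    }
    void end_element(const std::string& tag) override {
        if (tag == "name") --depth_;
    }
    void characters(const std::string& text) override {
        if (depth_ > 0) names.push_back(text);
    }
    std::vector<std::string> names;

private:
    int depth_ = 0;
};
```

Because the scanner never stores the whole document, memory use stays flat as the input grows, which is the property that makes a SAX approach scale better than a DOM.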


3.3 srcSAXEventDispatcher

It is known that for many new programmers, the idea of event-driven programming can be challenging, especially if they are not already familiar with the concept. For this reason, we have srcSAXEventDispatcher, a higher-level API that helps simplify the process of implementing and using srcSAX.

The srcSAXEventDispatcher simplifies srcSAX by binding event callbacks to every specific tag in srcML. Where srcSAX will have just an open-tag event callback, srcSAXEventDispatcher has an open-function-tag or an open-class-tag event callback. This removes the need to identify what kind of tag event is being called, making it much easier to systematically perform a particular callback task on all tags of a particular type.
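The difference between a generic open-tag event and tag-specific callbacks can be sketched as follows. TagDispatcher and its methods are hypothetical names chosen for this illustration; the real srcSAXEventDispatcher interface is not reproduced here and may look quite different.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical dispatcher: turns one generic "open tag" event
// into a callback registered for that specific tag name.
class TagDispatcher {
public:
    using Callback = std::function<void()>;

    // Register a callback for one specific tag, e.g. "class".
    void on_open(const std::string& tag, Callback callback) {
        open_callbacks_[tag] = std::move(callback);
    }

    // Called by the underlying parser for every opening tag.
    // Tags without a registered callback are silently ignored.
    void start_element(const std::string& tag) {
        auto it = open_callbacks_.find(tag);
        if (it != open_callbacks_.end()) it->second();
    }

private:
    std::map<std::string, Callback> open_callbacks_;
};
```

With this shape, client code never has to inspect the tag name itself; it registers one callback per tag of interest and lets the dispatcher do the lookup.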


CHAPTER 4

The Development of srcUML

The work of this thesis in enhancing srcUML has been a three-year process where a large portion has been devoted to learning, developing, and experimenting with different ways of improving the research tool. Section 4.1 deals with srcUML as it was before the completion of the thesis. Section 4.2 deals with the initial experiment srcYUML2graphViz and how it generated the DOT format from Graphviz [18].

4.1 Previous srcUML

srcUML was initially created by Decker et al. [3]. Originally called srcYUML, its initial purpose was to reverse engineer UML diagrams automatically from source code. It is the spiritual successor to pilfer and was designed to be more scalable and efficient. The tool takes srcML as input and produces yUML as output [3]. To start, it gathers syntactic information from the srcML document using the srcSAXEventDispatcher and then uses that information to create an internal model. The internal model is structured as follows: each structure in source code that is of some significance to a UML diagram is given a class definition from which objects can then be created. For instance, the class srcuml_class describes objects that contain all information about a class in a program. If a program that we are looking to create a UML diagram for has three classes, it will have three srcuml_class objects. These classes and their information are then analyzed using Sutton and Maletic’s [14] mappings to determine what kinds of relationships they possess.

Once the system analyzes the internal model for relationships, it creates the yUML output by systematically iterating through and interpreting the representation as yUML defines. Figure 3 shows an example of the yUML representation.

    [«datatype»Employee]
    [«datatype»MyAppWindow]
    [«datatype»Register|- e: Employee;- maw: MyAppWindow;]
    [«datatype»Register]++-e>[«datatype»Employee]
    [«datatype»Register]++-maw>[«datatype»MyAppWindow]

Figure 3 - yUML representation of Figure 2. Modified for spacing purposes.

Each line represents either a class or a relationship. The first three lines represent classes and are formatted as follows: [ name | attributes | operations ]. The first part is straightforward, as it represents the name of the class. The second section, labeled attributes, contains information regarding member fields in the class. The last section, labeled operations, contains information regarding the methods of the class.

The last two lines in the figure represent two separate relationships and are formatted as [name] relationship [name], where the relationship is specified by the type of arrow, such as -.->, ++->, or <>->. The type of relationship and arrow drawn depends on the characters that connect the names of the classes. For instance, [Register]-.->[Shop] will draw a dashed line with a stick arrowhead to represent a dependency.
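The two line formats described above can be produced with a few lines of string building. The following is a minimal sketch; the Field struct and both function names are hypothetical and are not taken from the srcUML source.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one member field.
struct Field {
    char visibility;   // '-', '+', or '#'
    std::string name;
    std::string type;
};

// Builds a class line such as [Register|- e: Employee;].
// The operations section is omitted to keep the sketch short.
std::string yuml_class(const std::string& name,
                       const std::vector<Field>& fields) {
    std::string out = "[" + name;
    if (!fields.empty()) {
        out += "|";
        for (const auto& f : fields) {
            out += f.visibility;
            out += " " + f.name + ": " + f.type + ";";
        }
    }
    return out + "]";
}

// Builds a relationship line; the arrow text selects the kind of
// relationship, e.g. "-.->" for a dependency.
std::string yuml_relationship(const std::string& from,
                              const std::string& arrow,
                              const std::string& to) {
    return "[" + from + "]" + arrow + "[" + to + "]";
}
```

Passing a name that already includes a stereotype prefix, as in the figure, yields the class lines shown there.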

Figure 4 shows the subsequent UML diagram as rendered by yUML.me. More information can be found on the yUML.me website [19].


Figure 4 - UML diagram generated by yUML.me

The process itself is quick and efficient, producing the yUML for the software system Calligra [20] (~1,144 KLOC) in under 20 seconds [3]. However, there are drawbacks to the program. Decker et al. outline in their Conclusion and Future Work that goals for evolution include the inclusion of more abstract design in the representation, local rendering using Graphviz, and the automatic detection of which aspects of a UML diagram will be most useful to a developer looking to use the diagram to understand a system [3].

4.2 Generating Graphviz

During the first year of development, the work tackled the issue of local rendering by creating a compiler that converts the yUML output of srcUML to a DOT representation. The DOT representation is a simple text-based graph format that can then be used in conjunction with Graphviz and the dot program to generate a UML diagram. The reason for this push is that while the yUML graph generator creates a decent diagram, it is not scalable. As yUML.me is an online service, it requires internet access and can become slow as the project the diagram represents grows in size.

Creating the converter was likely the longest part of the work, as many steps were required before real work could begin. The first step was learning to create, run, use, and develop in a Unix environment, as it made the process far easier to understand. In addition, at the time, srcML was far easier to install and use on a Unix machine. Once that was accomplished, the second step was to use and understand srcML, srcSAX, srcSAXEventDispatcher, and srcUML. These programs are cornerstones of the results of this work, and thus understanding them was an important initial step.

The next major step was to begin writing the program. The main function of this new program is to read the yUML output of srcYUML and create a DOT output from the information gathered. To accurately parse the yUML text output, the software library ANTLR [21] is used. ANTLR, which stands for ANother Tool for Language Recognition, is a parser generator. This means that it can generate the code necessary to read, iterate over, and parse other languages. In this case, we used ANTLR to generate a parser for yUML. To do this, one creates a grammar file that defines how the language is structured. This is then given to the ANTLR program to create the parser. Once created, the parser allows us to iterate through the yUML output and systematically change the information to fit the DOT standard [18]. The ANTLR grammar used is shown in Appendix A. Below, in Figure 5, is the DOT created from the yUML shown in Figure 3.

The layout of the DOT format is quite similar to that of yUML. We first define the type of graph, followed by how the nodes and edges should be drawn. Next, we write out the definition of the classes and finish with the definition of the relationships between said classes.

    digraph hierarchy {
        node[shape=record,style=filled,fillcolor=gray95]
        edge[dir="both", arrowtail="empty", arrowhead="empty", labeldistance="2.0"]
        class0[label = "{ «datatype»Employee}"]
        class1[label = "{ «datatype»MyAppWindow}"]
        class2[label = "{ «datatype»Register|- e: Employee\n- maw: \
    MyAppWindow\n}"]
        class2->class0[arrowhead="vee", arrowtail="diamond"]
        class2->class1[arrowhead="vee", arrowtail="diamond"]
    }

Figure 5 - DOT representation of the car manufacturer program. Modified for space purposes.

The difference between yUML and DOT is in the syntax. To represent classes in DOT, we first name the class node and give it a label. The label for a class node contains the information about that class that we want to write within the confines of the node, including the class name, methods, and attributes. For example, in Figure 5, we see class2 has a label assigned to it. The first part of the label, everything before the first pipe character, is the name. In this case we have both a stereotype, «datatype», and a proper class name, Register. The second part of the label contains information about the fields. In this case we have - e: Employee\n. The dash at the beginning signifies that this is a private member. The e is the name given to the field. Lastly, Employee tells us the type of the field. The \n is for formatting purposes. This example does not have a method in the label, but methods are written much like fields: first a representation of the scope (- for private, + for public, or # for protected), followed by the name of the method.
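The label structure just described can be generated mechanically. The sketch below builds a DOT class node and a composition edge in the style of the figure; the Field struct and the function names are hypothetical and are not the code of the srcYUML2graphViz converter.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one member field.
struct Field {
    char visibility;   // '-', '+', or '#'
    std::string name;
    std::string type;
};

// Builds a record-shaped class node: the name comes first, then a
// pipe character, then one "\n"-terminated entry per field.
std::string dot_class_node(const std::string& id,
                           const std::string& name,
                           const std::vector<Field>& fields) {
    std::string out = id + "[label = \"{ " + name;
    if (!fields.empty()) {
        out += "|";
        for (const auto& f : fields) {
            out += f.visibility;
            // "\\n" emits a backslash and an 'n', which DOT reads
            // as a line break inside the label.
            out += " " + f.name + ": " + f.type + "\\n";
        }
    }
    return out + "}\"]";
}

// Builds a composition edge using the arrow styles from the figure.
std::string dot_composition_edge(const std::string& from,
                                 const std::string& to) {
    return from + "->" + to +
           "[arrowhead=\"vee\", arrowtail=\"diamond\"]";
}
```

Methods could be appended as a third, pipe-separated section in the same way; they are left out here because the example program has none.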

Once the DOT output is created, it can be given to the dot program in order to generate a UML diagram like that of Figure 6.

Figure 6 - Diagram generated from the DOT output.


CHAPTER 5

Automatic Rendering of UML Class Diagrams

Now that the problem of scalability is addressed, attention turns towards the next step: a way for users and, especially, researchers to provide and modify the layout and output type of UML diagrams. The work aimed to create a framework for researchers and developers to create new layout algorithms and output types so that UML diagrams can support further researcher and developer preferences. This chapter is divided into five parts. Section 5.1 describes the new architecture of the system itself, explaining the organization and modularization of the program. Section 5.2 describes the different output types that are currently possible and the process required to make a new output type. Section 5.3 explains how we extend srcUML to support user defined layouts and how we settled on OGDF as the rendering engine. Section 5.4 details the current layout algorithms available and the process to create new ones. Finally, Section 5.5 provides a full walkthrough of the implementation of svg_three_outputter.

5.1 Architecture

srcUML is divided into three parts: client, model, and generator. This section describes each of these and their functions. The client section is dedicated to the cpp file that contains the main method. This file, while already created for srcUML in the past, will still undergo some changes to incorporate a more expected and intuitive command line interface.

The model section makes up the code-based representation system that processes and stores the information obtained from srcML and srcSAXEventDispatcher. Most of this, again, is already implemented in srcUML and will only undergo a few minor changes.

Finally, there is the generator section. This section’s purpose is to process the information in the internal model and generate new output formats and algorithms. The major contribution of this thesis is contained within this section. The generator’s two main goals and functions are described next: Section 5.2 details implementing new output types, and Sections 5.3 and 5.4 detail how we extend srcUML and present the currently implemented layout algorithms.

5.2 Implemented Output Types

To create a framework that can be easily built upon, we opted to use an inheritance structure. We created a baseline class that outlines, in general, what is needed for creating outputters, as we call them. These outputters can then be tailored to different output types. As of now, there are three outputter types: yUML, DOT, and SVG. SVG, which stands for Scalable Vector Graphics, is an XML-based markup language used for describing two-dimensional vector graphics [22]. In addition to adding support for SVG as an outputter, we further build off of the user defined layout component as an extension to this outputter. We discuss this further in Section 5.4, Implemented Layout Algorithms. The general process of creating a new outputter is as follows.

1. Create a new class that inherits srcuml_outputter. Keeping to the naming convention, an example of a new class for DGML (Directed Graph Markup Language) [23, 24] will be dgml_outputter.

2. Override the constructor in a similar fashion as the other outputters.

       dgml_outputter(bool method, bool attr){
           show_methods = method;
           show_attributes = attr;
       }

   Figure 7 - Example of overridden constructor for example DGML outputter.

3. Override the output method. The output method itself takes two parameters: a standard output stream and a vector of pointers to srcuml_class objects. These parameters are important, as the purpose of the output method is to convert the information given by the vector and output it in the desired format manually. The first thing to do in the output method, however, is to call the analyze_relationships method, as this will analyze the class objects and create a srcuml_relationship object that contains information on how the classes relate on a semantic and syntactic level. All the information needed is now available and can be traversed to output in the new format. Figure 8 shows how the beginning of our example may look.

    bool output(…){
        srcuml_relationships rel = analyze_relationships(classes);

        out << "";
        out << "

    }

Figure 8 - Example beginning of the output function for the DGML outputter example.

The above lines are a requirement for a document in the DGML format and are output in a direct way to the final document.

4. The last step is to incorporate the new format into the srcuml_handler.hpp file. At the moment this method of inclusion is poorly designed and is a potential area of future work; for the time being, however, it is sufficient.

a. First, one must add the new format name to the output_type enumeration at the top. Figure 9 shows the enumeration that keeps all the possible output types.

    enum output_type {dot, yuml, dgml, etc…};

Figure 9 - Enumeration addition example.

b. Second, one must add the new format to the type selector area of the constructors. Figure 10 shows the if-else statement that is responsible for determining which output format is requested from a command line argument. In our example we would add the following:

    …
    else if (t == "dgml"){
        type = dgml;
    }

Figure 10 - if-else statement addition example.

c. Last, within the run method, one must add a new case to the switch statement for the new format. Figure 11 shows this.

    …
    case dgml: {
        std::cout << "DGML Called\n";
        dgml_outputter outputter(methods, attributes);
        outputter.output(out, classes);
    }
    break;

Figure 11 - switch statement addition example.

With those steps completed, one can use the new format from the command line as shown in Figure 12.

    srcuml example.xml -tdgml -oexample.dgml

Figure 12 - Example usage of a newly added format.

With the new structure, new output types can be created and used.
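The steps above can be condensed into a small, self-contained sketch. It is only an illustration under stated assumptions: class_info and outputter_base are simplified stand-ins for srcuml_class and srcuml_outputter, not srcUML's actual declarations, and the relationship analysis is reduced to a single parent name. The DGML element names (DirectedGraph, Nodes, Node, Links, Link) follow the public DGML schema [24].

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for srcuml_class: a name and an optional base class.
struct class_info {
    std::string name;
    std::string parent;  // empty when the class has no base class
};

// Simplified stand-in for the srcuml_outputter base class.
class outputter_base {
public:
    virtual ~outputter_base() = default;
    virtual bool output(std::ostream& out,
                        const std::vector<class_info>& classes) = 0;

protected:
    bool show_methods = true;
    bool show_attributes = true;
};

// Example outputter that writes a minimal DGML document.
class dgml_outputter_sketch : public outputter_base {
public:
    dgml_outputter_sketch(bool method, bool attr) {
        show_methods = method;
        show_attributes = attr;
    }

    bool output(std::ostream& out,
                const std::vector<class_info>& classes) override {
        out << "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
        out << "<DirectedGraph xmlns=\"http://schemas.microsoft.com/vs/2009/dgml\">\n";

        // One DGML node per class.
        out << "  <Nodes>\n";
        for (const auto& c : classes)
            out << "    <Node Id=\"" << c.name << "\" />\n";
        out << "  </Nodes>\n";

        // One DGML link per inheritance relationship.
        out << "  <Links>\n";
        for (const auto& c : classes)
            if (!c.parent.empty())
                out << "    <Link Source=\"" << c.name
                    << "\" Target=\"" << c.parent << "\" />\n";
        out << "  </Links>\n";

        out << "</DirectedGraph>\n";
        return static_cast<bool>(out);  // false if the stream failed
    }
};
```

Passing an std::ostringstream and two classes, one deriving from the other, yields a document with two Node elements and one Link element. A real outputter would also honor show_methods and show_attributes when building the node labels.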

5.3 Extending srcUML to Support User Defined Layouts

The second goal of this thesis is to create a way to specify user defined UML layout algorithms. For this, we used SVG in conjunction with the Open Graph Drawing Framework (OGDF) [25].

During the progression of this thesis, a few different graph drawing libraries were explored as potentially useful to the graph drawing process. OGDF (Open Graph Drawing Framework) is currently the main engine for drawing visual graph representations in srcUML. Our choice of OGDF is based on three criteria. First, OGDF provides many different output options including SVG, GML, Rome-Lib, LEDA, Chaco, Y-Graph, Graph6, and BENCH [10]. Second, OGDF provides a significant amount of control over the layout itself, while also providing pre-implemented layout algorithms that can be used with little configuration. Last, OGDF is well documented and open source. Decent documentation and direct access to the source code are invaluable when trying to use an API as extensive as OGDF. srcUML currently uses SVG as its main output type because custom changes were made to the SVG printer to accommodate some UML drawing requirements that OGDF does not support natively. One change adds the ability to choose the type of end arrow that is printed based on a node/edge string map, which allows for the proper drawing of UML arrow types. More details are provided in Section 5.5, Explanation of a Layout Algorithm. The other major change is the ability to provide a formatted label string that prints a properly spaced and divided node containing information about a class, such as attributes and methods. This change is important as it removes the need to tediously calculate the position of text and dividers in the node itself.
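To illustrate the kind of bookkeeping the formatted label string removes, the following sketch computes where each text line and each divider of a class box would sit. It is not the modified SVG printer itself. The label format used here (compartments separated by '|', lines within a compartment separated by newlines) and the names label_layout and layout_label are assumptions made for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Vertical positions inside one class box, measured from its top edge.
struct label_layout {
    std::vector<double> text_y;     // baseline of each text line
    std::vector<double> divider_y;  // horizontal rule between compartments
    double height = 0.0;            // total height of the box
};

// Assumed label format (hypothetical): "Name|attr1\nattr2|method1\nmethod2".
label_layout layout_label(const std::string& label,
                          double line_height, double padding) {
    assert(line_height > 0.0 && padding >= 0.0);

    label_layout result;
    double y = 0.0;
    bool first = true;

    std::istringstream compartments(label);
    std::string compartment;
    while (std::getline(compartments, compartment, '|')) {
        // A divider separates each compartment from the previous one.
        if (!first) result.divider_y.push_back(y);
        first = false;

        y += padding;
        std::istringstream lines(compartment);
        std::string line;
        while (std::getline(lines, line)) {
            y += line_height;
            result.text_y.push_back(y);
        }
        y += padding;
    }

    result.height = y;
    return result;
}
```

For a label with a name, two attributes, and one method, the function returns four text positions and two divider positions, which is the information a printer needs to draw the familiar three-compartment UML class box.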

5.4 Implemented Layout Algorithms

Of the three types of outputters currently implemented in srcUML, the svg_outputter is the most useful in terms of layout. In fact, at the moment it is the only way to affect the layout of diagrams in a meaningful way. The svg_outputter is itself a sub-interface of srcuml_outputter and cannot produce a diagram of any kind on its own. Take svg_sugiyama_outputter for example: it inherits from svg_outputter, implements a Sugiyama style layout for the UML diagram, and prints it using OGDF. The Sugiyama layout is a type of hierarchical layout that aims to minimize edge crossings and in which all of the edges point in the same general direction [26, 27]. Through this structure we provide a template with access to methods from both srcUML’s model module, which contains all the information gained from srcML, and OGDF, which provides a robust and varied set of algorithms and tools for manipulating layouts.
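To make the idea of a hierarchical layout more concrete, the sketch below implements only the layer assignment phase of a Sugiyama style layout, using the longest-path method on an acyclic graph. It is neither OGDF's implementation nor the code used by svg_sugiyama_outputter. The function name assign_layers and the use of plain indices for nodes are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assigns every node of a directed acyclic graph to a layer so that each
// edge (from, to) runs from a lower layer to a strictly higher one.
// Nodes are numbered 0..node_count-1. Returns an empty vector when the
// graph contains a cycle.
std::vector<int> assign_layers(
    std::size_t node_count,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    std::vector<std::vector<std::size_t>> successors(node_count);
    std::vector<std::size_t> in_degree(node_count, 0);
    for (const auto& e : edges) {
        assert(e.first < node_count && e.second < node_count);
        successors[e.first].push_back(e.second);
        ++in_degree[e.second];
    }

    // Kahn's algorithm: visit nodes in a topological order.
    std::vector<std::size_t> ready;
    for (std::size_t n = 0; n < node_count; ++n)
        if (in_degree[n] == 0) ready.push_back(n);

    std::vector<int> layer(node_count, 0);
    std::size_t processed = 0;
    while (!ready.empty()) {
        const std::size_t n = ready.back();
        ready.pop_back();
        ++processed;
        for (std::size_t s : successors[n]) {
            // A node sits one layer below its deepest predecessor.
            layer[s] = std::max(layer[s], layer[n] + 1);
            if (--in_degree[s] == 0) ready.push_back(s);
        }
    }

    if (processed != node_count) return {};  // a cycle prevents layering
    return layer;
}
```

A full Sugiyama style layout would first break cycles, then order the nodes within each layer to reduce crossings, and finally assign coordinates. Those phases are what a library such as OGDF supplies.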

Currently there are two completely implemented layout algorithms under the svg_outputter: the Sugiyama layout and a three-cluster layout based on Andriyevska et al. [11]. Each of these uses a different preset of methods created to make it easier to create new algorithms. They also provide an illustrative example for creating new algorithms. The more work we can take off the hands of those who could create masterful algorithms, the better. What follows is a short explanation of the Sugiyama layout, along with a process for creating a similar layout, and an explanation of the three-cluster layout.

First is a Sugiyama style layout [26] under svg_sugiyama_outputter. Figure 13 shows the same simple example from Figure 2; while it is a small example, one can see its similarity to the output produced by the DOT program.


Figure 13 – Sugiyama layout produced by srcUML.

The svg_sugiyama_outputter is an example of what we have dubbed a standard layout, in that it has nothing more than nodes and edges. To create a similar layout algorithm, one follows almost the same set of steps as with the DGML example.

1. First, create a new class following convention and named with the new layout name in mind.

    class svg_xyz_outputter : public svg_outputter {
    public:
        svg_xyz_outputter(bool method, bool attr){
            show_methods = method;
            show_attributes = attr;
        }

        bool output(…){
            init_standard(classes);

            //code for the new layout using OGDF methods
            //goes here.

        }
    };

Figure 14 - Example of a new algorithm implementation in the standard style.

2. Once the framework above is written out, the section labeled //code for the new layout… is the slate on which the developer can create their new algorithm. There are many useful methods and functions provided by OGDF for creating and manipulating layout algorithms; refer to [5] for more information. The convenience of the init_standard() method is that it fully initializes the standard objects needed by an OGDF layout algorithm and creates maps for convenient iteration and searching. Table 1 outlines the objects provided by both the standard and clustered initialization methods, and the next section goes into more detail on a full implementation.

The second is a clustered layout based on one of the layouts used in Andriyevska et al.’s paper [11]. One of the multi-clustered layouts used there is based only on the idea of class stereotypes and keeping classes with the same stereotype close together. The stereotypes used were Boundary, Control, and Entity as defined in The Unified Software Development Process [28]. A Boundary class encapsulates an interaction between the system and its actors. A Control class represents the coordination, sequencing, and control of other objects. An Entity class stores long-lived data and information [28]. Currently, stereotype information is specified as attributes that are part of the XML input to srcUML, where it is automatically collected by the srcSAXEventDispatcher and included in the srcUML data model. For this thesis, the stereotypes are specified manually; however, separate work is being done to enhance the tool stereocode [29] to automatically compute these stereotypes. In srcUML we call this outputter svg_three_outputter. Figure 15 shows our example from Figure 2 put through the svg_three_outputter.

Figure 15 - A three clustered layout generated by srcUML.

We used our example of the three classes, Employee, Register, and MyAppWindow. Each of the three classes represents one of the three main stereotypes used. In this case, Register is control and in red, MyAppWindow is boundary and in green, and Employee is entity and in blue. Each of the clusters is given its own bounding box that encompasses all the classes that fall into that cluster. Due to the simplicity of the example, Figure 15 only has one class in each stereotype.
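The bounding box of a cluster can be understood as the smallest rectangle that encloses every class node in it, plus some margin. The sketch below shows that computation in isolation. In srcUML the cluster geometry is produced by OGDF, so the rect type and the bounding_box function here are illustrative assumptions rather than part of either code base.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Axis-aligned rectangle given by its top-left corner and its size.
struct rect {
    double x, y, width, height;
};

// Smallest rectangle enclosing all node rectangles, grown by `margin`
// on every side. Requires at least one node.
rect bounding_box(const std::vector<rect>& nodes, double margin) {
    assert(!nodes.empty());

    double left = nodes.front().x;
    double top = nodes.front().y;
    double right = nodes.front().x + nodes.front().width;
    double bottom = nodes.front().y + nodes.front().height;

    for (const rect& n : nodes) {
        left = std::min(left, n.x);
        top = std::min(top, n.y);
        right = std::max(right, n.x + n.width);
        bottom = std::max(bottom, n.y + n.height);
    }

    return {left - margin, top - margin,
            (right - left) + 2 * margin, (bottom - top) + 2 * margin};
}
```

With a single class per stereotype, as in the small example above, each box simply surrounds one node with the chosen margin.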


5.5 Explanation of a Layout Algorithm

This section provides a full explanation of the implementation of the svg_three_outputter. We also present Figure 19, a more complete and complex example of a UML diagram generated using this outputter.

For this particular implementation, we were able to use one of OGDF’s many pre-implemented layout algorithms, as the creation of the actual algorithms is not the focus of this thesis. However, this example does give insight into how new algorithms are created.

Before we present the implementation, we provide Table 1 which shows a list of all the objects populated by the init_clustered() and init_standard() functions for the convenience of the programmer creating a new layout algorithm.

Object                                                         Description
Graph g                                                        The general OGDF graph object.
GraphAttributes ga                                             The OGDF object that contains information about node positions, labels, color, etc.
ClusterGraph cg                                                The general OGDF cluster graph object.
ClusterGraphAttributes cga                                     The OGDF object that contains information about cluster positions, labels, color, etc.
map<…, node> class_node_map                                    A map from the srcUML class object to the node in OGDF that represents it.
map<string, node> class_name_node_map                          A map from the name of the class to the OGDF node that represents it.
multimap<pair<node, node>, relationship_type> edge_type_map    A map from a pair of nodes to the relationship type that connects them.
map<pair<node, edge>, string> node_edge_arrow                  A map from a node/edge pair to the appropriate arrowhead style in string form.

Table 1 - Shows the objects that are created and populated by the initialization functions provided by srcUML.


As mentioned earlier, svg_three_outputter is a clustered layout algorithm based on the layout concept presented in Andriyevska et al.’s paper [11]. It centers on the idea of stereotyped classes being grouped together above all else. As such, the first task completed by the svg_three_outputter implementation is to cluster the classes based on their stereotype. Figure 16 shows the process for determining the clusters.

    //======
    SList<node> ctrl, bndr, enty;

    for(auto pair : class_node_map){

        std::string stereo = "";
        if(pair.first->get_stereotypes().begin() !=
           pair.first->get_stereotypes().end()){
            stereo = *(pair.first->get_stereotypes().begin());
        }

        Color &color = cga.fillColor(pair.second);

        if(stereo == "control"){
            color = Color(224, 0, 0, 100);
            ctrl.pushBack(pair.second);

        }else if(stereo == "boundary"){
            color = Color(0, 224, 0, 100);
            bndr.pushBack(pair.second);

        }else if(stereo == "entity"){
            color = Color(0, 0, 224, 100);
            enty.pushBack(pair.second);

        }else if(stereo == ""){
            color = Color(130, 130, 130, 200);
        }
    }
    //======

Figure 16 - svg_three_outputter's cluster determination implementation.

First, we declare three SList container objects to store the nodes belonging to the three cluster types: control, boundary, and entity. Next, we set up a for-each loop to iterate through the class_node_map. This object is provided by the initialization function and maps all the classes, specifically the srcuml_class objects, to their respective nodes (see Table 1). The for-each loop then allows one to systematically set the color and cluster of each node. OGDF exposes attributes through methods that return references, thus letting one change the value stored in the object itself. In Figure 16, we use the fillColor method to access the color of each node and change it based on the node’s stereotype.
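The reference-returning accessor style described above can be shown without OGDF. In the sketch below, node_attributes and color are simplified stand-ins written for this example, and plain integers take the place of OGDF node handles. The point is only that assigning through the returned reference changes the value stored inside the attribute object.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified RGBA color, standing in for OGDF's Color class.
struct color {
    int r, g, b, a;
};

// Stand-in for an attribute store such as ClusterGraphAttributes.
class node_attributes {
public:
    // Returns a reference to the stored color, creating a zeroed entry
    // the first time a node is seen.
    color& fill_color(int node_id) { return colors_[node_id]; }

private:
    std::map<int, color> colors_;
};

// Colors a node by stereotype, using the same palette as Figure 16.
void paint_by_stereotype(node_attributes& attrs, int node_id,
                         const std::string& stereotype) {
    color& c = attrs.fill_color(node_id);  // alias of the stored value
    if (stereotype == "control") {
        c = {224, 0, 0, 100};
    } else if (stereotype == "boundary") {
        c = {0, 224, 0, 100};
    } else if (stereotype == "entity") {
        c = {0, 0, 224, 100};
    } else {
        c = {130, 130, 130, 200};  // unstereotyped classes are drawn gray
    }
}
```

Because fill_color returns a reference, no separate setter is needed. The trade-off of this design is that callers can modify the stored state directly, so any validation has to happen elsewhere.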

The next step in the process is to formally create the clusters in OGDF so that the layout algorithms can understand what exactly we want clustered. Figure 17 shows this.

    //======
    cluster entity = cg.createCluster(enty);
    cluster control = cg.createCluster(ctrl);
    cluster boundary = cg.createCluster(bndr);

    cga.label(control) = "Control";
    cga.label(boundary) = "Boundary";
    cga.label(entity) = "Entity";

    cga.strokeColor(entity) = Color(0, 0, 0, 255);
    cga.strokeColor(control) = Color(0, 0, 0, 255);
    cga.strokeColor(boundary) = Color(0, 0, 0, 255);

    cga.strokeWidth(entity) = 1.5;
    cga.strokeWidth(control) = 1.5;
    cga.strokeWidth(boundary) = 1.5;

    cga.fillColor(entity) = Color(0, 0, 224, 50);
    cga.fillColor(control) = Color(224, 0, 0, 50);
    cga.fillColor(boundary) = Color(0, 224, 0, 50);

    cga.setFillPattern(entity, FillPattern::Solid);
    cga.setFillPattern(control, FillPattern::Solid);
    cga.setFillPattern(boundary, FillPattern::Solid);

    //======

Figure 17 - svg_three_outputter formal creation of OGDF clusters.


Here we create each of the clusters by calling the createCluster method on the ClusterGraph object while storing the returned cluster. This allows us, in the subsequent lines, to change attributes of the clusters such as stroke color and width, much like we did with the node color in Figure 16.

Finally, with the clusters created, all that is left is to run the layout algorithm and print. Figure 18 shows this.

    //======
    ClusterPlanarizationLayout cpl;
    cpl.call(g, cga, cg);

    GraphIO::SVGSettings* svg_settings = new ogdf::GraphIO::SVGSettings();
    if(!drawSVG(cga, out, *svg_settings, node_edge_arrow)){
        std::cout << "Error Write" << std::endl;
    }
    //======

Figure 18 - svg_three_outputter's OGDF layout and print call.

The first step here is to create a layout object; in Figure 18, this object is of type ClusterPlanarizationLayout and is called cpl. Once it is created, we invoke its call method and provide the requested objects, in this case: Graph, which is the object representation of the graph; ClusterGraphAttributes, which contains information about the nodes and edges contained in Graph; and ClusterGraph, which is the object representation of the clusters themselves. All of these are described in Table 1. This method call then computes the positions of the clusters and nodes and leaves us the job of printing everything out. To reiterate, we are using one of OGDF’s pre-implemented layout algorithms, as the goal of this thesis is not to develop new layouts but to provide a framework for making them.


The next step is to create a print settings object. This object, called svg_settings in Figure 18, contains basic settings that can be changed to manipulate the style of the printed SVG graph. For instance, one can change the degree to which all edges curve. With the settings in hand, one calls the print method, drawSVG, and once again provides the requested objects: in this case, the ClusterGraphAttributes, the output stream, the settings, and the node_edge_arrow map, which allows for proper arrow drawing.
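The role of the node_edge_arrow map can be sketched as follows. The arrowhead shapes follow standard UML notation, but the style strings, the relationship enumeration, and the helper functions are hypothetical names chosen for this example; the strings srcUML actually stores may differ. Plain integers stand in for OGDF's node and edge handles.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// UML relationship kinds that need distinct arrowheads.
enum class relationship {
    generalization, realization, aggregation, composition, dependency
};

// Hypothetical arrowhead style names, one per UML relationship kind.
std::string arrowhead_for(relationship r) {
    switch (r) {
        case relationship::generalization: return "hollow_triangle";
        case relationship::realization:    return "hollow_triangle";
        case relationship::aggregation:    return "hollow_diamond";
        case relationship::composition:    return "filled_diamond";
        case relationship::dependency:     return "open_arrow";
    }
    return "none";
}

// Key: (node id, edge id), mirroring the node/edge pair of Table 1.
using node_edge_key = std::pair<int, int>;
using arrow_map = std::map<node_edge_key, std::string>;

// Records which arrowhead to draw where edge `edge_id` meets node `node_id`.
void set_arrow(arrow_map& arrows, int node_id, int edge_id, relationship r) {
    arrows[std::make_pair(node_id, edge_id)] = arrowhead_for(r);
}

// Looks up the arrowhead for one end of an edge. Ends without an entry
// are drawn without an arrowhead.
std::string arrow_at(const arrow_map& arrows, int node_id, int edge_id) {
    const auto it = arrows.find(std::make_pair(node_id, edge_id));
    return it == arrows.end() ? std::string("none") : it->second;
}
```

Keying the map on a node/edge pair rather than on the edge alone lets the two ends of one edge carry different decorations, which is what UML relationships such as composition require.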

Figure 19 below is a full example of the UML output from running the svg_three_outputter.


Figure 19 - Full example of the svg_three_outputter resultant UML diagram.


CHAPTER 6

Conclusions and Future Work

srcUML is a highly efficient and accurate tool for reverse engineering UML class diagrams [3]. In this thesis we described the usefulness of outputting the diagrams directly, without the use of an online tool, and the usefulness of utilizing custom output modules so that one could change the desired output type or layout algorithm. This work addresses these issues by expanding the existing tool srcUML to utilize OGDF to allow users to define custom layout algorithms. This allows researchers to develop and investigate layouts that improve comprehension and for developers to create/modify algorithms for personal preference.

srcUML is still under development. There is still room for improvement and additional features. One item of importance is the decoupling of output type from layout algorithm. As of right now, you can either choose your desired type and be forced into a particular layout, or, you can choose your layout type and be forced into using SVG.

Another is the creation of more actual layout algorithms; as of now there are only two well established and working layouts. In particular, Andriyevska et al. observed that layouts that utilize class stereotypes and the relationships between the classes are better for comprehension [11]. The svg_three_outputter is rudimentary at best and implements a simple clustered layout provided directly from OGDF using only the stereotypes.


Appendix A

grammar srcYUML2graphViz;

yuml : ( classDef | relationship | NEWLINE )+ EOF ;

relationship : classDef relation classDef ;

classDef : '[' classID ( '|' variables ('|' methods )? )? ']' ;

relation : aggregation | composition | realization | generalization |\
dependency ;

aggregation : relationText '<' relationText '>' relationText '-'\
relationText '>' ;

composition : relationText '+' relationText '+' relationText '-'\
relationText '>' ;

realization : relationText '^' relationText '-' relationText '.'\
relationText '-' ;

generalization : relationText '^' relationText '-' ;

dependency : relationText '-' relationText '.' relationText '-'\
relationText '>' ;

classID : text ;

variables : vmText ;

methods : vmText ;

text : ( LETTER | NUMBER | UNICODE | '{' | '}' | '~' | ('\t') |\
('\r') | ('\b') | ('-') | ('+') | ('#') | ('<') | ('>') | '(' | ')' |\
*( ('،') | '«' | '»' | '[' | ']' | '*' | ' ' | ':' | ';' ;

vmText : ( LETTER | NUMBER | UNICODE | '{' | '}' | '~' | ('\t') |\
('\r') | ('\b') | ('-') | ('+') | ('#') | ('<') | ('>') | '(' | ')' |\
*(('.') | ('،') | '«' | '»' | '[' | ']' | '*' | ' ' | ';' | ':' ;

relationText : ( LETTER | NUMBER | UNICODE | '{' | '}' | '~' | ('\t') |\
('\r') | ('\b') | ('#') | '(' | ')' | ':' | ' ' | '*' | '[' | ']' |\
*(('.') | ('،') | '«' | '»' ;

UNICODE :[\u0020-\u002A\u002C-\u002C\u002F-\u003A\u003D-\u003D\u003F-\
\u005A\u005C-\u005C\u005F-\u007B\u007D-\uFFFD] ;

LETTER : [a-zA-Z] ;

NUMBER : [0-9] ;

NEWLINE : ( '\n' ) ;

Appendix A - Shows the ANTLR grammar used to generate a parser capable of parsing yUML formatted text.
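As an informal illustration of what the classDef rule accepts, the following hand-written sketch splits a yUML class definition such as [Employee|name;id|getName()] into its class ID, variables, and methods. It is not the ANTLR-generated parser and it ignores the character restrictions of the text and vmText rules. The names class_def and parse_class_def, and the requirement that the class ID be non-empty, are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The three parts of a yUML class definition: [classID|variables|methods].
struct class_def {
    std::string id;
    std::string variables;  // empty when the compartment is absent
    std::string methods;    // empty when the compartment is absent
};

// Parses one class definition. Returns false when the text is not of the
// form '[' classID ( '|' variables ( '|' methods )? )? ']'.
bool parse_class_def(const std::string& text, class_def& out) {
    if (text.size() < 3 || text.front() != '[' || text.back() != ']')
        return false;

    // Split the text between the brackets on '|'.
    const std::string body = text.substr(1, text.size() - 2);
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t bar = body.find('|', start);
        if (bar == std::string::npos) {
            parts.push_back(body.substr(start));
            break;
        }
        parts.push_back(body.substr(start, bar - start));
        start = bar + 1;
    }

    // At most three compartments, and the class ID must not be empty.
    if (parts.size() > 3 || parts[0].empty()) return false;

    out.id = parts[0];
    out.variables = parts.size() > 1 ? parts[1] : "";
    out.methods = parts.size() > 2 ? parts[2] : "";
    return true;
}
```

Splitting on '|' is consistent with the grammar, since neither the text rule nor the vmText rule lists '|' among its allowed characters.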


Appendix B

API      Application Program Interface
UML      Unified Modeling Language
OGDF     Open Graph Drawing Framework
DOM      Document Object Model
SAX      Simple API for XML
XML      eXtensible Markup Language
HTML     HyperText Markup Language
ANTLR    ANother Tool for Language Recognition
SVG      Scalable Vector Graphics
DGML     Directed Graph Markup Language

Appendix B - Table of all acronyms and what they stand for, in order of their appearance.


References

[1] E. Arisholm, L. Briand, S. E. Hove, and Y. Labiche, “The impact of UML documentation on software maintenance: an experimental evaluation,” IEEE Transactions on Software Engineering, vol. 32, no. 6, pp. 365–381, Jul. 2006.

[2] L. Briand, Y. Labiche, M. D. Penta, and H. (Daphne) Yan-Bondoc, “An experimental investigation of formality in UML-based development,” IEEE Transactions on Software Engineering, vol. 31, no. 10, pp. 833–849, Nov. 2005.

[3] M. Decker, K. Swartz, M. Collard, and J. Maletic, “A Tool for Efficiently Reverse Engineering Accurate UML Class Diagrams,” in 32nd IEEE International Conference on Software Maintenance and Evolution, Raleigh Durham, NC, 2016.

[4] “yUML,” yUML, 2017. [Online]. Available: https://yuml.me/.

[5] “Open Graph Drawing Framework Documentation,” Open Graph Drawing Framework. [Online]. Available: https://ogdf.uos.de/doc/.

[6] A. Sutton and J. Maletic, “Recovering UML Class Models from C++: A Detailed Explanation,” Information and Software Technology, vol. 49, no. 3, pp. 212–219, 2007.

[7] D. Sun and K. Wong, “On evaluating the layout of UML class diagrams for program comprehension,” presented at the 13th International Workshop on Program Comprehension, St. Louis, MO, USA.

[8] S. Yusuf, H. Kagdi, and J. I. Maletic, “Assessing the Comprehension of UML Diagrams via Eye Tracking,” presented at the 15th IEEE International Conference on Program Comprehension (ICPC’07), 2007, pp. 113–122.

[9] B. Sharif and J. Maletic, “The Effect of Layout on the Comprehension of UML Class Diagrams: A Controlled Experiment,” presented at the 2009 5th IEEE International Workshop on Visualizing Software for Understanding and Analysis, Edmonton, 2009, pp. 11–18.

[10] B. Sharif and J. I. Maletic, “An Empirical Study on the Comprehension of Stereotyped UML Class Diagram Layouts,” presented at the 17th IEEE International Conference on Program Comprehension (ICPC’09), 2009, pp. 268–272.

[11] O. Andriyevska, N. Dragan, B. Simoes, and J. I. Maletic, “Evaluating UML Class Diagram Layout based on Architectural Importance,” presented at the 3rd IEEE International Workshop on Visualizing Software for Understanding and Analysis (VISSOFT’05), 2005, pp. 14–20.

[12] E. Korshunova, M. Petkovic, M. G. L. van den Brand, and M. R. Mousavi, “CPP2XMI: Reverse Engineering of UML Class, Sequence, and Activity Diagrams from C++ Source Code,” presented at the 2006 13th Working Conference on Reverse Engineering, Benevento, Italy, 2006.

[13] L. A. Barowski and J. H. Cross, “Extraction and Use of Class Dependency Information in Java,” presented at the Ninth Working Conference on Reverse Engineering (WCRE’02), 2002, pp. 309–318.

[14] A. Sutton and J. I. Maletic, “Mappings for Accurately Reverse Engineering UML Class Models from C++,” presented at the 12th Working Conference on Reverse Engineering (WCRE’05), 2005, pp. 175–184.

[15] “Extensible Markup Language (XML),” W3C, 11-Oct-2016. [Online]. Available: https://www.w3.org/XML/. [Accessed: 15-Sep-2019].

[16] M. Collard, M. Decker, and J. Maletic, srcML. srcML LLC.

[17] M. Collard, M. Decker, and J. Maletic, “srcML: An Infrastructure for the Exploration, Analysis, and Manipulation of Source Code,” in 29th IEEE International Conference on Software Maintenance, Eindhoven, The Netherlands, 2013.

[18] J. Ellson, E. R. Ganser, E. Koutsofios, S. C. North, and G. Woodhull, “Graphviz and dynagraph – static and dynamic graph drawing tools.” Springer Verlag, 2003.

[19] “Class Diagram Samples,” yUML. [Online]. Available: https://yuml.me/diagram/scruffy/class/samples.

[20] KDE, “Calligra,” Calligra. [Online]. Available: https://www.calligra.org/.

[21] T. Parr, “ANTLR,” ANTLR, 2014. [Online]. Available: https://www.antlr.org/. [Accessed: 25-Jul-2019].

[22] “SVG: Scalable Vector Graphics,” developer.mozilla.org, 25-Jul-2019. [Online]. Available: https://developer.mozilla.org/en-US/docs/Web/SVG.

[23] “DGML,” Wikipedia, 16-Jul-2018. [Online]. Available: https://en.wikipedia.org/wiki/DGML.

[24] “Directed Graph Markup Language (DGML) reference,” Microsoft, 03-Nov-2016. [Online]. Available: https://docs.microsoft.com/en-us/visualstudio/modeling/directed-graph-markup-language-dgml-reference?view=vs-2019.

[25] M. Chimani, C. Gutwenger, M. Junger, W. Klau, K. Klein, and P. Mutzel, “Handbook of Graph Drawing and Visualization.” CRC Press, 2014.

[26] K. Sugiyama, S. Tagawa, and M. Toda, “Methods for Visual Understanding of Hierarchical System Structures,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 11, no. 2, pp. 109–125, Feb. 1981.

[27] “Layered graph drawing,” Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Layered_graph_drawing.

[28] I. Jacobson, G. Booch, and J. Rumbaugh, The Unified Software Development Process, First. Addison-Wesley Professional, 1999.

[29] N. Dragan, M. L. Collard, and J. I. Maletic, “Automatic Identification of Class Stereotypes,” presented at the IEEE International Conference on Software Maintenance (ICSM’10), 2010, pp. 1–10.