XMI and DOT exercises

XML Metadata Interchange (XMI)

The XML Metadata Interchange (XMI) is an XML-based language for interchanging model metadata, in particular UML models. XMI is an Object Management Group (OMG) standard. The XMI SDF Syntax in this exercise is based on the "OMG XML Metadata Interchange (XMI) Specification version 1.2", which is available from the OMG web site: http://www.omg.org

The open source UML modeling tool ArgoUML version 0.24 can be used to create a UML model and export it in the XMI format. Although not necessary to complete this exercise, ArgoUML can be used to read and create XMI files for testing. The ArgoUML web site is: http://argouml.tigris.org/

In this exercise only class diagrams with classes (no interfaces) and generalizations (parent/child relations) are used. Each class can have at most one superclass. Fields and methods can be present in the class diagrams but are not used in this exercise.

Graphviz

Graph Visualization Software (Graphviz) is a set of tools for visualizing graphs described in the DOT Language. Although not necessary to complete this exercise, you are encouraged to install the open source Graphviz tools to visualize the DOT files generated in Exercise 3. The DOT Language Syntax and the Graphviz tools are available from the Graphviz web site: http://www.graphviz.org

Preparation

Extract the xmi-dot.tar archive in an empty work directory and start the Meta Environment from this work directory. Load the module Exercise.sdf and observe the import graph. The XMI SDF Syntax is represented by the module Document and the DOT SDF Syntax is represented by the module Dot. The module Transform contains functions that allow conversions between XMI and DOT sorts. Unless instructed to do so, do not make changes to the XMI syntax, DOT syntax or Transform modules.

Submit

Submit the modified Exercise.sdf, Exercise.asf and Dot.sdf files before June 20th via Peach.

EXERCISE 1

The purpose of exercise 1 is to measure the complexity of a "class diagram" model specified in the XMI language. A number of the functions created in this exercise are also used in exercise 3. The complexity is defined as:

complexity = number of classes + number of relations + 2 * maximum length of relations

If class B extends class A and class C extends class B, then the number of classes is 3 (A, B and C), the number of relations is 2 (B extends A and C extends B), and the maximum length of relations is 2 (C extends B extends A). The complexity of this model is therefore 3 + 2 + 2 * 2 = 9.
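The formula can be checked with a small Python sketch. This is only an illustration of the arithmetic, not the ASF+SDF solution; the function name and parameter names are chosen here for the example.

```python
def complexity(num_classes, num_relations, max_length):
    """complexity = classes + relations + 2 * maximum length of relations."""
    return num_classes + num_relations + 2 * max_length


# Example from the text: B extends A and C extends B.
print(complexity(num_classes=3, num_relations=2, max_length=2))  # 3 + 2 + 2 * 2 = 9
```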

To calculate the maximum length of relations, the relations taken from the XMI file are stored in a tree. The longest branch is determined from the tree. The length of the longest branch is the maximum length of relations. The following steps are taken in this exercise to create the tree and to determine the maximum length of relations.

Step 1: Parse the XMI file and generate a list of all relations. (This is done in Exercise 1b.)
Step 2: Order the elements in the list generated in step 1. This forces the first element to be the root element and guarantees that all the other elements in the list can be added to the tree in the order they appear in the list. (This is done in Exercise 1c.)
Step 3: Create the tree from the ordered list of all relations. (This is done in Exercise 1c.)
Step 4: Determine the length of the longest branch in the tree. (This is done in Exercise 1d.)

Exercise 1a

In the XMI syntax each class is referenced by a unique XMI-Id. A class is specified by the element with name UML:Class. The unique XMI-Id is set by the attribute with name xmi.id. See the following XMI fragment:

The function getClassId returns a list of all class XMI-Ids.

Complete the getClassId function in the Exercise Equations. Test the equations by reducing the term file test01.trm; the result should match the file test01.out. The number of classes is the length of the List[[XMI-Id]] returned by this function.

Hint: The XMI model used in the term files test01.trm, test02.trm, test06.trm and test07.trm is shown in Figure 1.

Figure 1: Overview of the example XMI model.

Exercise 1b

In the XMI syntax the relation between two classes is specified by the element with name UML:Generalization. This element contains two elements that represent the parent class and the child class of the relation. See the following XMI fragment:

The sort XMI-Value-Pair is specified in the Exercise Syntax and is used to store the child-class/parent-class relation. The function getRelations returns a list of class relations. Each pair represents a parent/child class relation.

Step 1: Complete the getRelations function in the Exercise Equations. Test the equations by reducing the term file test02.trm; the result should match the file test02.out. The number of relations is the length of the Table[[XMI-Value-Pair]] returned by this function.

Exercise 1c

To calculate the maximum length of relations, the list of relations returned by the getRelations function is stored in a tree. From this tree the longest branch can be retrieved. The list of relations from which the tree is constructed must be sorted to ensure that the first element is a root element and that all remaining elements can be added to the tree in the order they appear in the list. The function orderPairs sorts the list of relations in such a way that an XMI-Value-Pair list element with class X as parent will only occur in the list after an element with class X as child.
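The ordering requirement can be illustrated with a short Python sketch. It is not the ASF+SDF solution. It assumes that each relation is a (child, parent) tuple and that the relations form a tree; the name order_pairs and the tuple layout are assumptions made for this example.

```python
def order_pairs(pairs):
    """Order (child, parent) pairs so that a pair with X as parent
    only appears after the pair that has X as child."""
    children = {child for child, _ in pairs}
    # A root is a parent that never occurs as a child.
    placed = {parent for _, parent in pairs if parent not in children}
    ordered = []
    remaining = list(pairs)
    while remaining:
        # Pairs whose parent is already reachable can be emitted now.
        ready = [pair for pair in remaining if pair[1] in placed]
        if not ready:
            raise ValueError("relations contain a cycle")
        ordered.extend(ready)
        placed.update(child for child, _ in ready)
        ready_set = set(ready)
        remaining = [pair for pair in remaining if pair not in ready_set]
    return ordered


print(order_pairs([("C", "B"), ("B", "A"), ("D", "B")]))
# [('B', 'A'), ('C', 'B'), ('D', 'B')]
```

The first pair in the result always has the root class as its parent, which is the property the tree construction relies on.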

Step 2: Complete the orderPairs function in the Exercise Equations.

The sort Tree is specified in the Exercise Syntax and is used to represent a tree node:

-> Tree

The first element of the pair contains the id of the tree element. The second element contains the list of subclasses of the tree element. This list is empty if there are no subclasses. To create a tree the following functions could be used:

createTree: From the XMI-Value-Pair list, create the root tree element and call the addToTree function with as parameters the created tree and the XMI-Value-Pair list from which the root element has been removed.
addToTree: For each element in the XMI-Value-Pair list, call the addToTree1 function with the element as parameter.
addToTree1: Check if the XMI-Value-Pair element should be added to this tree element. If not, call the addToTree2 function to recursively call the addToTree1 function for all elements contained in the Tree list of this tree element.
addToTree2: Call the addToTree1 function for all elements contained in the Tree list. Store the resulting tree objects in the list as they may have been changed. Return this list when the addToTree1 function has been called for all elements.

Step 3: Complete the createTree function in the Exercise Equations. Use the suggested "auxiliary" functions addToTree, addToTree1 and addToTree2, or define your own function(s).

Test the equations by reducing the term file test03.trm; the result should match the file test03.out. The relations are now stored in a tree.
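The tree construction described for createTree and its auxiliary functions can be sketched in Python as follows. This is a hedged illustration rather than the ASF+SDF solution: a tree node is modelled as a tuple (id, list of subtrees), relations are assumed to be (child, parent) tuples that are already ordered, and the function names are chosen for this example.

```python
def add_pair(tree, pair):
    """Return a new tree with the (child, parent) pair added (compare addToTree1)."""
    node_id, subtrees = tree
    child, parent = pair
    if node_id == parent:
        # The pair belongs to this node: attach the child as a new leaf.
        return (node_id, subtrees + [(child, [])])
    # Otherwise try every subtree and keep the possibly changed results
    # (compare addToTree2).
    return (node_id, [add_pair(sub, pair) for sub in subtrees])


def create_tree(ordered_pairs):
    """Build a tree from ordered (child, parent) pairs; the first pair holds the root."""
    first_child, root = ordered_pairs[0]
    tree = (root, [(first_child, [])])
    for pair in ordered_pairs[1:]:
        tree = add_pair(tree, pair)
    return tree


print(create_tree([("B", "A"), ("C", "B"), ("D", "B")]))
# ('A', [('B', [('C', []), ('D', [])])])
```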

Exercise 1d

The tree can be traversed to find the longest branch. The following ASF condition can be used to match on a smaller-than condition:

    ... $Integer1 < $Integer2 == true, ...
    ====>
    ... = ...

See also the Integers module for all supported conditions: ASF+SDF Library -> basic -> Integers.sdf.

The function maxLength returns the length of the longest branch in the tree.

Step 4: Complete the maxLength function in the Exercise Equations. Use the existing "auxiliary" function maxLength1, or define your own function(s).
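As an illustration of what maxLength computes, the following Python sketch determines the length of the longest branch, counted in relations as in the example of exercise 1 (C extends B extends A has length 2). The tuple representation (id, list of subtrees) and the function name are assumptions for this example, not part of the exercise files.

```python
def max_length(tree):
    """Length of the longest branch, counted in parent/child relations."""
    _, subtrees = tree
    if not subtrees:
        return 0  # a node without subclasses contributes no relation
    return 1 + max(max_length(sub) for sub in subtrees)


chain = ("A", [("B", [("C", [])])])  # C extends B extends A
print(max_length(chain))  # 2
```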

Test the equations by reducing the term file test04.trm; the result should match the file test04.out. The maximum length of relations is now available.

Exercise 1e

The complexity:

complexity = number of classes + number of relations + 2 * maximum length of relations

can now be calculated. The function complexity returns the complexity of an XMI file.

 Complete the complexity function in the Exercise Equations. Test the equations by reducing the term file test05.trm, the result should match the file test05.out.

In this exercise the Meta Environment was used to parse an XMI document, store this data in a newly defined tree sort and apply some simple condition-based rules.

EXERCISE 2

The purpose of exercise 2 is to correct an error in the DOT SDF Syntax. The DOT Language contains node statements that describe a specific node and edge statements that specify the relations between nodes. An example of a Dot file is given below. This example contains 4 node statements and 3 edge statements.

    digraph "Test"
    {
      "identifier77C" [shape="record", label="A" ];
      "identifier77E" [shape="record", label="B" ];
      "identifier780" [shape="record", label="C" ];
      "identifier782" [shape="record", label="D" ];

      "identifier782" -> "identifier77E";
      "identifier77E" -> "identifier77C";
      "identifier780" -> "identifier77C";
    }

The edge statements in this example contain only a single edge operation:

    B -> A;

This is supported by the DOT SDF Syntax. The DOT Language also supports edge statements with multiple edge operations:

    C -> B -> A;

This is not supported by the DOT SDF Syntax.

Study the DOT Language Syntax specification of the edge statement. The syntax is included on the last pages of the exercise.

 Update the dot/Dot.sdf module to allow multiple edge operations in a single edge statement.

Test the updated SDF Syntax by loading the term file dotsyntax.trm; there should be no errors.
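To see what an edge statement with multiple edge operations means, the Python sketch below splits such a statement into the single edges it describes. It is a naive string-based illustration, not a DOT parser and not the SDF fix asked for in this exercise: it assumes a directed graph, no attribute list, and no "->" inside quoted identifiers. The function name is chosen for this example.

```python
def split_edge_chain(statement):
    """Split 'C -> B -> A;' into the single edges (C, B) and (B, A)."""
    text = statement.strip().rstrip(";")
    nodes = [part.strip() for part in text.split("->")]
    # Each pair of neighbouring nodes forms one edge operation.
    return list(zip(nodes, nodes[1:]))


print(split_edge_chain("C -> B -> A;"))  # [('C', 'B'), ('B', 'A')]
```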

In this exercise an existing SDF syntax specification was updated based on an abstract grammar description.

EXERCISE 3

The purpose of exercise 3 is to transform the "class diagram" model specified in the XMI language to a Graphviz file in the DOT language. By using the Graphviz tools, a visual representation (e.g. an image) can then be created. If the Graphviz tools are installed, the following command can be used to generate a PNG image from a Dot file:

    dot -Tpng -omyfile.png myfile.dot

Exercise 3a

Each node in the Dot file will be referenced by the xmi.id attribute of the UML:Class element in the XMI file. For the actual name that is displayed in the node, the name attribute of the UML:Class element is used. The function getIdValueTable parses the XMI file and returns a Table[[XMI-Id, XMI-Value]] containing the xmi.id as key and the name as value.

 Complete the getIdValueTable function in the Exercise Equations. Test the equations by reducing the term file test06.trm, the result should match the file test06.out.
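The mapping that getIdValueTable produces can be illustrated outside the Meta Environment with a Python sketch based on the standard xml.etree.ElementTree module. The sample XMI document, the namespace URI and the nesting of elements are assumptions made for this example and are not taken from the exercise files; only the element name UML:Class and the attributes xmi.id and name come from the text above.

```python
import xml.etree.ElementTree as ET

UML_NS = "org.omg.xmi.namespace.UML"  # assumed namespace URI

# Hypothetical, heavily reduced XMI document for illustration only.
SAMPLE_XMI = """<XMI xmi.version="1.2" xmlns:UML="org.omg.xmi.namespace.UML">
  <XMI.content>
    <UML:Model xmi.id="m1" name="Test">
      <UML:Namespace.ownedElement>
        <UML:Class xmi.id="id1" name="A"/>
        <UML:Class xmi.id="id2" name="B"/>
        <UML:Generalization xmi.id="g1">
          <UML:Generalization.child><UML:Class xmi.idref="id2"/></UML:Generalization.child>
          <UML:Generalization.parent><UML:Class xmi.idref="id1"/></UML:Generalization.parent>
        </UML:Generalization>
      </UML:Namespace.ownedElement>
    </UML:Model>
  </XMI.content>
</XMI>"""


def get_id_value_table(xmi_text):
    """Map the xmi.id of every UML:Class definition to its name."""
    root = ET.fromstring(xmi_text)
    table = {}
    for cls in root.iter("{%s}Class" % UML_NS):
        class_id = cls.get("xmi.id")
        if class_id is not None:  # skip elements that only reference a class
            table[class_id] = cls.get("name")
    return table


print(get_id_value_table(SAMPLE_XMI))  # {'id1': 'A', 'id2': 'B'}
```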

Exercise 3b

The main function performs the transformation from the XMI file to a Dot file by:

1. Calling the getGraph function to generate a "skeleton" Graph. This function uses the function getModelName to retrieve the model name from the XMI file.
2. Calling the addNodeStatements function to add the node statements to the Graph. This function uses the function getIdValueTable to retrieve the node ids and node names from the XMI file.
3. Calling the addEdgeStatements function to add the edge statements. This function uses the function getRelations to retrieve the relation information from the XMI file.

The module Transform contains functions that allow transformation from XMI sorts to DOT sorts. This can be seen in the getGraph function.

 Complete the addNodeStatements function declaration in the Exercise Syntax. Complete the addNodeStatements function in the Exercise Equations.

 Complete the addEdgeStatements function declaration in the Exercise Syntax. Complete the addEdgeStatements function in the Exercise Equations.

Test the equations by reducing the term files test07.trm and test08.trm. The results should match the files test07.out and test08.out. The results are valid Dot files and can be used to generate a graph using the Graphviz tools.
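The three steps of the transformation (skeleton graph, node statements, edge statements) can be sketched in Python as plain string generation. This is an illustration under assumptions, not the ASF+SDF solution: the function name is invented, relations are assumed to be (child, parent) tuples with edges drawn from child to parent, and identifiers are assumed not to contain double quotes.

```python
def to_dot(model_name, id_to_name, relations):
    """Render a class model as DOT text."""
    lines = ['digraph "%s"' % model_name, "{"]  # skeleton graph
    for node_id, name in id_to_name.items():    # node statements
        lines.append('  "%s" [shape="record", label="%s"];' % (node_id, name))
    lines.append("")
    for child, parent in relations:             # edge statements
        lines.append('  "%s" -> "%s";' % (child, parent))
    lines.append("}")
    return "\n".join(lines)


print(to_dot("Test", {"id1": "A", "id2": "B"}, [("id2", "id1")]))
```

The printed text can be saved to a file and passed to the dot command shown in exercise 3.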

In this exercise the Meta Environment was used to transform a document from the XMI Language into the DOT Language.

http://www.graphviz.org/doc/info/lang.html

The DOT Language

The following is an abstract grammar defining the DOT language. Terminals are shown in bold font and nonterminals in italics. Literal characters are given in single quotes. Parentheses ( and ) indicate grouping when needed. Square brackets [ and ] enclose optional items. Vertical bars | separate alternatives.

    graph       : [ strict ] (graph | digraph) [ ID ] '{' stmt_list '}'
    stmt_list   : [ stmt [ ';' ] [ stmt_list ] ]
    stmt        : node_stmt
                | edge_stmt
                | attr_stmt
                | ID '=' ID
                | subgraph
    attr_stmt   : (graph | node | edge) attr_list
    attr_list   : '[' [ a_list ] ']' [ attr_list ]
    a_list      : ID [ '=' ID ] [ ',' ] [ a_list ]
    edge_stmt   : (node_id | subgraph) edgeRHS [ attr_list ]
    edgeRHS     : edgeop (node_id | subgraph) [ edgeRHS ]
    node_stmt   : node_id [ attr_list ]
    node_id     : ID [ port ]
    port        : ':' ID [ ':' compass_pt ]
                | ':' compass_pt
    subgraph    : [ subgraph [ ID ] ] '{' stmt_list '}'
                | subgraph ID
    compass_pt  : (n | ne | e | se | s | sw | w | nw)

The keywords node, edge, graph, digraph, subgraph, and strict are case-independent. Note also that the allowed compass point values are not keywords, so these strings can be used elsewhere as ordinary identifiers.

An ID is one of the following:

- any string of alphabetic characters, underscores or digits, not beginning with a digit;
- a number [-]?(.[0-9]+ | [0-9]+(.[0-9]*)? );
- any double-quoted string ("...") possibly containing escaped quotes (\");
- an HTML string (<...>).

Note that in HTML strings, angle brackets must occur in matched pairs, and unescaped newlines are allowed. In addition, the content must be legal XML, so that the special XML escape sequences for ", &, <, and > may be necessary in order to embed these characters in attribute values or raw text.

Both quoted strings and HTML strings are scanned as a unit, so any embedded comments will be treated as part of the strings.

An edgeop is -> in directed graphs and -- in undirected graphs.

An a_list clause of the form ID is equivalent to ID=true.

The language supports C++-style comments: /* */ and //. In addition, a line beginning with a '#' character is considered a line output from a C preprocessor (e.g., # 34 to indicate line 34) and discarded.

Semicolons aid readability but are not required except in the rare case that a named subgraph with no body immediately precedes an anonymous subgraph, since the precedence rules cause this sequence to be parsed as a subgraph with a heading and a body.

As another aid for readability, dot allows single logical lines to span multiple physical lines using the standard C convention of a backslash immediately preceding a newline character. In addition, double-quoted strings can be concatenated using a '+' operator. As HTML strings can contain newline characters, they do not support the concatenation operator.

Semantic Notes

If a default attribute is defined using a node, edge, or graph statement, or by an attribute assignment not attached to a node or edge, any object of the appropriate type defined afterwards will inherit this attribute value. This holds until the default attribute is set to a new value, from which point the new value is used. Objects defined before a default attribute is set will have an empty string value attached to the attribute once the default attribute definition is made.

Note, in particular, that a subgraph receives the attribute settings of its parent graph at the time of its definition. This can be useful; for example, one can assign a font to the root graph and all subgraphs will also use the font. For some attributes, however, this property is undesirable. If one attaches a label to the root graph, it is probably not the desired effect to have the label used by all subgraphs. Rather than listing the graph attribute at the top of the graph, and then resetting the attribute as needed in the subgraphs, one can simply defer the attribute definition in the graph until the appropriate subgraphs have been defined.

Character encodings

The DOT language assumes at least the ASCII character set. Quoted strings, both ordinary and HTML-like, may contain non-ASCII characters. In most cases, these strings are uninterpreted: they simply serve as unique identifiers or values passed through untouched. Labels, however, are meant to be displayed, which requires that the software be able to compute the size of the text and determine the appropriate glyphs. For this, it needs to know what character encoding is used. By default, DOT assumes the UTF-8 character encoding. It also accepts the Latin1 (ISO-8859-1) character set, assuming the input graph uses the charset attribute to specify this. For graphs using other character sets, there are usually programs, such as iconv, which will translate from one character set to another.

Another way to avoid non-ASCII characters in labels is to use HTML entities for special characters. During label evaluation, these entities are translated into the underlying character. This table shows the supported entities, with their Unicode value, a typical glyph, and the HTML entity name. Thus, to include a lower-case Greek beta into a string, one can use the ASCII sequence &beta;. In general, one should only use entities that are allowed in the output character set, and for which there is a glyph in the font.