
TAGDUR: A Tool for Producing UML Sequence, Deployment, and Component Diagrams Through Reengineering of Legacy Systems

Richard Millham, Jianjun Pu, Hongji Yang
De Montfort University, England

Abstract: A further introduction of TAGDUR, a reengineering tool that first transforms a procedural legacy system into an object-oriented, event-driven system and then models and documents this transformed system through a series of UML diagrams. This paper focuses on TAGDUR's generation of sequence, deployment, and component diagrams.

Keywords: UML (Unified Modeling Language), Reengineering, WSL

This paper is the second installment in a series [4] that introduces TAGDUR (Transformation and Automatic Generation of Documentation in UML through Reengineering). TAGDUR is a reengineering tool that transforms a legacy system's outmoded architecture to a more modern one and then represents this transformed system through a series of UML (Unified Modeling Language) diagrams in order to overcome a legacy system's frequent lack of documentation. The architectural transformation is from the legacy system's original procedurally structured architecture to an object-oriented, event-driven architecture. Once this transformation is complete, TAGDUR documents the structure and behavior of the transformed system through a series of UML diagrams including class, activity, deployment, sequence, and component diagrams.

This paper gives a brief overview of the problems posed by many legacy systems, a general description of TAGDUR's design, and how TAGDUR generates three types of UML diagrams: sequence, deployment, and component.

1.1 Overview of Problem

Early computer systems, faced with severe memory constraints, were designed to be both procedurally structured and driven. In many cases, these systems were designed as standalone systems. As time passed and business needs changed, a need arose to integrate these disparate systems and remodel them to accommodate a multi-tiered, Web-based platform. In order to accommodate this remodeled platform, the original sequential-driven, procedurally structured legacy system must be transformed to an object-oriented, event-driven system. Object orientation, because it encapsulates variables and procedures into modules, is well suited to this new Web architecture where pieces of software must be encapsulated into component modules with clearly defined interfaces. A transfer to a Web-based architecture requires a real-time, event-driven response rather than the legacy system's original procedural invocation.

Object orientation offers additional advantages. Object-oriented systems have lower maintenance costs than procedural software. Because object-oriented systems have encapsulated software modules, modules from different systems can be more easily integrated than software that is procedurally structured with global variables.

Often, the high costs of development and switchover preclude developing replacement systems for these legacy systems. However, in order to integrate disparate systems, developers must fully understand the systems that are being integrated, whether these systems have been reengineered or not. Developers must understand the structure of the system, the data and control flows, and the execution of external events. Documentation containing this information is often missing or obsolete. Consequently, any reengineering process must incorporate some capability to generate documentation pertaining to the structure and dynamics of the system.

Our tool, TAGDUR, was designed to address this lack of system documentation. It uses information obtained during the transformation process, and by parsing the code of the transformed system, to generate UML diagrams of the transformed system.

2. Tool Design

[Figure: Overview of TAGDUR Tool Design. COBOL source code passes through COBOL-to-WSL conversion, producing a WSL representation of the COBOL program; COBOL-specific program information (e.g. variable data types) and WSL-specific program information are stored separately. WSL program transformation and analysis yields the transformed WSL system and the results of the analysis (e.g. classes, events, messages between classes, data typing of variables). These feed the generation of the UML representation (UML diagrams) and C++ code generation, which produces a C++ program equivalent to the COBOL program.]

The COBOL legacy system is converted into WSL using a set of COBOL-to-WSL conversion rules that were developed on the basis of Martin Ward's paper [11], The Syntax and Semantics of the Wide Spectrum Language, which defined the basis of the Wide Spectrum Language. Because WSL lacks a data typing capability, programming language-specific information, such as the variable data types of the original legacy system, cannot be represented in WSL; instead, it is stored in the database for future use, such as during the C++ code generation phase.

Once this COBOL-to-WSL conversion phase has been completed, TAGDUR, via a series of transformation steps, transforms the original procedurally structured and sequential legacy system to an object-oriented, event-driven system. The first of these steps is to use our clustering technique to identify objects along with their attributes and operations. Our technique groups closely coupled procedures and variables into classes. [2]

The second transformation step is to evaluate tasks for their degree of task independence, using several different algorithms as outlined in [3], at the procedural, program block, and individual code line levels of granularity. A task is defined as an atomic unit of work; tasks can be defined at several different levels of granularity such as program block, procedure, and individual code line. Our task evaluation technique consists of analyzing two normally sequential tasks for the presence of data and control dependencies between them. If a dependency exists, these tasks are deemed to be sequential; if no dependency exists, these tasks are deemed to be able to execute independently.

The last transformation step is to identify possible events from source code. The event identification step involves constructing a control graph of procedure calls, I/O calls, system interrupts, and error invocations by parsing source code. Nodes of this control graph which involve interactions with other objects are modeled as events. For example, raising an error is modeled as an event occurrence between the object where the error occurred and the object, usually a System object, which handles the error. Once the events are identified, each event is evaluated in terms of its synchronicity. Synchronicity is determined by evaluating, at the individual code line level of granularity, whether the task where the event occurred and the task immediately successive to this task share any control or data dependencies. If a dependency exists, the event is deemed to be synchronous; otherwise, the event is deemed to be asynchronous.

Once the transformation process completes, TAGDUR documents this transformed system by representing it through a series of UML diagrams. These UML diagrams are represented in UXF textual format. UXF (UML eXchange Format) is an XML-based model interchange format for UML models developed by Junichi Suzuki and Yoshikazu Yamamoto. [8] Diagrams represented in UXF can be imported into a UXF-compatible graphical tool for viewing.

After the transformation process finishes, the WSL intermediate representation is restructured into classes. Each variable that is associated with a class is modeled as an attribute of the class, while each procedure that is associated with a class is modeled as an operation of the class.

Information obtained during the transformation process is used in the production of UML diagrams. For example, classes that have been identified during the class identification process become classes in the class diagram. The independent task evaluation process determines the sequencing order of tasks; each task is given a sequence number, in ascending order of execution. If two or more tasks may execute in parallel, these tasks are assigned the same sequence number. The sequence number of the task enclosing an event, such as a file I/O operation, is modelled as the sequence number of the message in the sequence diagram. Events that are identified during the event identification process are portrayed as messages between objects in the sequence diagram.

2.2 Rationale behind Tool Design Decisions

The UML modeling notation was selected as the method to model the transformed system for several reasons. UML, through its series of diagrams, provides several different perspectives of the system. UML contains use case diagrams, which model business processes from the end-user perspective, but also contains activity and class diagrams, which describe both the static structure and behaviour of the system in terms that are most useful to developers. UML is a world-accepted modelling standard with significant tool support. UML is also platform and programming language independent. Although generation is not a present feature of UML, TAGDUR provides class and activity diagrams.

WSL has many advantages which make it ideal as an intermediate language. WSL was designed to be easy to analyse and transform. WSL is supported by several tools, including the FermaT transformation system, which provides automatic transformation and code simplification. WSL is programming language and platform independent. As a result, the transformations and modeling that TAGDUR performs on a WSL-represented system can be accomplished regardless of whether the original legacy system was in COBOL or C. The original legacy systems need only to be converted into WSL first. [10]

FermaT provides the ability to convert source code of other languages, such as IBM 370 assembly language, into WSL and then convert this WSL code into other programming languages such as C or COBOL. [9] TAGDUR can utilize FermaT's language conversion features by having FermaT translate assembly language into WSL and then transforming this WSL representation into an object-oriented architecture, which can then be translated into C++.

Finding system artefacts on which to base the UML modelling of the system is difficult for many legacy systems. Documentation of the system often does not exist. The original developers and end-users, who are often an important source of information about the system, have long since left the organization. Because these systems often have been left in light maintenance mode for many years, current maintainers and end-users have minimal knowledge of the system. Because the source code, along with the associated data files, is the only available system artefact, any generation of UML diagrams used to model the system must be based primarily on source code.

2.4 Advantages of TAGDUR

Rumbaugh et al. identify three viewpoints necessary for understanding a software system: the objects manipulated by the computation (the data model), the manipulations themselves (the functional model), and how the manipulations are organized and synchronized (the dynamic model). [6] We wish to propose another view required for system understanding, the architectural view, which represents the physical components of the system and the relationships among them.

Although several existing tools provide one or more of these views, no tool provides all four of these views. One existing tool, RIGI, can transform a system and then generate visual documentation of its static structure. However, RIGI does not have the capability of documenting the dynamic, behavioural, or architectural view of the system. RIGI was developed by Hausi Muller and his team at the University of Victoria, Canada. [7]

TAGDUR addresses all four of these views through its generation of UML diagrams: the static, behavioral, dynamic, and architectural views through the class, activity, sequence, and deployment or component diagrams respectively. The first paper in this series focused on the generation of UML class and activity diagrams, which represent the static and behavioral views of the system. [4] This paper focuses on representing the dynamic and architectural views through UML diagrams. Sequence diagrams, which represent the dynamic view, are useful in depicting the messages passed between interacting objects. The developer needs to understand the interactions among objects as depicted in a sequence diagram in order to properly design the interfaces of these objects. If the developers plan to distribute objects among several different platforms, such as in a multi-tiered Web architecture, they need to understand the object interactions in order to group frequently interacting objects on the same platform and so minimize communication costs.

The architectural view is required in order to understand how the system is related to external physical entities, such as a printer, and the dynamic configuration of program files. This architectural view is needed if a physical entity, such as a database file, changes and the developer needs to quickly ascertain which program files access this database file in order to change the relevant code within them. In a large legacy system, such as the particular telecommunications legacy system used in our study, there are 106 source code files. It is necessary to depict the relationships among source code files, as in which source code files load which other source code files as libraries, in a component diagram because the number of potentially loadable libraries is often too numerous to keep track of manually.

3. Generation of UML Diagrams

TAGDUR generates several types of UML diagrams of the transformed system, including class, sequence, deployment, component, and activity diagrams. This paper focuses on the generation of sequence, deployment, and component diagrams, representing the dynamic and architectural views of the system respectively, from the information gained during TAGDUR's transformation process of the system.

3.1 Sequence Diagrams

Sequence diagrams depict the interactions, via message passing, among objects in a temporal context. The sequence diagram is divided up into vertical sections, called swimlanes. Each object in the system is assigned its own swimlane. Messages are depicted as being sent from the swimlane of their sending, or source, object to that of their receiving, or target, object.

Procedure calls are modeled as messages between the caller object that invokes the procedure and the called object that contains the procedure being invoked. Exceptions and interrupts are modeled as messages between the object where the interrupt or exception occurred and the System object, representing the underlying operating system, which handles the error or exception. File input/output calls, such as statements that read or write data from files, are modeled as messages between the object invoking the input/output method and the File object, which represents a generic database file.

Conditions within WSL control constructs, such as WSL's if-then statements, that enclose code invoking a message form conditions within the guards that govern the passing of messages from one object to another. For example, given the WSL statement IF W<3 THEN Call UpdateVal FI, the condition [W<3] forms a guard to the method invocation message UpdateVal. If the WSL control construct that encloses code invoking a message ...

These messages may be modeled as synchronous or asynchronous. A message is deemed to be asynchronous if the task that is immediately successive to the task invoking the message may execute independently. If the immediately successive task cannot execute independently, the message is deemed to be synchronous. By evaluating tasks containing message invocations, the independent task evaluation process determines which tasks may execute independently from one another and, consequently, determines which messages contained in the tasks being evaluated are deemed to be synchronous or asynchronous.

The independent task evaluation process is responsible for determining the sequence numbers assigned to each message in the sequence diagram. The independent task evaluation process assigns a sequence number to tasks at both the procedural and individual code line levels of granularity.

Each message depicted in the sequence diagram is given a sequence number consisting of the procedure task sequence number and the individual code line task sequence number, separated by a period. The procedure task sequence number is the sequence number of the procedure where the message is invoked, and the individual code line task sequence number is the sequence number of the code line where the message is invoked. The sequence numbers indicate the order of execution in ascending order; messages with the same sequence number may be executed in parallel.

3.2 Component and Deployment Diagrams

Component diagrams depict the run-time relationships among software components of a system. These components may be simple files or libraries loaded dynamically. The relationships among software components are usually compilation dependencies. [5]

Deployment diagrams depict the physical arrangement of various hardware components and executable files of the system. [5]

Both the component and deployment diagrams are derived by parsing each source code file, in turn, for the presence of keywords that indicate a relationship between the source code file being parsed and the physical entity or software component identified by the keyword. For example, while parsing source code file SCF1, the WSL statement "Put Sys01, X" is encountered. This WSL statement outputs the value of variable X to the physical data file denoted by Sys01. Consequently, this parsing determines that a relationship exists between source code file SCF1 and physical file SYS01, and hence the deployment diagram models this relationship between the two physical nodes SYS01 and SCF1. In a similar way, if source code file SCF2 contains a WSL statement that loads a library file, LF2, this loading is depicted in the component diagram as a dependency relationship between the two software components SCF2 and LF2.

[Figure: components SCF2 and LF2 joined by a dependency.]
Component Diagram Showing Compilation Dependency between library file, LF2, and source code file, SCF2

[Figure: nodes SCF1 and Sys01 joined by an interface relationship.]
Deployment Diagram Showing the Interface relationship between source code file, SCF1, and file device, Sys01

A small sample of WSL code is presented with a corresponding UML sequence diagram based on this code.

WSL Code Sample:

    Main() /* main calling program */
    Begin
        Call A.CreateMessages()
    End.

    Class A
    Begin
        Var X
        Var Z
        Proc CreateMessages()
        Begin
            If System.System_Error = 'Error' Then
                Call System.System_ErrorHandler()
            Else
                Call B.UpdateVal(X,Z)
                Put File, X, Z
            Fi
        End.
    End

    Class B
    Begin
        Proc UpdateVal(X,Z)
        Begin
            X := 8
            Z := 1
        End.
    End

    Class System
    Begin
        Var System_Error
        Proc System_ErrorHandler()
        Begin
            /* handles errors */
        End.
    End

    Class File
    Begin
        /* handles File I/O */
    End

The following diagram models the WSL code sample as a sequence diagram. Each object is given its own swimlane. Messages, such as the method invocation Call B.UpdateVal(X,Z), are modeled as messages between the sending object A (the object containing the method invocation) and the receiving object B (the object that handles the message). Potentially parallel-executing message tasks, such as the method invocations of System_ErrorHandler and UpdateVal, are given the same sequence number. Message tasks that must execute in sequence, such as UpdateVal(X,Z) and Put File, X, Z, are given different sequence numbers. Conditions, such as NOT[System.System_Error = 'Error'], which enclose code that invokes a message, such as Call UpdateVal(X,Z), form guards to the message, UpdateVal(X,Z).

[Figure: sequence diagram with swimlanes Class A, Class B, System, and File. Messages sent from Class A:
    1.0 [System.System_Error = 'Error'] / System.System_ErrorHandler()
    1.0 Not [System.System_Error = 'Error'] / B.UpdateVal(X,Z)
    1.2 Not [System.System_Error = 'Error'] / Put(File,X,Z)]
Sequence Diagram Representation of the WSL Code Sample
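The sequence-numbering and synchronicity rules of Section 3.1 can be illustrated with a short Python sketch. It is not TAGDUR's implementation: the Task record, the depends and label_messages functions, and the read/write sets attached to each task are all hypothetical names introduced here. The sketch assumes that only data dependencies between adjacent code-line tasks are checked (control dependencies are ignored), that code-line numbering starts at 0 and increases by one per dependency, and that the last message of a procedure is treated as asynchronous. The numbers it produces therefore need not match those of any particular TAGDUR diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One code-line task; every field name here is hypothetical."""
    label: str                                 # message text, e.g. "B.UpdateVal(X,Z)"
    reads: set = field(default_factory=set)    # variables the task reads
    writes: set = field(default_factory=set)   # variables the task writes
    guard: str = ""                            # enclosing condition, if any


def depends(first: Task, second: Task) -> bool:
    """True if `second` has a data dependency on `first`.

    Checks read-after-write, write-after-read and write-after-write
    conflicts only; control dependencies are left out of this sketch.
    """
    return bool(
        first.writes & second.reads
        or first.reads & second.writes
        or first.writes & second.writes
    )


def label_messages(tasks, procedure_seq=1):
    """Return one sequence-diagram label per task.

    A task keeps its predecessor's code-line number when the two are
    independent. A message is marked synchronous when the next task
    depends on it; the final task has no successor and is marked
    asynchronous by assumption.
    """
    labels = []
    line_seq = 0
    for i, task in enumerate(tasks):
        if i > 0 and depends(tasks[i - 1], task):
            line_seq += 1
        successor = tasks[i + 1] if i + 1 < len(tasks) else None
        is_sync = successor is not None and depends(task, successor)
        mode = "sync" if is_sync else "async"
        guard = f"[{task.guard}]/" if task.guard else ""
        labels.append(f"{procedure_seq}.{line_seq} {guard}{task.label} ({mode})")
    return labels


# The three messages of the WSL code sample; read/write sets are assumed.
sample = [
    Task("System.System_ErrorHandler()", reads={"System_Error"},
         guard="System.System_Error = 'Error'"),
    Task("B.UpdateVal(X,Z)", writes={"X", "Z"},
         guard="NOT System.System_Error = 'Error'"),
    Task("Put(File,X,Z)", reads={"X", "Z"},
         guard="NOT System.System_Error = 'Error'"),
]
labels = label_messages(sample)
for line in labels:
    print(line)
```

Under these assumptions, the error handler and UpdateVal share a sequence number because neither touches the other's variables, while Put receives a later number because it reads the X and Z that UpdateVal writes. Comparing only adjacent tasks is a simplification; a fuller treatment would compare each task against all earlier tasks that may still be executing.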

4. Conclusion

TAGDUR is a reengineering tool that was designed to address two of the most prominent problems of legacy systems: obsolete architecture and a lack of documentation. TAGDUR first transforms a legacy procedural architecture to an object-oriented one and then generates documentation of this transformed system through a series of UML diagrams. This documentation allows developers to fully understand the system and enables them to modify the transformed system, now available as a C++ program automatically generated by TAGDUR, to meet new business needs such as system integration.

References

1) Alhir, Sinan Si, UML in a Nutshell, O'Reilly: Sebastopol, CA, USA, 1998
2) Millham, Richard, "An Investigation: Reengineering Sequential Procedure-Driven Software into Object-Oriented Event-Driven Software through UML Diagrams", Proceedings of the International Computer Software and Applications Conference, Oxford, 2002
3) Millham, Richard, "Determining Granularity of Independent Tasks for Reengineering a Legacy System into an OO System", to be published in the Proceedings of the International Computer Software and Applications Conference, Dallas, Texas, 2003
4) Millham, Richard, "TAGDUR: A Tool for Producing UML Diagrams Through Reengineering of Legacy Systems", Proceedings of the 7th IASTED International Conference on Software Engineering and Applications, Marina del Rey, USA, 2003
5) Muller, Pierre-Alain, Instant UML, Wrox: Birmingham, UK, 2000
6) Rumbaugh, James, Michael Blaha, William Premerlani, Frederick Eddy, William Lorensen, Object-Oriented Modeling and Design, Prentice-Hall, 1991
7) Storey, Margaret-Anne D., Hausi A. Müller, Kenny Wong, "Manipulating and Documenting Software Structures", ICSM '95 (Nice, France, October 16-20, 1995)
8) Suzuki, Junichi and Yoshikazu Yamamoto, "Making UML Models Interoperable with UXF", Lecture Notes in Computer Science 1618, Springer-Verlag, Heidelberg, Germany
9) Ward, M., "The FermaT Assembler Re-Engineering Workbench", ICSM (Florence, Italy, 2001)
10) Ward, Martin, "Specifications from Source Code -- Alchemists' Dream or Practical Reality?", 4th Reengineering Forum, September 19-21, 1994, Victoria, Canada
11) Ward, M., "The Syntax and Semantics of the Wide Spectrum Language", Technical Report, Durham University, England, 1992