7th WSEAS Int. Conf. on SOFTWARE ENGINEERING, PARALLEL and DISTRIBUTED SYSTEMS (SEPADS '08), University of Cambridge, UK, Feb 20-22, 2008

Database Reverse Engineering Tools

NATASH ALI MIAN, TAUQEER HUSSAIN
Faculty of Information Technology
University of Central Punjab, Lahore, Pakistan

Abstract: - Almost every information system processes and manages huge amounts of stored data. Improving and maintaining legacy systems requires the conceptual models of such systems. With the growing complexity of information systems and of information processing requirements, the need for database reverse engineering is increasing every day. Different Computer Assisted Software Engineering (CASE) tools, developed as research projects or as commercial products, reverse engineer a given database to its conceptual model. This paper explores the capabilities of existing CASE tools and evaluates to what extent these tools can successfully reverse engineer a given database system. It also identifies the constructs of a conceptual model which are not yet supported by database reverse engineering tools. This research provides motivation for software engineers to develop CASE tools which can reverse engineer the identified constructs and can therefore improve the productivity of reverse engineers.

Key-Words: - Database reverse engineering, CASE tools, Conceptual schema, ER model, EER model

1 Introduction
Database Reverse Engineering (DBRE) is the process of extracting database requirements from an implemented system. Legacy systems typically suffer from poor design documentation, which makes the maintenance job more difficult. Software systems can be improved and maintained with significantly less effort and cost if their requirements or conceptual models are readily available. DBRE plays a very important role in this regard and has been a topic of research since the 1980s. The steps involved in database forward engineering and reverse engineering are shown in Fig. 1. A set of database requirements, or the corresponding conceptual model, can be considered the input to the forward engineering process, whereas it is the output of the reverse engineering process.

There are different types of conceptual models used in designing software applications, for instance, the Unified Modeling Language (UML), the Object Modeling Technique (OMT), and the Entity-Relationship (ER) model. The ER model is widely used as a data modeling tool for database applications due to its ease of use and representation. The Enhanced Entity-Relationship (EER) model is an extension of the ER model with additional constructs such as specialization (or generalization), shared subclasses and union types (or categories). In the literature [1-9], we find many DBRE techniques which can convert a given database or a […] into a high-level conceptual model. [10, 11] provide a good survey and comparison of these techniques, whereas [12] discusses issues involved in DBRE as well as some of their solutions.

Keeping in view the importance of DBRE for a wide variety of software applications, a number of CASE tools have been developed, some as research prototypes and others as commercial products. These tools differ in what they need as input and what they produce. For example, some tools require just the relational schema, whereas others need data instances as well. Some tools additionally require application code and/or form fields to start the DBRE process. Some tools assume that the database is normalized up to BCNF, whereas others can start even from 3NF. Likewise, the output can be an ER model, an EER model or an OMT model.

The objective of this research is to explore the capabilities of such tools and to analyze how far they fulfill the requirements of DBRE. For this purpose, we have studied eight different DBRE tools and provide a comparison which elaborates their strengths and weaknesses. It is expected that this research will motivate developers to incorporate the missing features in their DBRE CASE tools so that the DBRE process can be made more efficient and accurate for the community.

It should be noted that the constructs of an ER (or EER) model are used for comparison in this paper. The constructs in OMT can be easily mapped to those of the EER model. Therefore, for ease of reference, the constructs of the ER model and the additional constructs of the EER model are listed in Table 1 and Table 2 respectively. This paper is organized in four sections: section 2 explores the capabilities of each tool studied, section 3 provides a comparative analysis of their capabilities, and section 4 concludes this research and provides pointers to future work that can be done in this area.

2 DBRE Tools

2.1 Toad Modeler
Toad Modeler [19], developed by Charonware, provides a solution for reverse engineering a relational database. It also supports legacy […]. The tool can extract entities, attributes, relationships, indexes, triggers, procedures and other objects. However, the output depends on the type of inputs. In addition, Toad Data Modeler supports different communication methods such as ODBC, ADO and direct native connections. It can generate HTML and RTF documentation. The database schema can be re-designed using the Version Manager provided with the tool.

Table 1: Constructs of an ER Model
Sr#  Construct
1    Regular entity type
2    Weak entity type
3    Simple attribute
4    Composite attribute
5    Key attribute
6    Multi-valued attribute
7    Relationship type
8    Structural constraints of a relationship type
9    Relationship attribute

[Fig. 1 diagram: Requirements → Transform to Conceptual Model → CONCEPTUAL MODEL → Transform to Logical Model → LOGICAL MODEL → Transform to Physical Model → PHYSICAL MODEL; Forward Engineering runs down this chain and Reverse Engineering runs back up it.]

Fig. 1: Process of Forward and Reverse Engineering

2.2 DB_MAIN
DB_MAIN, another tool developed by a research group at the University of Namur, Belgium [14], supports reverse engineering of a database. The tool is now commercially available and is the outcome of more than 10 years of continuous research. The group has devised new models to carry out the reverse engineering process efficiently. The tool extracts the design by analyzing the database schema, code and data.

Table 2: Additional Constructs of an EER Model
Sr#  Construct
1    Specialization
2    Specialization constraints
3    Shared subclass
4    Category (or union type)

2.3 ER Creator
Another tool that supports database reverse engineering is ER Creator [20], developed by Model Creator. The tool supports reverse engineering an existing system. However, it has certain limitations: (1) it supports only the ER model, and (2) a connection can only be established through an ODBC data source, which restricts the tool to a smaller number of database management systems (DBMS). A trial version with limited functionality is also available for academic and research purposes.

2.4 INSIGHT
Insight [15] is a specialized tool to reverse engineer an existing system. Its inputs are the database schema, code and data. From the input, it identifies library-defined entities, language-defined entities, operating system entities and application-defined entities, together with their relationships of each type. It also has a code inspection feature which helps the user understand the system better. However, the tool does not extract cardinality, participation or the EER-specific constructs.

2.5 CA ERWin
CA ERWin is one of the most widely used CASE tools for forward engineering and, to some extent, reverse engineering. A database schema can be created and maintained using ERWin. However, for reverse engineering, it requires the database schema as well as program code.

2.6 Varlet Analyst
Varlet Analyst [13] requires application code, database schema and data as inputs. An important point to note is that the tool can be customized with respect to the application to be reverse engineered. A limitation of the tool is that the process is not fully automated; some work has to be performed by the reverse engineer. First, automatic analysis operations are applied to the system. This results in a set of facts about the legacy database. These facts are taken as indicators, which are combined with domain-specific heuristics. After certain transformations, a conceptual schema is generated. However, this output is not yet complete. A manual phase is then initiated, in which application experts work with the reverse engineer to obtain a better quality conceptual schema. The tool thus automates the process only partially and requires human intervention to obtain a better quality conceptual schema.

2.7 August II
August-II [16] supports inputs from different database management systems. The inputs can range from COBOL record layouts to DB2 data definitions. Its basic inputs are database application code and data. The input is then translated into a conceptual model: the tool generates an ER model with entities, relationships, attributes, and structural constraints in the form of cardinalities and participation. The output produced by the tool can be used as an input to a number of CASE tools and commercial DBMSs. The tool performs reverse engineering step by step and allows the user to observe the output of each step before further processing. However, it requires developer interaction as well.
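Every tool described above takes the database schema as at least one of its inputs (Table 3). To illustrate the kind of schema-level analysis such tools automate, the following minimal Python sketch guesses which ER/EER construct a relational table implements from its primary and foreign keys alone, in the spirit of the key-based classifications found in the DBRE literature. It is not the algorithm of any tool discussed here; the `Table` description, the `classify` function and the sample tables are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical description of one relational table (not any tool's API)."""
    name: str
    columns: list[str]
    primary_key: list[str]
    # Foreign-key column -> name of the table it references.
    foreign_keys: dict[str, str] = field(default_factory=dict)


def classify(table: Table) -> str:
    """Guess which ER/EER construct a table implements from its keys alone."""
    pk = set(table.primary_key)
    fk_cols = set(table.foreign_keys)
    key_fks = pk & fk_cols  # foreign-key columns that are part of the key
    if not key_fks:
        # The key borrows nothing from other tables: an independent entity.
        return "regular entity type"
    if len(pk) == 1:
        # The whole key is one foreign key, so the table shares the identity
        # of the table it references (a common mapping of a subclass).
        return "subclass (specialization) candidate"
    if pk <= fk_cols:
        # The key consists only of foreign keys: a relationship type.
        return "relationship type"
    if len(key_fks) == 1 and set(table.columns) == pk:
        # Owner key plus value columns and nothing else: multi-valued attribute.
        return "multi-valued attribute"
    # Owner key plus a partial key, with further attributes: weak entity type.
    return "weak entity type"


schema = [
    Table("employee", ["emp_id", "name", "dept_id"], ["emp_id"], {"dept_id": "department"}),
    Table("department", ["dept_id", "dname"], ["dept_id"]),
    Table("works_on", ["emp_id", "proj_id", "hours"], ["emp_id", "proj_id"],
          {"emp_id": "employee", "proj_id": "project"}),
    Table("dependent", ["emp_id", "dep_name", "birth_date"], ["emp_id", "dep_name"],
          {"emp_id": "employee"}),
    Table("emp_phone", ["emp_id", "phone"], ["emp_id", "phone"], {"emp_id": "employee"}),
    Table("manager", ["emp_id", "bonus"], ["emp_id"], {"emp_id": "employee"}),
]

for t in schema:
    print(f"{t.name}: {classify(t)}")
# employee: regular entity type
# department: regular entity type
# works_on: relationship type
# dependent: weak entity type
# emp_phone: multi-valued attribute
# manager: subclass (specialization) candidate
```

Key structure alone is ambiguous: a table with a surrogate key would be reported as a regular entity type even if it really implements a relationship, and a weak entity type without non-key attributes looks exactly like a multi-valued attribute. Resolving such cases is one reason why several of the tools above also analyze application code and data.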


2.8 ITOC
ITOC [17] is a tool developed by the research group of the Centre for Software Maintenance at the University of Queensland in collaboration with Oracle Corporation. The objective of this project is to develop tools that assist in the migration of legacy systems to fourth generation language (4GL) information system applications. The tool has been tested on several deployed commercial applications. Its inputs are a relational database, application code and user interfaces (forms); its outputs are Oracle CASE repository elements. The tool generates a conceptual schema which can be forward engineered to a refined database. It also uses data mining techniques while performing query analysis. However, the tool cannot extract all the constructs of an EER diagram, and user knowledge is required at a certain stage.

2.9 Others
There are also other reverse engineering tools which convert the input database into a conceptual model or into programming language code. These tools are briefly discussed in this sub-section. A list of these tools is available at http://www.javaskyline.com/database.html. Ampersand Zero Code reverse engineers a relational database to a servlet application web site. Code Futures: Firestorm takes data definition language (DDL) or JDBC metadata as input and generates a pattern-based persistence tier. DbGen converts an input database to Java data and test classes. Sadan: CMaker takes a relational schema as input and converts it into Java classes. SoftwareTree: JDX can take input from multiple sources and reverse engineer it into a logical schema. Sybase: PowerDesigner takes the database schema as input and generates Java code. Thought Inc: CocoBase takes the database schema as input and converts the given physical model into an object-oriented model. GEBIT: TREND takes the database schema and converts it into a conceptual schema. Object Style: Cayenne Style is an open source tool which converts the input relational schema to an object-oriented schema.

3 Comparison of DBRE Tools
For the purpose of this research, we have classified the inputs required by a DBRE tool into database schema, code, data and forms. Database schema refers to a relational or non-relational schema, including its constraints but excluding data instances. Code refers to programs used to access the database; these can be written in procedural or non-procedural languages. Data refers to a database instance, and forms refer to the user interfaces required to capture data from the user. Table 3 compares the eight tools based upon the types of inputs they require. The capabilities of each DBRE tool are summarized in Table 4, which shows which EER constructs a tool can reverse engineer. If available, the version of each tool used in this comparison is also given. It can be noted that most of the tools require other inputs in addition to a database schema.
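Data instances can supply evidence that a schema alone does not carry, for example about the participation constraints and cardinalities compared in Table 4. The sketch below, a hypothetical illustration rather than the method of any surveyed tool, estimates these constraints for the relationship implied by one foreign key from a single database instance. The function name `infer_constraints` and the sample values are assumptions made for this example. Because an instance may satisfy a constraint by chance, its results should be treated as hypotheses for the reverse engineer to confirm.

```python
from collections import Counter


def infer_constraints(parent_keys, child_fk_values):
    """Guess the structural constraints of the relationship implied by one
    foreign key, using a single database instance.

    parent_keys     -- key values of the referenced (parent) table
    child_fk_values -- foreign-key values in the referencing (child) table,
                       with None standing for SQL NULL
    """
    present = [v for v in child_fk_values if v is not None]
    refs = Counter(present)  # how many child rows point at each parent key
    return {
        # Child side is total if no row leaves the foreign key NULL.
        "child_participation":
            "total" if len(present) == len(child_fk_values) else "partial",
        # Parent side is total if every parent key is referenced at least once.
        "parent_participation":
            "total" if set(parent_keys) <= set(refs) else "partial",
        # 1:1 if no parent is referenced more than once, otherwise 1:N.
        "cardinality": "1:1" if all(n == 1 for n in refs.values()) else "1:N",
    }


# Hypothetical sample: DEPARTMENT keys and EMPLOYEE.dept_id values.
departments = ["D1", "D2", "D3"]
employee_dept = ["D1", "D1", "D2", None]
print(infer_constraints(departments, employee_dept))
# {'child_participation': 'partial', 'parent_participation': 'partial', 'cardinality': '1:N'}
```

Here the NULL `dept_id` makes employee participation partial, the unreferenced `D3` makes department participation partial, and `D1` being referenced twice makes the relationship 1:N.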

Table 3: Inputs Required by DBRE Tools (Y = required, N = not required)
Tool            Database  Code  Data  Forms
Toad Modeler    Y         N     N     N
DB_MAIN         Y         Y     Y     N
ER Creator      Y         N     N     N
Insight         Y         Y     Y     N
CA ERWIN        Y         Y     N     N
Varlet Analyst  Y         Y     Y     N
AUGUST II       Y         Y     Y     N
ITOC            Y         Y     N     Y
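Since Table 3 is a simple Boolean matrix, the observations drawn from it can be checked mechanically. The snippet below transcribes the table into a Python dictionary (the encoding is our own, not part of any tool) and lists the tools that need only the database schema, as well as the number of tools that need more than one kind of input.

```python
# Table 3 transcribed: tool -> (database, code, data, forms); True = required.
INPUTS = {
    "Toad Modeler":   (True, False, False, False),
    "DB_MAIN":        (True, True,  True,  False),
    "ER Creator":     (True, False, False, False),
    "Insight":        (True, True,  True,  False),
    "CA ERWIN":       (True, True,  False, False),
    "Varlet Analyst": (True, True,  True,  False),
    "AUGUST II":      (True, True,  True,  False),
    "ITOC":           (True, True,  False, True),
}

# Tools whose only required input is the database schema.
schema_only = [tool for tool, req in INPUTS.items()
               if req == (True, False, False, False)]
# Tools that need more than one kind of input (True counts as 1 in sum()).
needs_more = [tool for tool, req in INPUTS.items() if sum(req) > 1]

print(schema_only)                          # ['Toad Modeler', 'ER Creator']
print(len(needs_more), "of", len(INPUTS))   # 6 of 8
```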


Table 4: EER Constructs Reverse Engineered by Each Tool (Y = supported, N = not supported)
Construct               Toad Modeler  DB_MAIN  ER Creator  INSIGHT  CA ERWIN  Varlet Analyst  AUGUST II  ITOC
Entities                Y             Y        Y           Y        Y         Y               Y          Y
Relationships           Y             Y        Y           Y        Y         Y               Y          Y
Inheritance             N             Y        N           N        N         N               N          N
Attributes              Y             Y        Y           Y        Y         Y               Y          Y
Cardinality             N             Y        Y           N        N         Y               Y          N
Key attribute           Y             Y        Y           Y        Y         Y               Y          Y
Participation           N             N        N           N        N         N               Y          N
Multi-valued attribute  N             N        N           N        N         N               N          N
Union type              N             N        N           N        N         N               N          N
Tool versions listed in the table header (where available): V3, V8, V3.2, V7.2, V3, V1
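Table 4 can be queried in the same way. The snippet below transcribes it (again with our own encoding: one Y/N string per construct, in the column order of the table) and prints, for every construct that not all eight tools support, which tools do support it, followed by the tools that support every construct.

```python
TOOLS = ["Toad Modeler", "DB_MAIN", "ER Creator", "INSIGHT",
         "CA ERWIN", "Varlet Analyst", "AUGUST II", "ITOC"]

# Table 4 transcribed: one Y/N flag per tool, in TOOLS order.
SUPPORT = {
    "Entities":               "YYYYYYYY",
    "Relationships":          "YYYYYYYY",
    "Inheritance":            "NYNNNNNN",
    "Attributes":             "YYYYYYYY",
    "Cardinality":            "NYYNNYYN",
    "Key attribute":          "YYYYYYYY",
    "Participation":          "NNNNNNYN",
    "Multi-valued attribute": "NNNNNNNN",
    "Union type":             "NNNNNNNN",
}


def supporters(construct):
    """Return the tools marked Y for the given construct."""
    return [t for t, flag in zip(TOOLS, SUPPORT[construct]) if flag == "Y"]


for construct in SUPPORT:
    tools = supporters(construct)
    if len(tools) < len(TOOLS):
        print(f"{construct}: {', '.join(tools) or 'no tool'}")

# A tool is complete only if it has a Y in every row of the table.
complete = [t for i, t in enumerate(TOOLS)
            if all(row[i] == "Y" for row in SUPPORT.values())]
print("Tools supporting every construct:", complete)
# Inheritance: DB_MAIN
# Cardinality: DB_MAIN, ER Creator, Varlet Analyst, AUGUST II
# Participation: AUGUST II
# Multi-valued attribute: no tool
# Union type: no tool
# Tools supporting every construct: []
```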

This can be taken as a limitation of these tools because, for instance, not every application is designed with forms. Two tools can do the task with just the database schema: Toad Modeler and ER Creator. Toad Modeler gives better results, although it still does not extract all constructs, while ER Creator does not support any of the EER-specific constructs. The findings given in Table 4 indicate that no CASE tool supports all EER constructs, which is a serious limitation of existing DBRE tools. Inheritance is supported only by DB_MAIN, and participation constraints only by August II. Multi-valued attributes and union types are not yet reverse engineered by any CASE tool. This provides the motivation for software engineers to develop a DBRE CASE tool which can support all these features.

4 Conclusion and Future Work
In this paper, eight DBRE CASE tools have been analyzed and compared in terms of their capabilities to reverse engineer a given database system. It is observed that the inputs to these CASE tools can be the database schema, application code, database instances and user interface forms. Most of the tools require more than one type of input. Only Toad Modeler and ER Creator require just the database schema and no other input; however, both have limited capabilities, as they cannot reverse engineer many of the EER constructs noted in Table 4. It has also been observed that every tool can reverse engineer the basic constructs of an ER model, namely entities and the relationships amongst them, but the remaining ER/EER constructs are not reverse engineered by many of the CASE tools available at present. Even an important construct like inheritance is not supported by any of these CASE tools except DB_MAIN. It is therefore concluded that a DBRE CASE tool should be developed that supports reverse engineering of the following EER constructs: inheritance, participation constraints on relationships, multi-valued attributes, and union types. Such a tool can increase the productivity of a reverse engineer, resulting in decreased maintenance costs of software systems. Further work should also be done to improve the existing DBRE algorithms in order to reduce the number of inputs required by CASE tools for database reverse engineering. The extent of human intervention required by the tools is another factor which should be studied; research should be done to minimize or, if possible, eliminate this human intervention.

References:
[1] M. Anderson. Extracting an ER Schema from a Relational Database Through Reverse Engineering. In Proc. of the 13th Int'l. Conf. on the ER Approach, LNCS 81:403-419, Dec. 1994. Springer-Verlag.


[2] M.A. Casanova and J.E.A. de Sa. Designing Entity-Relationship Schemes for Conventional Information Systems. In Proc. of the 3rd Int'l. Conf. on the ER Approach to Software Engineering, pp. 265-277, California, 1983. Elsevier Science Publishers.
[3] R. Chiang, T. Barron, and V. Storey. Reverse engineering of relational databases: Extraction of an EER model from a relational database. Data & Knowledge Engineering, 12:107-142, 1994.
[4] R. Chiang, T. Barron, and V. Storey. A framework for the design and evaluation of reverse engineering methods for relational databases. Data & Knowledge Engineering, 21:57-77, 1997.
[5] J.-L. Hainaut, C. Tonneau, M. Joris, and M. Chandelon. Schema transformation techniques for database reverse engineering. Proc. of the 12th Int. Conf. on Entity-Relationship Approach, pp. 353-372, Texas, USA, Dec. 1993.
[6] P. Johannesson. A method for transforming relational schemas into conceptual schemas. Proc. of the 10th Int. Conf. on Data Engineering, pp. 115-122, 1994.
[7] V. Markowitz and J. Makowsky. Identifying extended entity-relationship object structures in relational schemas. IEEE Transactions on Software Engineering, 16(8), 1990.
[8] J.-M. Petit, F. Toumani, J.-F. Boulicaut, and J. Kouloumdjian. Towards the reverse engineering of denormalized relational databases. In Proc. of the 12th Int. Conf. on Data Engineering, New Orleans, USA, Feb. 1996. IEEE Press.
[9] W. Premerlani and M. Blaha. An approach for reverse engineering of relational databases. Communications of the ACM, 37(5), 1994.
[10] L. Pedro-de Jesus and P. Sousa. Selection of reverse engineering methods for relational databases. Proc. 3rd European Conference on Software Maintenance and Reverse Engineering (CSMR), 1999.
[11] K. H. Davis and P. H. Aiken. Data reverse engineering: a historical survey. Proc. 7th Working Conference on Reverse Engineering, pp. 70-78, 2000.
[12] J.-L. Hainaut. Research in database engineering at the University of Namur. ACM SIGMOD Record, 32(4):124-128, 2003.
[13] J. H. Jahnke and J. Wadsack. The Varlet Analyst: Employing Imperfect Knowledge in Database Reverse Engineering Tools. Proc. 3rd International Workshop on Intelligent Software Engineering, ICSE, 2000.
[14] J. H. Jahnke and J. Wadsack. Research in Database Engineering at the University of Namur. ACM SIGMOD Record, 32(4):124-128, Dec. 2003.
[15] N. Rajala, D. Campara, and N. Mansurov. Insight - Reverse Engineer CASE Tool. Proc. of the 1999 International Conference on Software Engineering, pp. 630-633, 1999.
[16] K. H. Davis. August-II: A Tool for Step-by-Step Reverse Engineering. Proc. 2nd Working Conference on Reverse Engineering, pp. 146-154, Jul. 1995.
[17] J. V. Harrison and W. M. Lim. Automated Reverse Engineering of Legacy 4GL Information System Applications using the ITOC. Lecture Notes in Computer Science, Vol. 1413, Proc. of the 10th International Conference on Advanced Information Systems Engineering, pp. 41-57, 1998.
[18] http://ca.com/smb/product.aspx
[19] www.casestudio.com/enu/default.aspx
[20] www.modelcreator.com/index.php

ISSN: 1790-5117, ISBN: 978-960-6766-42-8