Durham E-Theses

Acquiring data designs from existing data-intensive programs

Yang, Hongji

How to cite: Yang, Hongji (1994) Acquiring data designs from existing data-intensive programs, Durham theses, Durham University. Available at Durham E-Theses Online: http://etheses.dur.ac.uk/5161/

Use policy

The full-text may be used and/or reproduced, and given to third parties in any format or medium, without prior permission or charge, for personal research or study, educational, or not-for-profit purposes provided that:

• a full bibliographic reference is made to the original source

• a link is made to the metadata record in Durham E-Theses

• the full-text is not changed in any way

The full-text must not be sold in any format or medium without the formal permission of the copyright holders.

Please consult the full Durham E-Theses policy for further details.

Academic Support Office, Durham University, University Office, Old Elvet, Durham DH1 3HP
e-mail: [email protected]
Tel: +44 0191 334 6107
http://etheses.dur.ac.uk

ACQUIRING DATA DESIGNS FROM EXISTING DATA-INTENSIVE PROGRAMS

Ph.D. Thesis

University of Durham Department of Computer Science

Hongji Yang

1994

Abstract

The problem area addressed in this thesis is the extraction of a data design from existing data intensive program code. The purpose of this is to help a software maintainer to understand a software system more easily, because a view of the system at a high abstraction level can be obtained. Acquiring a data design from existing data intensive program code is an important part of reverse engineering in software maintenance. A large proportion of software systems currently needing maintenance is data intensive. The research results in this thesis can be directly used in a reverse engineering tool.

A method has been developed for acquiring data designs from existing data intensive programs, COBOL programs in particular. Program transformation is used as the main tool. Abstraction techniques and the method of crossing levels of abstraction are also studied for acquiring data designs.

A prototype system has been implemented based on the method developed. This involved implementing a number of program transformations for data abstraction, and thus contributing to the production of a tool. Several case studies, including one using a real program with 7000 lines of source code, are presented. The experimental results show that the Entity-Relationship Attribute Diagrams derived by the prototype can represent the data designs of the original data intensive programs.

The original contribution of the thesis is an approach that can identify and extract data relationships from the existing code by combining analysis of data with analysis of code. The approach is believed to provide better capabilities than other work in the field. The method has indicated that acquiring a data design from existing data intensive program code by program transformation with human assistance is an effective method in software maintenance. Future work, including extending the method to build an industrial-strength tool, is suggested at the end of the thesis.
Acknowledgements

I wish to acknowledge the Department of Trade and Industry (of the United Kingdom), the Science and Engineering Research Council (of the United Kingdom) and IBM (UK) for the financial support of the research.

I wish to thank Prof. K. H. Bennett sincerely for agreeing to be my supervisor, for his invaluable advice and encouragement and for his skilful supervision and administration throughout this research project.

I would like to thank Dr. Martin Ward, Dr. Tim Bull, Mr. Nigel Scriven, Mr. Brendan Hodgson, Dr. Roger Hutty, Dr. Brian Bramer and Mr. Tom Ronton for their useful advice on research techniques, support in the working environment, encouragement in completing the thesis and help in improving my English.

I would also like to thank my wife Xiaodong Zhang for her unselfish support, and to admit that I am in debt to my son Tianxiu Yang for being unable to spend as much time with him as I should have whilst undertaking this study.

The thesis is prepared with LaTeX [97,99].

Copyright

The copyright of this thesis rests with the author. No quotation from it should be published without his prior written consent and information derived from it should be acknowledged.

Statement

The material offered in this thesis has not been submitted wholly or in part for any academic award or qualification other than that for which it is now submitted. During the period of registered study in which this thesis was prepared, the author has not been registered for any other academic award or qualification.

Contents

1 Introduction 1
1.1 Purpose of the Research and Overview of Problem 1
1.2 Scope of the Thesis and Original Contribution 3
1.3 Criteria for Success 5
1.4 Thesis Structure 6

2 Software Engineering 7
2.1 Software and Software Engineering 7
2.1.1 Computer System Evolution 7
2.1.2 Software Engineering and its Paradigms 9
2.1.3 Advantages and Disadvantages of Three Software Engineering Paradigms 11
2.1.4 A Generic View of Software Engineering 12
2.1.5 Software Quality and Software Quality Assurance 14
2.1.6 Current State of Software Engineering 15
2.2 Software Maintenance 17
2.3 Reverse Engineering 22
2.4 Summary 25

3 Work Related to Reverse Engineering 27
3.1 Introduction 27
3.2 Specification Languages 29
3.2.1 Specifications 29
3.2.2 Algebraic Specification Languages 32


3.2.3 State-Machine Specification Languages 36
3.2.4 Abstract Model Specification Languages 37
3.2.5 A Comparison of the Approaches 41
3.3 Program Transformation Systems 42
3.3.1 Refinement and Transformational Programming 42
3.3.2 Features of Transformation Systems 44
3.3.3 Program Transformation Systems 46
3.3.4 Summary 49
3.4 Program Verification 49
3.4.1 Concept of Proof (Program Proving) 49
3.4.2 Examples of Existing Program Verification Tools 53
3.4.3 Summary 56
3.5 An Overview of the Main Existing Reverse Engineering Approaches Used in Software Maintenance Projects 57
3.5.1 MACS 58
3.5.2 Reverse Engineering in REDO 59
3.5.3 Sneed's Work 60
3.5.4 A CASE Tool for Reverse Engineering 62
3.5.5 TMM 63
3.5.6 A Concept Recognition-Based Program Transformation System 64
3.5.7 REFORM 65
3.6 Summary 65

4 Proposed Research Problem 68
4.1 Features of Data-Intensive Programs 69
4.1.1 Software Design Process 69
4.1.2 Structured Systems Analysis and Design Method 71
4.1.3 Features of Data-Intensive Programs 76
4.2 Representing Data Designs Using Entity-Relationship Attribute Diagrams 78
4.2.1 Entity Models 78

4.2.2 Entity-Relationship Attribute Diagrams in SSADM 79
4.3 Reverse Engineering Through Data Abstraction 81
4.3.1 Abstraction Techniques in Programming 81
4.3.2 Data Abstraction 83
4.3.3 Data Type 84
4.3.4 Abstract Data Types 88
4.3.5 Abstraction Levels of Data and Software 88
4.4 Definition of Proposed Research Problem 90

5 Working Environment and Design Recovery Method 92
5.1 Working Environment 92
5.1.1 Ward's Work and its Application in the REFORM Project 93
5.1.2 The Wide Spectrum Language 94
5.1.3 Advantages of Using Program Transformation and WSL 102
5.1.4 The Original Design of the Maintainer's Assistant 103
5.1.5 The State of the Maintainer's Assistant by 1991 104
5.2 Review of the Maintainer's Assistant 109
5.3 Recovering Data Designs 111
5.3.1 Combining Code Analysis with Data Abstraction 111
5.3.2 Using Program Transformations and WSL 111
5.3.3 Analysis of the Problems with Data-Intensive Programs 112
5.3.4 A Design Recovery Method 113
5.3.5 Enhancement Design of the Maintainer's Assistant for Acquiring Data Designs from Data-intensive Programs 114

6 Extending and Using WSL 117
6.1 Introduction to the WSL Extension 117
6.2 Extension of WSL 119
6.2.1 Representing Records and Files 120
6.2.2 Representing Basic Data Types and User-Defined Abstract Data Types 130
6.2.3 Representing Entity-Relationship Attribute Diagrams 133

6.3 Extension of Meta-WSL 136
6.4 Embedding WSL in COMMON LISP 138
6.5 Translating Data Intensive Programs to WSL 140
6.5.1 Consideration and Decision 140
6.5.2 An Example of Translating a COBOL Program into WSL 141
6.6 Program Transformation Writing 146

7 Program Transformations and Data Abstraction 149
7.1 Introduction 149
7.2 Influence of Forward Engineering on Reverse Engineering 150
7.2.1 Crossing Levels of Data Abstraction 150
7.2.2 Role of Human Knowledge 153
7.3 Acquiring Data Designs from Program Code Using Program Transformation 156
7.3.1 Different Types of Transformation and Abstraction Levels 156
7.3.2 "Back Tracking" 158
7.3.3 Formal Definition of Three Types of Transformation 159
7.4 Issues on Inventing and Proving Program Transformations 162
7.5 Program Transformations of Data Abstraction 164
7.5.1 Transformations for Deriving Records 164
7.5.2 From Records to Data at the Conceptual Level 166
7.5.3 From Code Level Data Operations to Data Relations 168
7.5.4 Abstraction from Code 173
7.5.5 From User-Defined Data Types to Data Design 177
7.5.6 Deriving Data Designs from Data and Code 178
7.5.7 Transformations for Manipulating Program Items 183

8 Design and Implementation 184
8.1 Design of the Prototype 184
8.1.1 Transformations for Data Design Recovery 184
8.1.2 Program Structure Database 185
8.1.3 General Simplifier 186

8.1.4 Metric Facility 188
8.2 Extension of WSL 189
8.3 Transformations 190
8.4 Program Structure Database 195
8.4.1 Use of Recursion Programming Techniques 195
8.4.2 Dealing with All Kinds of WSL Construct 196
8.4.3 Deriving Database Query Functions from Their Specifications 196
8.4.4 An Example of Implementing a Database Query Function 197
8.5 General Simplifier (Symbolic Executor) 198
8.6 The Metric Facility 204
8.6.1 Collecting Program Property Information 204
8.6.2 Processing Metrics 206
8.6.3 Plotting Metric Graphs 206
8.7 The Interface 207
8.8 Integration of the Prototype 209

9 The Use of the Prototype System and Results 211
9.1 Introduction 211
9.2 Case Study 1 — A File Copying Program 212
9.2.1 Modelling File Operations by Queue Operations 212
9.2.2 Transforming Records into Entities 214
9.2.3 Turning Assignments into Relate Statements 215
9.2.4 Ignoring Useless Relate Statements 217
9.2.5 Abstracting Branching Structures and Loop Structures 218
9.2.6 The Resultant "Program" in WSL and its Entity-Relationship Attribute Diagram 219
9.3 Case Study 2 — A Program Using Abas 222
9.4 Case Study 3 — A Vetting and Pricing Program Used in a Telephone Company 224
9.4.1 Introduction to the Program 225
9.4.2 Translating the Program into WSL and Initial "Tidy-up" 225

9.4.3 Obtaining relate Statements 226
9.4.4 Aliased Records 226
9.4.5 Obtaining Entities 226
9.4.6 Obtaining Relationships 226
9.4.7 Final Tidy-up and Result 229
9.5 Case Study 4 — A Customer Account Ledger Program Used in a Telephone Company 229
9.5.1 About the Program 229
9.5.2 Experiments on the Program 230
9.5.3 Comments 231
9.6 Case Study 5 — A Public Library Administration System 233
9.6.1 Background of the Program 233
9.6.2 Preparing the Program for the Prototype 234
9.6.3 Dealing with CICS Calls 235
9.6.4 Resultant Entity-Relationship Attribute Diagrams 236
9.6.5 Understanding the Program Through a Data Design 236
9.7 An Example of Using the Metric Facility 238

10 Conclusions 243
10.1 Summary of Thesis 243
10.2 Evaluation 245
10.3 Assessment 248
10.4 Future Directions 251

A Syntax of WSL Extension 252
A.1 Introduction 252
A.2 Programs 252
A.3 Commands 252
A.4 Names 253
A.5 Expressions 253
A.6 Specification Statements 254
A.7 Declarations 255

A.8 Parameters 256
A.9 Lexicon 257

B Semantics of WSL Extension 259
B.1 Introduction 259
B.2 Commands 259
B.3 Expressions 260
B.4 Specification Statements 260
B.5 Declarations 261

C Syntax and Semantics of Meta-WSL Extension 262
C.1 Notations 262
C.2 Specification of Program Structure Database Queries 264
C.2.1 Query: [Variables] 264
C.2.2 Query: [Assigned] 265
C.2.3 Query: [Used] 265
C.2.4 Query: [Used-Only] 266
C.2.5 Query: [Assd-Only] 266
C.2.6 Query: [Assn-to-self] 266
C.2.7 Query: [Depth] 267
C.2.8 Query: [Primitive?] 267
C.2.9 Query: [Terminal-value] 268
C.2.10 Query: [Regular?] 268
C.2.11 Query: [Regular-system?] 270
C.2.12 Query: [Terminal?] 270
C.2.13 Query: [Proper?] 273
C.2.14 Query: [Reducible?] 274
C.2.15 Query: [Improper?] 275
C.2.16 Query: [Dummy?] 275
C.2.17 Query: [Calls] 275
C.2.18 Query: [Statements] 276
C.2.19 Query: [Calls-terminal?] 276

C.2.20 Query: [Calls-term-sys?] 277
C.2.21 Query: [Size] 277
C.3 Specification of General Simplifier 277
C.4 Specification of Metric Facility 278
C.4.1 Preliminary Definitions for Metrics Functions 278
C.4.2 Metric: [MCCABE] 280
C.4.3 Metric: [Structural] 280
C.4.4 Metric: [LOC] 280
C.4.5 Metric: [LOC2] 280
C.4.6 Metric: [CFDF] 281
C.4.7 Metric: [BL] 281

D Program Transformations for Data Abstraction 282
D.1 Deriving Records 282
D.2 From Records to Data in Design Level 282
D.3 From Code Level Data Operation to Data Relations 283
D.4 Abstraction from Code 284
D.5 From User-Defined Data Type to Data Design 284
D.6 Deriving Data Design from Data and Code 284
D.7 Manipulating Program Items 285

References 286

List of Figures

1.1 Evolution of Data Design and Code 2

3.1 Stages of Program Development 28
3.2 Languages for Software Development 29
3.3 A Stack Specification in an Algebraic Specification Language (OBJ) 34
3.4 A Stack Specification in a State-Machine Specification Language 37
3.5 An Array Specification in an Algebraic Specification Language 39
3.6 A Stack Specification in an Abstract Model Specification Language (based on pre- and post-conditions) 39
3.7 Inverse Transformation of Software from Code to Specification 61
3.8 Re-Engineering Cycle 63

4.1 SSADM Stages 74
4.2 An Entity-Relationship Attribute Diagram 79
4.3 Relationships in Entity-Relationship Attribute Diagrams 82

5.1 From Source Code to a Program in Low-Level WSL 104
5.2 From a Program in Low-Level WSL to a Program in High-Level WSL 105
5.3 From a Program in High-Level WSL to Z Specification 105
5.4 The X-Windows Front End 108
5.5 A Data Design Recovery Method 114

6.1 WSL Language Levels and Their Usage 118
6.2 Storage Model 127
6.3 Entities and Relationships in WSL 137
6.4 Diagrammatic Form of a WSL Syntax Tree 139


6.5 COBOL Sequential Files 143

7.1 Three Ways of Crossing Levels of Abstraction 152
7.2 Three Types of Program Transformation 157
7.3 "Back Tracking" 159
7.4 Approach of Deriving Entity-Relationship Attribute Diagrams 165
7.5 Deriving a Relationship from a Record with Subrecord 168
7.6 Modelling a Sequential File by Two Queues 170
7.7 Deriving a Relationship from an Abstract Data Type 180
7.8 Deriving a Relationship from a Foreign Key 181

8.1 Design of the Prototype 185
8.2 Program Structure Database Queries 187
8.3 WSL Syntax Table 191
8.4 Organisation of the Boyer-Moore Theorem Prover 199
8.5 The Metrics Menu 208

9.1 The Result from the File Copying Program on the X-Windows Front End 220
9.2 The Entity-Relationship Attribute Diagram for the "File-Copy" Program 221
9.3 Final Report on Customer Account Records 232
9.4 Books in a Library 237
9.5 The Measure Result of Lines of Code 240
9.6 The Measure Result of Control-flow and Data-flow Complexity 241
9.7 Measure Results of Using the Metric Facility 242

Chapter 1

Introduction

1.1 Purpose of the Research and Overview of Problem

The research described in this thesis was motivated by the increasing industrial demand to carry out software maintenance more efficiently, because software maintenance has become the most costly stage of the software lifecycle [21,22]. Study shows that reverse engineering is the first step in understanding the software to be maintained. Reverse engineering consists of two main activities, redocumentation and design recovery, and design recovery is the more challenging subset of reverse engineering [25,52]. Data design recovery of software is an important part of general design recovery. The purpose of the research is to establish the feasibility of using data abstraction techniques within a transformational programming approach by acquiring data designs from existing legacy programs.

In the typical waterfall software life cycle, the stage of design exists after the stage of requirements analysis and before the stage of implementation. Requirements analysis establishes what the system should do and under what circumstances it is to be done, whereas design establishes how it is to be done. Data design establishes what data should be held by the software system and how that data should be organised and used.

Data design recovery is the process in reverse. It should produce the information required for a software maintainer to understand what software does, how it does it and why it does it, and so forth, via the data which the software uses. Usually, data design recovery recreates data abstractions from a combination of code, existing design documentation (if available), personal experience, and general knowledge about problem and application domains.

[Figure 1.1: Evolution of Data Design and Code]
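As a toy illustration of the raw material such recovery starts from (this is not the transformation-based method developed in the thesis, and the record layout below is invented), one can pick field names out of a COBOL-style record declaration; the 01-level entry suggests a candidate entity and the subordinate entries its attributes:

```python
import re

# A hypothetical COBOL data-division fragment (invented for illustration).
RECORD_SOURCE = """
01 CUSTOMER-REC.
   05 CUST-ID      PIC 9(6).
   05 CUST-NAME    PIC X(30).
   05 CUST-BALANCE PIC 9(7)V99.
"""

def recover_fields(source):
    """Extract (level, name, picture) triples from a record declaration."""
    pattern = re.compile(r"^\s*(\d+)\s+([A-Z0-9-]+)(?:\s+PIC\s+(\S+))?\.",
                         re.MULTILINE)
    return [(int(lvl), name, pic) for lvl, name, pic in pattern.findall(source)]

fields = recover_fields(RECORD_SOURCE)
# The 01-level entry names a candidate entity; lower-level entries are attributes.
entity = next(name for lvl, name, _ in fields if lvl == 1)
attributes = [name for lvl, name, _ in fields if lvl > 1]
print(entity, attributes)
```

Real recovery, as the thesis argues, must also analyse the code that manipulates these fields, since the declaration alone does not reveal data relationships.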

In practice, most software is heavily modified. The actual data design existing now will typically be very different from the original data design. Therefore, it may be thought very difficult and meaningless to extract the original data design. But data usually models the real world, which we understand. There may be an opportunity to extract the current data design from the current version of the code, as the current data design should represent the reality. For example, a banking system might have been in use since (say) 1960. As the real banking world changes, the data/code of the banking system has to be modified. By 1992, it is probably not necessary to recover the data design of the system for 1960. Nevertheless, it is possible to recover the data design for 1992 from the code and the general knowledge of the banking world of 1992 (Figure 1.1). This suggests that major understanding of data and/or code will be needed along the way.

Data plays a significant role in software. A good understanding of data used in a software system will assist a software maintainer in understanding the software system, so as to improve the maintenance activity.

The terms used in this chapter, such as software maintenance, reverse engineering, etc., will be defined in the following chapters.

1.2 Scope of the Thesis and Original Contribution

The primary goal of reverse engineering a software system is to increase the overall comprehensibility of the system in the software maintenance process. Five key objectives encompassed by the goal of software understanding are identified in [52]: to cope with complexity, to generate alternate views, to recover lost information, to detect side effects and to synthesise higher abstractions. Corresponding to these five general objectives, five concrete objectives were identified for the research in this thesis:

1. Building a Tool — A key to tackling the complexity and volume of software is automated support. In reverse engineering, an effective method is only commercially viable when it is backed up by tools, so a prototype system should be built as a demonstrator of this research.

2. Generating Entity-Relationship Attribute Diagrams — Graphical representations are good aids to understanding. Entity-Relationship Attribute Diagrams have long been used as representations for data design. To derive data designs presented as Entity-Relationship Attribute Diagrams is another objective of this research.

3. Recovering Lost Information — Information useful to software maintainers may be lost in the process of software evolution, particularly for software that has been heavily maintained. For example, modifications are often not recorded in the documentation, especially at a level higher than the code itself. Data design recovery can assist us to salvage potentially useful information from the existing code.

4. Detecting Possible Faults — Mistakes could have been made when software was first built or when a modification was carried out in the software's history. Data design recovery can be directly helpful in detecting faults in the software, because the obtained data design can show the true structure of the code, and it is easier to spot faults in a data design than in code.

5. Viewing Software at a Higher Abstraction Level — Software can be viewed at different abstraction levels. In particular, data, which is the main component of software, can be viewed at the code level and at the conceptual level (to be discussed in detail in later chapters). Obtaining data designs which belong to a conceptual data level makes it possible to view software at a higher abstraction level.
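To make the target representation of objective 2 concrete, here is a minimal sketch (entity and relationship names invented; the thesis itself derives such diagrams by program transformation in WSL, not in this way) of an Entity-Relationship Attribute model held as plain data, with a textual rendering standing in for a drawn diagram:

```python
# A hypothetical conceptual-level model (invented example data).
entities = {
    "CUSTOMER": ["CUST-ID", "CUST-NAME"],
    "ACCOUNT":  ["ACC-NO", "BALANCE"],
}
# (subject, relationship-name, object) triples.
relationships = [("CUSTOMER", "holds", "ACCOUNT")]

def render(entities, relationships):
    """Produce a simple textual view of the ERA model."""
    lines = []
    for name, attrs in entities.items():
        lines.append(f"entity {name} ({', '.join(attrs)})")
    for subj, rel, obj in relationships:
        lines.append(f"{subj} --{rel}--> {obj}")
    return "\n".join(lines)

print(render(entities, relationships))
```

The point of the conceptual level is that such a model says nothing about files, records or code; it states only which entities exist, their attributes, and how they relate.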

The scope of the thesis includes:

• Development of a data design recovery method: a method is established for discovering possible data designs from existing data-intensive programs for the purpose of reverse engineering.

• Implementation of a prototype system: a system is constructed for demonstrating the success of the above method.

• Experimentation with example programs: a number of programs are used for experiments with the prototype system, and common problems in the programs, such as foreign keys, are examined.

The original contribution of the thesis is to combine code analysis, transformational programming, data abstraction techniques and techniques for crossing levels of abstraction for acquiring data designs from existing programs. A method is proposed in the thesis, and the result of applying the method shows that data relationships which exist only in the code can be identified and extracted. A prototype has been built to demonstrate the research results.

The literature survey in Chapters 2 and 3 shows that there are no existing reverse engineering tools able to acquire an Entity-Relationship Attribute Diagram directly from program code in this way.

Proving the correctness of transformations is not a focus of this thesis because it has been addressed in other research, such as [155]. It is also not intended to build an industrial-strength tool, but it is pointed out that future research can be fruitful if carried out along the direction proposed in this thesis.

1.3 Criteria for Success

The success of the research described in this thesis is determined by the following criteria:

• If we start with old, heavily modified code which has never been developed using a formal or informal method, how viable is it to extract a program data design or specification from the code?

• If it is possible under certain restrictions, what are these restrictions? What exactly does the user need to supply?

• What is the method for extracting a data design from the existing data intensive code?

• Obviously the process involves crossing levels of abstraction. How do we cope with this in order to obtain a data design from existing data intensive code?

• Can we build a tool to demonstrate the approach developed in this thesis?

• What is the metric to measure the resultant code (or data design) which has been reverse engineered by this method?
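The last criterion is addressed by the metric facility described later (Chapter 8 and Appendix C.4, which include a [MCCABE] measure and lines-of-code measures). As a rough, hypothetical illustration of that kind of measure (not the thesis's implementation, which works over WSL syntax trees rather than flat text), one can count non-blank lines and decision points in source text:

```python
# Hypothetical keywords marking decision points (illustrative only).
DECISION_KEYWORDS = ("if", "while", "for", "case")

def simple_metrics(source):
    """Return (lines of code, McCabe-style cyclomatic estimate)."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    decisions = sum(ln.split()[0] in DECISION_KEYWORDS for ln in lines)
    return len(lines), decisions + 1   # V(G) = decisions + 1, one component

# An invented toy program in a pseudo-language.
src = """
read x
if x > 0
    print x
while x > 0
    x := x - 1
"""
print(simple_metrics(src))
```

Tracking how such numbers fall as abstraction proceeds gives one crude way to judge the result of reverse engineering.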

1.4 Thesis Structure

The structure of the thesis is as follows:

Chapter 2 provides an overview of the software engineering process and shows how reverse engineering fits into the process.

Chapter 3 gives an overview of related and current research in the area of reverse engineering.

The remaining chapters describe the research undertaken within the project.

Chapter 4 introduces data abstraction techniques, data abstraction levels and data design representations used in forward engineering, and discusses the way in which they are used in reverse engineering. It also describes the problems of acquiring data designs from existing data-intensive programs.

Chapter 5 introduces the environment in which the prototype components of the Maintainer's Assistant for acquiring data designs from existing code are to be developed, and proposes a design recovery method.

Chapter 6 gives the reasons why and how the existing WSL has to be extended, and how the approach of transformational programming is used in the Maintainer's Assistant.

Chapter 7 discusses the use of crossing levels of abstraction to acquire data designs.

Chapter 8 describes the prototype components of the Maintainer's Assistant which apply the results of the research in this thesis, and the implementation of those components.

Chapter 9 describes the details of experimenting with real program examples with the prototype, and shows the results obtained from its use.

Chapter 10 summarises the thesis, assessing the research carried out against the proposed criteria and suggesting areas for future research.

Statement: Although this research has been carried out in a collaborative project, REFORM, all work in this thesis is by the author (an original member of the REFORM project), except where explicitly stated.

Chapter 2

Software Engineering

2.1 Software and Software Engineering

2.1.1 Computer System Evolution

Although the technological revolution of computing is just a few decades old, a number of significant subrevolutions have taken place. During this time, software development has been closely coupled to computer system evolution.

Computer system evolution can be divided into four eras [132]. The first era was from the late 1940s to the mid-1960s. During this period, hardware underwent continual change and software development was largely unmanaged. Batch orientation was used for most systems; software was custom designed for each application, often implemented by a single person and seen as a "craft", and had a relatively limited distribution.

The second era of computer system evolution spanned the decade from the mid-1960s to the late 1970s. Multiprogramming and multi-user systems introduced new concepts of human-machine interaction. Real-time systems could collect, analyse, and transform data from multiple sources, thereby controlling processes and producing output in milliseconds rather than minutes. Advances in on-line storage devices led to the first generation of database management systems. At the same time, people started to use product software, and software was developed for widespread distribution in a multidisciplinary market. Software purchased from outside the organisation was extended by the addition of new program statements to meet new needs. All of these software products had to be corrected when faults were detected, modified as user requirements changed, or adapted to new hardware. These activities were collectively called software maintenance. Efforts spent on software maintenance began to absorb resources at an alarming rate, and the personalised nature of programs made them very difficult to maintain. A software crisis had begun.

The third era of computer system evolution began in the mid-1970s. Distributed systems greatly increased the size and complexity of computer-based systems. Global and local area networks, high-bandwidth digital communications and increasing demands for data access put heavy demands on software developers. Microprocessors and personal computers were widely used. The personal computer has been the catalyst for the growth of many software companies. While the software companies of the second era sold hundreds or thousands of copies of their programs, the software companies of the third era sell tens and even hundreds of thousands of copies. Personal computer hardware is rapidly becoming a commodity, with software providing the differentiating characteristics. Many people in industry and at home spend more money on software than they did to purchase the computer on which the software runs.

The fourth era in software is just beginning. Fourth generation techniques (4GT) for software development are changing the manner in which some segments of the software community build computer programs. Object-oriented technologies are rapidly displacing more conventional software development approaches in many application areas. Expert systems and artificial intelligence software have finally moved from the laboratory into practical application for wide-ranging problems in the real world. Artificial neural network software has opened exciting possibilities for pattern recognition and human-like information processing abilities. As we move into the fourth era, the software crisis continues to intensify.

The software crisis alludes to a set of problems encountered in the development of computer software. The problems are not limited to software that "does not function properly" according to required criteria. Rather, the software crisis encompasses problems associated with how we develop software, how we maintain a growing volume of existing software, and how we can expect to keep pace with a growing demand for more software.

Problems associated with the software crisis have been caused by the character of software itself. Software is a logical rather than physical system element; therefore, success is measured by the standard of a single entity rather than many manufactured entities. Software does not wear out. If faults are encountered, there is a high probability that each was inadvertently introduced during development and went undetected during testing. We replace "defective parts" during software maintenance, but we have few, if any, spare parts; i.e., maintenance often includes correction or modification to design.

Recognising problems and their causes is the first step towards finding solutions. The solutions themselves must then provide practical assistance to the software developer, improve software quality, and allow the "software world" to keep pace with the hardware world.

There is no single best approach to a solution for the software crisis. However, by combining comprehensive methods for all phases of software development; better tools for automating these methods; more powerful building blocks for software implementations; better techniques for software quality assurance; and an overriding philosophy for coordination, control, and management, we can achieve a discipline for software development — a discipline called software engineering.

2.1.2 Software Engineering and its Paradigms

Use of the term "software engineering" can be traced back at least as far as a 1968 NATO conference held in Garmisch, West Germany, and the follow-up conference held near Rome, Italy, in 1969. The following definition is from Naur [125]:

Software engineering is the establishment and use of sound engineering principles in order to obtain economically software that is reliable and works efficiently on real machines.

This was partly prompted by the problems encountered in developing the operating system OS/360 for the IBM 360 computer.

Software engineering encompasses a set of three key elements — methods, tools, and processes. Software engineering methods provide the techniques for building software. Methods encompass a broad array of tasks that include: design of data structures, program architecture, and algorithmic procedure; coding; testing; and maintenance. Software engineering tools provide automated or semi-automated support for methods. Tools exist to support each of the methods noted above. Software engineering processes are the glue that holds the methods and tools together and enables rational and timely development of computer software. Processes define the sequence in which methods will be applied, the deliverables (documents, reports, forms, etc.) that are required, the controls that help assure quality and coordinate change, and the milestones that enable software managers to assess progress.

The above three components of software engineering are often referred to

as software engineering paradigms. A paradigm for software engineering is

chosen based on the nature of the project and application, the methods and tools

to be used, and the controls and deliverables that are required. Three paradigms

have been widely discussed and debated [132]. They are "the classic life cycle",

"prototyping" and "fourth generation techniques".

The classic life cycle paradigm is sometimes called the "waterfall model",

because there is no iteration in the process from the beginning to the end of a

project. It demands a systematic sequential approach to software development.

The life cycle paradigm encompasses the following activities:

• software requirements analysis,

• design,

• coding,

• testing, and

• maintenance.

Prototyping is a process that enables the developer to create a model of the software to be built, and this process allows problems and requirements to be seen quickly [46]. Prototyping begins with requirements gathering. Developer and customer meet and define the overall objectives for the software, identify whatever requirements are known, and outline areas where further definition is mandatory. A "quick design" then occurs. The quick design focuses on a representation of those aspects of the software visible to the user. The quick design leads to the construction of a prototype. The prototype is evaluated by the customer/user and is used to refine requirements for the software to be developed. A process of iteration occurs as the prototype is "tuned" to satisfy the need of the customer, while at the same time enabling the developer to understand better what needs to be done.

Fourth generation techniques encompasses a broad array of software tools

that have one thing in common: each enables the software developer to specify

some characteristic of software at a high level [71]. The tool then automatically

generates source code based on the developer's specification. The 4GT paradigm

for software engineering focuses on the ability to specify software to a machine at

a level that is close to natural language or in a notation that imparts significant

function, but it tends to be used in a single, well-defined application domain. Also, the 4GT reuses existing packages, databases, etc., rather than reinventing them.

2.1.3 Advantages and Disadvantages of Three Software Engineering Paradigms

The classic life cycle is the oldest and the most widely used paradigm for software

engineering, and has a definite and important place in software engineering work.

It provides a template into which methods for analysis, design, coding, testing,

and maintenance can be placed. It has weaknesses as well: real projects rarely follow the sequential flow that the model proposes, i.e., iteration always occurs and creates problems in the application of the paradigm; it is often difficult in the beginning for the customer to state all requirements explicitly, while the classic life cycle requires this and has difficulty accommodating the natural uncertainty that exists at the beginning of many projects; and the customer must be patient — a working version of the program(s) will not be available until late in the project time span [132].

Prototyping is an effective paradigm for software engineering. The key is to

define the rules of the game at the beginning; that is, the customer and developer

must both agree that the prototype is built to serve as a mechanism for defining

requirements. It is to be discarded (at least in part), and the actual software

engineered with an eye towards quality and maintainability. The problems with

this paradigm are: the customer sees what appears to be a working version of the software, unaware that in the rush to get it working overall software quality or long-term maintainability have not been considered; the developer often makes implementation compromises (e.g., using an inappropriate operating system or programming language, or inefficient algorithms) in order to get a prototype working quickly; etc.

Though it has been claimed that the fourth generation techniques are likely to become an increasingly important part of software development during the next decade because of the dramatic reductions in software development time and greatly improved productivity for people who build software, current 4GT tools are not much easier to use than programming languages, the source code produced by such tools is "inefficient", and the maintainability of large software systems developed using 4GT is open to question. Existing problems are: implementation using a 4GT enables the software developer to describe desired results, which are translated automatically into source code to produce those results, but a data structure with relevant information must exist and be readily accessible by the 4GT; to transform a 4GT implementation into a product, the developer must conduct thorough testing, develop meaningful documentation, and perform all other "transition activities" required in other software engineering paradigms; and 4GT-developed software must be built in a manner that enables maintenance to be performed expeditiously.

2.1.4 A Generic View of Software Engineering

A generic view of software engineering can be obtained by examining the process of software development [132]. The software development process contains three generic phases regardless of the software engineering paradigm chosen. The three phases, definition, development, and maintenance, are encountered in all software development, regardless of application area, project size, or complexity.

The definition phase focuses on what. That is, during definition, the software

developer attempts to identify what information is to be processed, what function

and performance are desired, what interfaces are to be established, what design

constraints exist, and what validation criteria are required to define a successful

system. Thus, the key requirements of the system and the software are identified.

Three specific subprocesses occur in this phase:

System analysis defines the role of each element in a computer-based system,

ultimately allocating the role software will play.

Software project planning allocates resources, estimates costs, defines work

tasks and schedules, and sets quality plans (and identifies risks).

Requirements analysis defines a more detailed information domain and software function before work can begin.

The development phase focuses on how. That is, during development, the software developer attempts to describe how data structure and software architecture are to be designed, how procedural details are to be implemented, how the design will be translated into a programming language, and how testing will be performed. Three specific steps also occur in this phase:

Software design. Design translates the requirements for the software into a set of representations (some graphical, others tabular or language-based) that describe data structure, architecture, and algorithmic procedure.

Coding. Design representations must be translated into an artificial language

that results in instructions executable by the computer. The coding step performs

this translation.

Software testing. Once the software is implemented in machine-executable form, it must be tested to uncover defects in function, in logic, and in implementation.

The maintenance phase focuses on change that is associated with error correction, adaptations required as the software's environment evolves, and modifications due to enhancements brought about by changing customer requirements. The maintenance phase reapplies the steps of the definition and development phases, but does so in the context of existing software. Three types of change are encountered during the maintenance phase:

Correction. Even with the best quality assurance activities, it is likely that

the customer will discover defects in the software.

Adaptation. Over time the original environment (e.g., CPU, operating system, peripherals) for which the software was developed is likely to change.

Enhancement. As software is used, the customer/user will recognise additional functions that would provide benefit.

2.1.5 Software Quality and Software Quality Assurance

Software engineering works toward a single goal: to produce high-quality software. It is therefore useful to clarify the terms "quality" and "software quality assurance" (SQA).

Software quality is defined as: conformance to explicitly stated functional and performance requirements, explicitly documented development standards, and implicit characteristics that are expected of all professionally developed software [132].

Software quality factors include [117]: correctness (the extent to which a program satisfies its specification and fulfills the customer's mission objectives), reliability (the extent to which a program can be expected to perform its intended function with the required precision), efficiency (the amount of computing resources and code required by a program to perform its function), integrity (the extent to which access to software or data by unauthorised persons can be controlled), usability (the effort required to learn, operate, prepare input for, and interpret the output of a program), maintainability (the effort required to locate and fix an error or make another change in a program) (the maintainability of software will be addressed later), flexibility (the effort required to modify an operational program), testability (the effort required to test a program to ensure that it performs its intended function), portability (the effort required to transfer the program from one hardware and/or software system environment to another), reusability (the extent to which a program (or a part of a program) can be reused in other applications), and interoperability (the effort required to couple one system to another).

Software quality assurance is an activity that is applied at each step in the

software engineering process. Software quality assurance encompasses procedures

for the effective application of methods and tools, formal technical reviews, testing

strategies and techniques, procedures for change control, procedures for assuring

compliance to standards, and measurement and reporting mechanisms.

2.1.6 Current State of Software Engineering

An important consideration in the development of a software system is the entire development environment. In its most general sense, the development environment includes the technical methods, the management procedures, the computing equipment, the mode of computer use (batch or interactive, centralised or distributed), the automated tools to support development, the software development staff, and the physical work space. An ideal development environment should enhance the productivity of the information system developers and provide a set of tools (both manual and automated) that simplifies the process of software production. The environment should contain facilities both for the individual member of a development group and for the overall management of the project [157].

Now, software engineering has become a well-defined, constantly evolving discipline. Software production is much different now from that prior to 1968, when the concept of software engineering was first introduced. The state of the art of software production then can be seen from the problems being discussed when the two NATO conferences on software engineering of 1968 and 1969 were held. For example, questions were [133,134]:

• problems of scale,

• in what orders to do things,

• strategies and techniques to use,

• how to specify software systems,

• project planning and control,

• proliferation of unreliable software, etc.

Though some of these are still problems today, areas where progress has especially been made are:

• Modelling: requirements, systems.

• Formalisation: specification, verification.

• Computer science: languages, software concepts such as modularity and

abstract data types.

• Method/design paradigm: structured programming, object oriented design,

etc.

• Support: database, tool, software development environments.

• Human factors: user participation, human-computer interface.

• Metrics: quality, reliability, costing.

Nevertheless, despite such advances, there are still many problems unsolved in the following areas:

• Formal methods: further development of specification and verification and

their scaling up to cope with large 'real-life' problems, particularly with tool

support.

• Metrics: improved methods for assessing and predicting cost, software quality, reliability, maintainability, and other quality attributes.

• Reuse: software reusability will potentially represent a major way of effecting desperately needed increases in productivity, if software practice is going to have any chance of coping with the demand for software products.

• Maintenance: improved and new methods to reduce the cost and to increase the maintainability.

• Management: more reliable, more effective techniques for managing the life

cycle in all aspects.

• Coping with existing systems (that were written using old technology).

• Tool support: increased provision of automated software tools for supporting all activities of software engineering, both on an individual basis and as an integrated support environment.

• Applied technologies: application of other techniques, e.g., AI, to the general enhancement of software engineering.

It can be seen from the above analysis that software maintenance is a very

important part in software engineering. Further issues of software maintenance

will be reviewed in the next section.

2.2 Software Maintenance

In the early days of computing (1950s and early 1960s), software maintenance comprised a very small part of the software lifecycle. In the late 1960s and the 1970s, as more and more software was produced, people began to realise that old software does not simply die, and at that point software maintenance started to be recognised as a major activity. By the late 1970s, industry was suffering major problems with the applications backlog, and software maintenance was now taking more effort than initial development in some sectors. In the 1980s, it was becoming evident that old architectures were severely constraining new design [21]. All of these placed demands for changes to be made to the software. Changes include, for instance, fixing errors, adding enhancements and making optimisations. Besides the problems whose solutions required the changes in the first place, the implementation of the changes themselves creates additional problems.

One of Lehman's five laws of the evolution of a software system directly addresses the modification of software. It states that "a program that is used in a real world environment must change or become less and less useful in that environment" [105]. So mechanisms must be developed for evaluating, controlling, and making changes.

Software Maintenance is defined as the modification of a software product after delivery to correct faults, to improve performance or other attributes, or to adapt the product to a changed environment [2].

Software maintenance is required to meet the needs of the three principal "change" types described in the previous section, so maintenance activities can be divided into corresponding categories [149].

The first category is called corrective maintenance. There may be a fault in the software, so that its behaviour does not conform to its specification. This fault may contradict the specification, or it may demonstrate that the specification is incomplete (or possibly inconsistent), so that the user's assumed specification is not sustained. Corrective maintenance involves removing these faults.

Even if a software system is fault-free, the environment in which it operates will often be subject to change, e.g., the upgrade of computer hardware or mov• ing a system from a mainframe to a PC. Modifications performed as a result of changes to the external environment are categorised as adaptive maintenance, e.g., the manufacturer may introduce new versions of the operating system, or remove support for existing facilities, and the software may be ported to a new environment, or to different hardware.

The third category of maintenance is called perfective maintenance. This is undertaken as a consequence of a change in user requirements of the software.

For example, a payroll suite may need to be altered to reflect new taxation laws; a real-time power station control system may need upgrading to meet new safety standards.

Finally, preventive maintenance may be undertaken on a system in or• der to anticipate future problems and make subsequent maintenance easier [20].

For example, a particular part of a large suite may have been found to require sustained corrective maintenance over a period of time. It could be sensible to re-implement this part, using modern software engineering technology, in the expectation that subsequent errors will be much reduced.

The large cost associated with software maintenance is the result of the fact that software has proved difficult to maintain. Early systems tended to be unstructured and ad hoc. This makes it hard to understand their underlying logic.

System documentation is often incomplete, or out of date. With current methods it is often difficult to retest or verify a system after a change has been made.

Successful software will inevitably evolve, but the process of evolution will lead to degraded structure and increasing complexity [23,69,105].

Now it is well established that software maintenance is the most costly stage of the software lifecycle for most projects. In the 1970s, 30-40% of the budget was used on software maintenance, and 40-60% in the 1980s. Now the budget for software maintenance is up to 70-80% [29,106,128,132,173].

Software maintenance has its own life cycle and its own features. Over the years, several software task models have been proposed; the model by Bennett [21] is used here. Software maintenance can occur due to changing user needs, to errors which must be fixed, and to a changing environment. Although these types are different at the detailed level, at a high level they can be described by an iterative three-stage process:

1. request control: the information about the request is collected; the change

is analysed using impact analysis to assess cost/benefit; and a priority is

assigned to each request.

2. change control: the next request is taken from the top of the priority list; the problem is reproduced (if there is one); the code (and design and the specifications if available) are analysed; the changes are designed and documented and tests produced; the code modifications are written; and quality assurance is implemented.

3. release control: the new release is determined; the release is built; confidence testing is undertaken; the release is distributed; and acceptance testing by the customer takes place.
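The request-control stage above — collecting requests, assessing them, and assigning priorities — can be sketched as a priority queue of change requests. This is an illustrative model only, not part of Bennett's formulation; the class, field names, and priority scheme are assumptions made for the example:

```python
import heapq
from dataclasses import dataclass, field

# Illustrative sketch of "request control": change requests are prioritised
# and "change control" then takes the next request from the top of the list.
@dataclass(order=True)
class ChangeRequest:
    priority: int                           # lower value = more urgent
    description: str = field(compare=False)
    category: str = field(compare=False)    # corrective / adaptive / perfective / preventive

queue: list[ChangeRequest] = []
heapq.heappush(queue, ChangeRequest(2, "Support new taxation rules", "perfective"))
heapq.heappush(queue, ChangeRequest(1, "Fix payroll rounding fault", "corrective"))
heapq.heappush(queue, ChangeRequest(3, "Port to new OS release", "adaptive"))

# Change control: take the highest-priority request next.
next_request = heapq.heappop(queue)
print(next_request.description)  # prints "Fix payroll rounding fault"
```

Only the ordering field (`priority`) takes part in comparisons, so the heap always yields the most urgent outstanding request first.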

Currently, these three steps are almost always undertaken in terms of source

code. Design information and even adequate documentation often do not exist.

Thus software maintenance is thought of predominantly as a source code activity.

Understanding the functions and behaviour of a system from the code is hence a

vital part of the maintenance programmer's task [136]. Approaches to program

comprehension will be described in later chapters.

Most of the effort for software maintenance research has focused upon the

methods, techniques and tools which support the maintenance process.

When maintenance activities are carried out, an essential characteristic of

all software — maintainability — must be considered. Maintainability is a key

goal that guides the steps of a software maintenance method, as well as software

engineering method. Software maintainability is the ease with which software

can be understood, corrected, adapted, and/or enhanced [132].

The maintainability of software is affected by many factors. It is difficult to quantify software maintainability (no adequate, widely accepted, quantitative definition exists). However, many efforts have been made to tackle this problem from different angles. Here are three of them.

Kopetz [98] defined a number of factors that are related to the development environment: availability of qualified software staff, understandable system structure, ease of system handling, use of standardised programming languages, use of standardised operating systems, standardised structure of documentation, availability of test cases, built-in debugging facilities and availability of a proper computer to conduct maintenance.

Gilb [76] provided maintainability metrics by measuring the effort expended during maintenance in terms of problem recognition time, administrative time, maintenance tools collection time, problem analysis time, change specification time, active correction time, local testing time, global testing time, maintenance review time and total recovery time.
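Gilb's measure can be read as a simple sum of these component times. A minimal sketch of that reading (the breakdown values and units are invented for illustration; this is not Gilb's notation):

```python
# Illustrative sketch: total maintenance effort as the sum of Gilb-style
# component times (hours). All figures below are hypothetical example data.
effort = {
    "problem_recognition": 2.0,
    "administrative": 1.0,
    "tools_collection": 0.5,
    "problem_analysis": 4.0,
    "change_specification": 1.5,
    "active_correction": 3.0,
    "local_testing": 2.0,
    "global_testing": 3.0,
    "maintenance_review": 1.0,
    "total_recovery": 0.5,
}
total_effort = sum(effort.values())
print(f"total maintenance effort: {total_effort} hours")  # 18.5 hours
```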

Sneed [143] measured maintainability in terms of the original development expenditure. The smaller the expenditure on maintaining the system — relative to the expenditure on development — the greater the maintainability. The expenditure includes:

• modularity — an operating measure of the extent to which a system can be

broken down into small independent building blocks;

• flexibility — an operating measure of a software system's independence from

any specific application;

• portability — an operating measure of a software system's independence

from its technical environment;

• complexity — an operating measure of a software system's aggregation and distribution of components/complexes [62].
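Sneed's relative measure can be sketched as a ratio: maintenance expenditure set against the original development expenditure. This is an illustrative reading only, not Sneed's actual formula; the function and its scaling are assumptions for the example:

```python
# Illustrative sketch (not Sneed's formula): maintainability read as
# inverse proportionality between maintenance and development expenditure.
def relative_maintainability(maintenance_cost: float, development_cost: float) -> float:
    """Higher value = more maintainable: smaller maintenance spend
    relative to the original development spend."""
    return development_cost / (development_cost + maintenance_cost)

# A system costing 20 units to maintain against 80 units of development:
m = relative_maintainability(20.0, 80.0)
print(round(m, 2))  # 0.8
```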

Attributes of software can be divided into two types, internal and external.

Internal attributes are a property of the software itself, e.g., complexity, size,

data structure, coupling, cohesion, quality, reliability, etc. External attributes

are a property of the environment, e.g., availability of debugging tools, skill and

training, repository, management, etc.

Possibly the most important factor that affects maintainability is planning for maintainability. If software is viewed as a system element that will inevitably undergo change, the chances that maintainable software will be produced are likely to increase substantially [132]. However, maintainability is also dependent on the process as well as the software itself [98]. A major problem with maintenance is changes which were not even conceived of when the software was first designed, and these cannot be planned for. Nevertheless, because maintainability is an essential characteristic of software, at each stage of the software engineering process maintainability must be considered. For example, during the requirements stage, areas of future enhancement and potential revision should be noted, software portability issues discussed, and system interfaces that might impact software maintenance considered; during the design stage, data design, architecture design, and procedural design should be evaluated for ease of modification, modularity, and functional independence; during the coding stage, style and internal documentation, two factors that have an influence on maintainability, should be stressed; etc.

Also, maintenance activities should be carried out in a careful way, because modification of software is dangerous in the sense that errors and other undesirable behaviours may occur as the result of software modification. The term side effects is used to refer to these errors or undesirable behaviours [132]. Software maintenance side effects include coding side effects, data side effects and documentation side effects. Coding side effects are caused by the change of code. Data side effects occur when changes to data mean that the software design no longer fits the data, and when modifications are made to software information structures. Documentation side effects occur when changes to source code are not reflected in design documentation or user-oriented manuals.

Although progress in software maintenance research has been achieved recently, there are still many problems to be solved. One of the research topics identified by [21] for software maintenance is Reverse Engineering, which is the topic this thesis will address.

2.3 Reverse Engineering

Reverse Engineering is the process of analysing a subject system to identify the system's components and their interrelationships, and to create representations of the system in another form or at a higher level of abstraction [48].

Reverse engineering involves the identification or "recovery" of program requirements and/or design specifications that can aid in understanding and modifying the program. The main objective is to discover the underlying features of a software system including requirements, specification, design and implementation.

In other words, it is to recover and record high level information about the system including:

• the system structure in terms of its components and their interrelationships expressed by interfaces,

• functionality in terms of what operations are performed on what components,

• the dynamic behaviour of the system in understanding how input is transformed to output,

• rationale — design involves deciding between a number of alternatives at each design step,

• construction — modules, documentation, test suites etc.

There are several purposes for undertaking reverse engineering, listed in [21].

They can be separated into the quality issues (e.g., to simplify complex software, to improve the quality of software which contains errors, to remove side effects from software, etc.), management issues (e.g., to enforce a programming standard, to enable better software maintenance management techniques, etc.) and technical issues (e.g., to allow major changes in a software to be implemented, to discover and record the design of the system, and to discover and represent the underlying business model implicit in the software, etc.).

It is seen that reverse engineering is an activity which neither changes the subject system, nor creates a new system based on the reverse-engineered subject system. It is a process of examination and understanding of the subject system, and of recording the results of that examination and understanding. On the other hand, reverse engineering is a key to the rest of the process of software maintenance, because it enables us to take an existing software system which is being maintained (e.g., in terms of its source code), and recover an abstract representation which can be used for subsequent maintenance or even reimplementation.

Because the techniques and methods of reverse engineering are still immature, the following precautions must be considered when reverse engineering is carried out:

1. the code may be specific, and not generic, so that few advantages are gained

when the system is reengineered,

2. the code may have errors and it is not clear if it is useful to reverse engineer error-filled code,

3. reverse engineering itself may introduce errors, and revalidation will be essential in the project plan,

4. reverse engineering can be very expensive and the returns are not clear and

a cost benefit analysis will be needed,

5. there are no standards or standard methods for reverse engineering,

6. there are no well established measures of maintainability.

In most cases, reverse engineering is the first step of software maintenance.

The analysis of the object software is crucially important to accomplish the request control stage of software maintenance.

One typical reverse engineering objective is to extract the program design or specification from the program code. There are two reasons for this. The first reason is that in order to achieve major productivity gains, software maintenance must be undertaken at a higher abstraction level than code, i.e., at the design level or specification level, because [21]:

1. The representation of a system at higher levels of abstraction is more compact than at lower levels, so the system is easier to understand as a whole.

2. The objects which represent the system at high levels of abstraction (e.g., modules, requirements, specifications etc.) are structures which encourage highly maintainable systems. Furthermore, they are closer to the application domain in terms of which many changes are expressed.

3. The documentation for systems maintained in this way can be clearly specified.

4. Modification can be better controlled leading to less structural degradation.

5. Modern software engineering techniques become available to the software engineer, leading to high quality in the software maintenance phase.

6. The high abstraction level objects are appropriate vehicles in which to express the testing plan.

From another point of view, software expressed at a higher level of abstraction is more maintainable than at a lower level of abstraction.

The second reason is that the need is often encountered in software maintenance projects. Firstly, the documents and relevant reference materials are not complete, and the personnel who may have relevant knowledge have already forgotten about it or have left. Secondly, there might even be some documents available, but the software may not have been implemented consistently with the documents. Thirdly, the original documents and reference materials were not written in a modern specification language, so they cannot be used in a modern software maintenance environment, and may not even be machine readable.

This means that the extraction of the program design or specification from old program code is a vital step, especially when the program is the only available documentation or is the only source on which to rely. The purpose of this kind of reverse engineering is (a) to reimplement the system or (b) to help understand the existing software. Someone may challenge this by asking "Why not simply reimplement the software instead of carrying out reverse engineering of the old software, even with no documentation?". It is because there has been considerable investment in existing software, and it is often not cost effective to throw away existing software and rewrite it with the latest development technique. The significance of reverse engineering can be seen in the setting up of many reverse engineering projects, for example [7,9,41,63,144].

2.4 Summary

Due to the rapid development of computer hardware and software, the demands and costs of software maintenance are increasing continuously. Software maintenance now comprises a major part of software engineering lifecycle costs. The first step in conducting software maintenance is to understand the software to be maintained, and the abstraction of the program design and specification from the existing source code is one of the methods which helps us to understand software systems. Reverse engineering carries out this task. Reverse engineering is particularly important when the source code is the only source with which to work. That is why the REFORM project was set up (more details on the REFORM project will be provided later). The final aim of the REFORM project is to discover designs and eventually specifications given only source program code. The REFORM project is based on a transformation approach and its aim is to find the formal relation between program code and its design and eventually its specification. There is not yet a complete existing method to be used in the project for acquiring a program design or specification from program code. This problem is tackled in this thesis.

Related research results are reviewed in the next chapter.

Chapter 3

Work Related to Reverse Engineering

3.1 Introduction

In the second chapter, the principles of each step in the software engineering

lifecycle were described. This chapter will describe some existing software development

approaches which are relevant to the thesis, together with several existing reverse

engineering projects. This will help to clarify the research problem to be solved

in this thesis.

In this context, it is necessary to restrict the scope so as to avoid discussion

of a great many approaches. Although some of the ideas discussed below could be

applied to hardware development and the development of concurrent or real-time

systems, they are outside the scope of this thesis. In this sense, a specification

only refers to the functional specification, which describes the effect of software

on its external environment, not to performance specification, which describes

constraints on the speed and resource utilisation of the software, etc.

Let us start with software system development. The most widely used method is to derive the final program from a specification. We use SP to represent a specification of the requirements which the software system is expected to fulfill, expressed in some specification language SL (if any); and P to represent the ultimate object program which satisfies the specification SP, written in some

given programming language PL.

The usual way to proceed is to construct P by whatever means are available, making informal reference to SP in the process, and then verify in some way that P does indeed satisfy SP. The only practical verification method available at present is to test P, checking in certain selected cases that the input/output relation it computes satisfies the constraints imposed by SP. This has the obvious disadvantage that (except for trivial programs) correctness of P is never guaranteed by this process, even if the correct output is produced in all test cases. An alternative to testing is a formal proof that the program P is correct with respect to the specification SP.

Most recent work in this area has focused on methods for developing programs from specifications in such a way that the resulting program is guaranteed to be correct by construction. The main idea is to develop P from SP via a series of small refinement steps, inspired by the programming discipline of stepwise refinement [163]. Each refinement step captures a single design decision, for instance a choice between several algorithms which implement the same function or between several ways of efficiently representing a given data type. This yields the following diagram (Figure 3.1), where SP0 represents the initial specification and the steps in between SP0 and P are represented by SP1, SP2, etc.

Let SL represent the corresponding specification language for the specifica• tion and PL the programming language. Thus, languages needed for software development are shown in Figure 3.2.

SP0 —> SP1 —> SP2 —> ... —> P

Figure 3.1: Stages of Program Development

If each individual refinement step (SP0 —> SP1, SP1 —> SP2 and so on) can

SL0 —> SL1 —> SL2 —> ... —> PL

Figure 3.2: Languages for Software Development

be proved correct, then P itself is guaranteed to be correct. Each of these proofs is orders of magnitude easier than a proof that P itself is correct, since each refinement step is small.
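As a loose illustration (not drawn from the thesis itself), the relationship between a specification and its refinements can be modelled by treating each SPi as an input/output predicate and checking that a candidate program satisfies it. All names below (sp0, sp1, p) are invented for the sketch.

```python
# Hypothetical sketch: a specification as an input/output predicate,
# refined step by step towards an executable program.

def sp0(x, y):
    """Initial specification SP0: y is a square root of x (abstract)."""
    return y * y == x

def sp1(x, y):
    """Refinement step SP1: commit to the non-negative root (a design decision)."""
    return sp0(x, y) and y >= 0

def p(x):
    """Final program P: one concrete algorithm intended to satisfy SP1."""
    y = 0
    while y * y < x:
        y += 1
    return y

# "Verification by testing": check P against SP1 (and hence SP0) in
# selected cases -- which, as the text notes, never guarantees correctness.
for x in [0, 1, 4, 9, 144]:
    assert sp1(x, p(x))
```

The point of the sketch is only that each refinement adds information: any pair satisfying sp1 also satisfies sp0, so checking P against the refined specification suffices.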

As described above, we are interested in answers to questions such as "How should the process be represented?", "What does refinement mean, and under what circumstances is a refinement step correct?" and "What methods are available for proving the correctness of refinement steps?". In the remaining parts of the chapter, formal specification, program transformation and program verification tools are reviewed in the context of the above questions. We then address the relevance of these approaches to reverse engineering. Finally, several software maintenance projects involving reverse engineering will be introduced.

3.2 Formal Specification

3.2.1 Specifications

A specification of a software system may serve different purposes [90].

• Specifications are used for program documentation.

• Specifications serve as a mechanism for generating questions. The construction

of specifications forces the designers to think about the requirements

definition and the intrinsic properties and functionalities of the software

system to be designed.

• A specification can be considered as a kind of contract between the designers

of a program and its customers (in the commercial world, vendors and

customers).

• Specifications are a powerful tool in the development of a program during its software life cycle. The presence of a good specification helps not only designers, but also implementors and maintainers.

• With regard to program validation, specifications may be very helpful to

collect test cases to form a validation suite for the software system.

Important properties of a specification are:

• completeness — the specification must cover the functionality of the requirement,

and

• consistency — the specification does not contain internal contradictions. A

specification which is to be implemented must not be inconsistent (or else

it cannot be implemented).

Any specification must be expressed in some specification language. Usually the language sheds considerable light on a system's abilities. Although some systems are conceptually independent of a particular language, each implementation is in the end tied to a particular language.

There are compilers for low-level languages (e.g., assembly language) and high-level languages (e.g., C, BASIC, PASCAL, LISP, etc.) [120]. They allow us to write components of "programs", which suggest how the desired result is to be computed [75]. This is contrasted with a specification language which is a description giving details of what is required and no more. A specification language is mainly used to write the specification of the requirements of the software system.

Specification languages may be classified into two major classes: formal specification languages and informal specification languages [17,72,86,87,127].

Formal specifications have a mathematical (usually formal logic) basis and employ a formal notation to model system requirements.

The advantages of using formal specification [146] are as follows:

• The development of a formal specification provides insights into and understanding

of the software requirements and the software design.

• Given a formal system specification and a complete formal programming language definition, it may be possible to prove that a program conforms to its specification. Thus, the absence of certain classes of system error may be demonstrated.

• Formal specifications may be automatically processed. Software tools can

be built to assist with their development, understanding and debugging.

• Depending on the formal specification language used, it may be possible to

animate a formal system specification to provide a prototype system.

• Formal software specifications are mathematical entities and may be studied

and analysed using mathematical methods. In particular, it can be investigated

whether the system can even be implemented adequately.

• Formal specification may be used as a guide to the tester of a component in

identifying appropriate test cases.

Informal specification languages, on the other hand, use a combination of

graphics and semiformal textual grammars to describe and specify system requirements.

Given the graphical and "English-like" nature of these languages,

they provide a vehicle for eliciting user requirements and communicating the analyst's

understanding of the requirements back to the user for verification.

The two approaches have complementary strengths and weaknesses. Whereas

informal specifications have advantages for requirements elicitation, ease of learning

and communication, formal languages provide conciseness, clarity and precision,

and are more suitable for analysis and verification. Therefore, formal and informal specifications must not be regarded as competitive but rather as complementary.

However, the use of formal specifications is the most distinguishing feature of a formal method. The term formal methods is used to cover the use of mathematics in software development. The main activities are [84,95]:

• writing a formal specification,

• proving properties about the specification, e.g., its consistency,

• constructing a program by mathematically manipulating the specification,

and

• verifying a program by mathematical argument.

In fact, formal methods are all about specifications. Formal methods are

used in the thesis for undertaking reverse engineering, so that the key issue of

using formal methods — formal specification languages — will be reviewed in the

next section.

3.2.2 Algebraic Specification Languages

There exist three basic families of specification approaches: the algebraic, the

state-machine, and the abstract model [24,108].

The approach of algebraic specification languages is based on the concept of

abstract data type (ADT) [85,90]. The idea (originated by Guttag [82]) is that

for specification purposes a functional program can be modelled as a many-sorted

algebra, i.e., as a number of sets of data values (one set of values for each data type)

together with a number of operations on those sets corresponding to the functions

in the program. The many-sorted algebra is needed because many interesting

operations in computing involve more than one sort, e.g., equals: Int × Int

—> Boolean. An abstract data type is a class of many-sorted algebras with the

same signature and the same specified common properties. An algebraic data type

is a definition of an abstract data type by a signature and some axioms. This

abstracts away from the algorithms used to compute the functions and how those

algorithms are expressed in a given programming language, focusing instead on the representation of data and the input/output behaviour of functions. It is possible to extend this paradigm to handle imperative programs as well by modelling imperative programs as functional programs or else by using a different notion of algebra [137]. The original motivation for this work was to provide a formal basis for the use of data abstraction in program development.

In this approach, a specification consists of a signature — a set of sorts (data type names) and a set of function names with their types — together with a set of equational axioms expressing constraints which the functions must satisfy. For example [24], Figure 3.3 is an algebraic specification of a stack with a bounded depth of three.

The sort part lists the abstract data types being described. In this example

there is only one type, namely Stack. The operators part lists the services available

on instances of the type Stack and syntactically describes how they have to

be called. These parts are called the signature of the algebraic specification. The

axioms part formally describes the semantic properties of the algebraic specification.

The basic idea of algebras is to write down a set of key properties of the ADT

in terms of equations (equational logic). We want a minimum set of such properties

(no duplication etc.). These are the axioms, and this allows an axiomatisation of

the theory. From the axioms and rules of inference we can generate any valid

formula.

There are two basic approaches to semantics, initial algebra (e.g., OBJ) and

loose algebra (e.g., CIP-L).

Algebraic specification techniques can also be used in a wide-spectrum language

[18,90,137,139], which is viewed as a specification language when used to

write software specifications. A wide-spectrum language incorporates a variety

of constructs, from high-level specification constructs down to low-level machine-oriented

ones, to permit expression of a broad range of the stages in the development

of a program. Furthermore, in the intermediate steps of transformation, which

is done incrementally, it is natural for specification constructs to be mixed freely

with programming constructs because of the way that high-level specifications are gradually refined to programs. This also avoids various problems which arise when separate specification and programming languages are used: there is no essential difference between refinement of programs and refinement of specifications; the same modularisation constructs can be used to construct specifications as well as programs; there is no sudden leap from one notation to another but rather a

obj Stack   { basic object in OBJ, which corresponds to an abstract data type. }
  sort Stack/integer, boolean;   { new type definition; the old types used follow the "/". }
  ok-ops
    push: Stack, integer -> Stack;   { underlining denotes a keyword of the language or a known type. }
    pop: Stack -> Stack;
    top: Stack -> integer;
    empty: Stack -> boolean;
    newstack: -> Stack;
    depth: Stack -> integer; hidden;   { a hidden function is not accessible to an abstract program. }
  error-ops
    underflow: -> Stack;
    no_more: -> integer;
    overflow: -> Stack;
  ok-eqn's   { these are the axioms. }
    pop(push(s, item)) = s;
    top(push(s, item)) = item;
    empty(newstack) = true;
    empty(push(s, n)) = false;
    depth(newstack) = 0;
    depth(push(s, item)) = 1 + depth(s);
  error-eqn's
    pop(newstack) = underflow;
    top(newstack) = no_more;
    push(s, item) = overflow if depth(s) > 2;
jbo

Figure 3.3: A Stack Specification in an Algebraic Specification Language (OBJ)
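To make the algebraic style concrete, the axioms of Figure 3.3 can be read as executable properties of one particular model of the stack. The following Python sketch (not part of the thesis; the tuple representation and the string error values are assumptions of the sketch) checks the ok- and error-equations against that model.

```python
# Hypothetical sketch: the OBJ axioms of Figure 3.3 read as executable
# properties of one concrete model (a Python tuple), with the error
# values modelled as strings.

def newstack():
    return ()

def depth(s):
    return len(s)

def push(s, item):
    return "overflow" if depth(s) > 2 else s + (item,)

def pop(s):
    return "underflow" if s == () else s[:-1]

def top(s):
    return "no_more" if s == () else s[-1]

def empty(s):
    return s == ()

# ok-eqn's: each axiom holds for sample values of s and item.
s = push(newstack(), 7)
assert pop(push(s, 3)) == s
assert top(push(s, 3)) == 3
assert empty(newstack()) and not empty(s)
assert depth(newstack()) == 0
assert depth(push(s, 3)) == 1 + depth(s)

# error-eqn's: underflow on the empty stack, overflow past depth three.
assert pop(newstack()) == "underflow"
assert push(push(push(s, 1), 2), 3) == "overflow"
```

The abstract data type itself is the class of all models satisfying these equations; the tuple model above is only one member of that class.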

gradual transition from high-level specification to efficient program. A brief review of the main existing algebraic specification languages is now given:

CLEAR The specification language CLEAR [40,137] provides a small number of specification-building operations which allow large and complicated specifications to be built in a structured way from small, understandable and reusable pieces. The operations provide ways of combining two specifications, of enriching a specification by some new sorts, functions and axioms, of renaming and/or forgetting some of the sorts and functions of a specification, and of constructing and applying parameterised specifications. The semantics of CLEAR allows it to be used with different kinds of axioms (not just equations) to specify different kinds of algebras. This allows appropriate treatment of exceptions, non-terminating functions and imperative programs, among other things.

CIP-L CIP-L [18] is the language on which the CIP project was based. CIP-L is a wide-spectrum language which includes constructs for writing high-level specifications, functional programs, imperative programs and unstructured programs with gotos. The language provides constructs for the specification and implementation of data structures as well as constructs for the specification and implementation of control structures. Algebraic (data) types provide a means for giving the algebraic specification of data. They can be implemented by computation structures combining data and algorithms. Modes are described by specific types for which computation structures can be provided automatically. Based on algebraic types and/or computation structures, programs can be specified using predicate logic, description, comprehensive choice, and fully typed set operations.

Larch The Larch [83,137] family of specification languages was developed at MIT and Xerox PARC to support the productive use of formal specifications in programming. Each Larch language is composed of two components: the interface

language which is specific to the particular programming language under consideration and the shared language which is common to all programming languages. The interface language is used to specify program modules using predicate logic with equality and constructs to deal with side effects, exception handling and other aspects of the given programming language. The shared language is an algebraic specification language used to describe programming-language independent abstractions using equational axioms which may be referred to by interface language specifications. The role of a specification in the shared language is to define the concepts in terms of which program modules may be specified.

Other algebraic specification languages [90,137,138] are ACT ONE, the OBJ family [78], HOPE, etc.

To summarise, the strengths of the algebraic approach are:

• theory now well established,

• structuring techniques for large systems introduced, e.g., parameterisation,

• seemingly promising for transformation systems, refinement and reuse,

• some tools also becoming available,

• some theorem provers also available;

and the weaknesses:

• still at laboratory stage,

• needing considerable mathematical ability,

• not accessible to practitioner, and

• difficult to scale.

3.2.3 State-Machine Specification Languages

A state-machine specification defines a set of functions that specify transformations on inputs. The set of functions may be viewed (depending on the particular

module Stack;
  declarations integer index, item; boolean flag;
  functions
    vfun h_depth -> index;
      hidden;
      initially index = 0;
    vfun h_set_of_items (index) -> item;
      hidden;
      initially item = ?;   { "?" means "undefined". }
    ofun push (item);
      exceptions h_depth > 2;
      effects 'h_set_of_items (h_depth) = item;
              'h_depth = h_depth + 1;
    ofun pop;
      exceptions h_depth <= 0;
      effects 'h_depth = h_depth - 1;
    vfun top () -> item;
      exceptions h_depth <= 0;
      derivation item = h_set_of_items (h_depth - 1);
    vfun empty () -> flag;
      derivation flag = (h_depth = 0);
end module;

Figure 3.4: A Stack Specification in a State-machine Specification Language

specification) as defining the nature of an abstract data type or describing the behaviour of an abstract machine. A state-machine specification is given in terms of states and transitions. Its functions are divided into two classes: V-functions allow an element of the state to be observed but do not define any aspect of transitions; O-functions define transitions by means of effects. The effect of an O-function is to change the state, which is done by denoting a V-function and altering the value it will return. Specification languages based on this approach include SPECIAL and Ina Jo [24]. The example in Figure 3.4 is in SPECIAL.
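The V-function/O-function division of Figure 3.4 can be sketched in Python as a class whose state is hidden and reachable only through the two kinds of functions. The class and method names are invented for the sketch, not part of SPECIAL.

```python
# Hypothetical sketch of the state-machine style: hidden state observed
# through V-functions and changed only through O-functions, mirroring
# the shape of Figure 3.4.

class StackMachine:
    def __init__(self):
        self._depth = 0       # hidden V-function h_depth
        self._items = {}      # hidden V-function h_set_of_items

    # O-functions: define transitions via their effects on V-functions.
    def push(self, item):
        if self._depth > 2:
            raise OverflowError("stack full")
        self._items[self._depth] = item
        self._depth += 1

    def pop(self):
        if self._depth <= 0:
            raise IndexError("stack empty")
        self._depth -= 1

    # Visible V-functions: observe the state without any transition.
    def top(self):
        if self._depth <= 0:
            raise IndexError("stack empty")
        return self._items[self._depth - 1]

    def empty(self):
        return self._depth == 0

m = StackMachine()
m.push(5); m.push(8)
assert m.top() == 8 and not m.empty()
m.pop()
assert m.top() == 5
```

Note how top and empty are derived from the hidden state, exactly as the derivation clauses of the figure derive them from h_depth and h_set_of_items.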

3.2.4 Abstract Model Specification Languages

The model based approach describes the key objects in terms of foundational objects which we assume exist (are given). We use the foundational objects plus construction operations to build compound objects which model our system. A

model (here) is a mathematical theory expressing the aspects of the system we wish to describe and analyse. A model will often be at quite a high level of abstraction, ignoring much of the unnecessary detail. A model (for specification) comprises:

1. a state space

2. operations, functions on this space

3. state invariant — defines valid states.

The basic types are typically sets, Cartesian products, sequences, schemas (in Z), etc. We should (like any mathematical theory) be able to deduce interesting theorems about the system. The abstract model technique [24,108] differs in both syntax and semantics from the techniques of the previous two methods. For syntax it uses the basic precondition/postcondition format. It defines its functions in terms of an underlying abstraction that is selected by the specifier. The specifier can use any abstraction (sets, lists, arrays, and so on) about which it is possible to reason formally. The usefulness of a given abstract model specification depends greatly upon the appropriateness of the selected underlying abstraction to the functions being specified. In order to illustrate the relationship between an abstract model specification and the underlying abstraction, a bounded integer stack is specified in terms of arrays (Figure 3.5 and Figure 3.6). Two widely used specification languages, VDM and Z, belong to this approach.

VDM VDM [93] (the Vienna Development Method) is a method for rigorous (not formal) program development, and also a modelling notation. The objective is to produce programs by a process in which the individual refinement steps are shown to be correct using arguments which are formalisable rather than formal, thus approximating the level of rigour used in mathematics. This is supposed to yield most of the advantages of formal program development by ensuring that sloppiness is avoided, without the foundational and notational overhead of full formality.

obj iarray
  sort iarray/integer;
  ok-ops
    new: -> iarray;
    assign: integer, iarray, integer -> iarray;
    read: iarray, integer -> integer;
  error-ops
    empty: -> integer;
  ok-eqn's
    read (assign (val, array, index1), index2) = if index1 = index2
                                                 then val
                                                 else read (array, index2);
  error-eqn's
    read (new, index) = empty;
jbo

Figure 3.5: An Array Specification in an Algebraic Specification Language

type stack;
  stack is modeled as iarray and (depth: integer);
  invariant 0 <= depth <= 3;
  initially stack = new and depth = 0;
  functions
    push (item: integer)
      pre depth <= 2;
      post stack' = assign (item, stack, depth) and depth' = depth + 1;
    pop
      pre stack ≠ new;
      post depth' = depth - 1;
    top returns (item: integer)
      pre stack ≠ new;
      post item = read (stack, depth - 1);
    empty returns (flag: boolean)
      post flag = (stack = new);
end;

Figure 3.6: A Stack Specification in an Abstract Model Specification Language (based on pre- and post-conditions)
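The pre-/post-condition style of Figure 3.6 can be sketched in Python by choosing an underlying abstraction (here a Python list) and having each operation check its precondition and assert its postcondition. The class and its bound of three are assumptions of the sketch.

```python
# Hypothetical sketch of the abstract-model style: the stack is modelled
# by an underlying abstraction (a Python list), and each operation checks
# a precondition and asserts its postcondition on the model.

class ModelStack:
    def __init__(self):
        self.rep = []           # underlying abstraction: a list

    def push(self, item):
        assert len(self.rep) <= 2, "pre: room for one more element"
        old_depth = len(self.rep)
        self.rep.append(item)
        assert len(self.rep) == old_depth + 1    # post: depth' = depth + 1

    def pop(self):
        assert self.rep, "pre: stack is not new"
        old_depth = len(self.rep)
        self.rep.pop()
        assert len(self.rep) == old_depth - 1    # post: depth' = depth - 1

    def top(self):
        assert self.rep, "pre: stack is not new"
        return self.rep[-1]     # post: item = read(stack, depth - 1)

s = ModelStack()
s.push(1); s.push(2)
assert s.top() == 2
s.pop()
assert s.top() == 1
```

As the text notes, the usefulness of such a specification hinges on how well the chosen abstraction (the list) matches the functions being specified.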

VDM uses a model-oriented approach to describing data types. Models are built using functions, relations and sets. A simple example is the following specification of dates:

Date :: YEAR  : Nat
        MONTH : {Jan, Feb, ..., Dec}
        DAY   : {1, ..., 31}

This models dates as triples, but does not require that dates be represented as triples in the final program. Not all of the values of type Date are valid; the legal ones are characterised by the following data type invariant:

inv-date (<y, m, d>) = ...
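The thesis elides the body of inv-date. Purely as an illustration of what a data type invariant does, the sketch below gives one plausible invariant (not the thesis's own) over a triple model of Date; all names are invented.

```python
# Hypothetical sketch: a Date modelled as a (year, month, day) triple,
# with one plausible data type invariant (the actual inv-date body is
# elided in the thesis).

DAYS_IN_MONTH = {"Jan": 31, "Feb": 29, "Mar": 31, "Apr": 30, "May": 31,
                 "Jun": 30, "Jul": 31, "Aug": 31, "Sep": 30, "Oct": 31,
                 "Nov": 30, "Dec": 31}

def inv_date(date):
    """Invariant: the month is legal and the day exists in that month."""
    year, month, day = date
    return month in DAYS_IN_MONTH and 1 <= day <= DAYS_IN_MONTH[month]

# Not every value of the model type is a valid Date.
assert inv_date((1994, "Jan", 31))
assert not inv_date((1994, "Apr", 31))
```

The invariant cuts the model type down to its legal values, which is exactly the role inv-date plays for the triple model above.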

Problems with VDM are that it is easy to overspecify a system, and that side effects may arise when pre- and post-conditions are used to specify procedures [126].

Z Z [119,147,148] is a specification language based on the principle that programs and data can be described using set theory, just as all of mathematics can be built on a set-theoretic basis. Thus, Z is no more than a formal notation for ordinary naive set theory. The first version of Z used a rather clumsy and verbose notation, but the current version adopts a more concise and elegant notation based on the idea of a schema, which generalises familiar mathematical notations. Schemas are used to describe a system by both static aspects (the states it can occupy, and the invariant relationships that are maintained as the system moves from state to state) and dynamic aspects (the operations that are possible, the relationship between their inputs and outputs, and the changes of state that happen) [148]. A simple example of a Z specification, A Birthday Book, is cited here. It is a system for recording people's birthdays, and is able to issue a reminder when the

day comes. A schema is used to describe the state space of the system.

BirthdayBook
known : P NAME
birthday : NAME +-> DATE

known = dom birthday

One of the operations on the system, adding a new birthday, is described by another schema.

AddBirthday
Δ BirthdayBook
n? : NAME
d? : DATE

n? ∉ known

birthday' = birthday ∪ {n? ↦ d?}
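The two schemas can be sketched as an executable Python model in which the state invariant known = dom birthday holds by construction; the class and method names are invented for the sketch.

```python
# Hypothetical sketch: the BirthdayBook state and the AddBirthday
# operation as a Python model of the Z schemas.

class BirthdayBook:
    def __init__(self):
        self.birthday = {}             # birthday : NAME +-> DATE

    @property
    def known(self):                   # invariant: known = dom birthday
        return set(self.birthday)

    def add_birthday(self, name, date):
        assert name not in self.known  # pre: n? not in known
        self.birthday[name] = date     # birthday' = birthday U {n? |-> d?}

book = BirthdayBook()
book.add_birthday("Ann", "25-Mar")
assert book.known == {"Ann"}
assert book.birthday["Ann"] == "25-Mar"
```

Because known is computed as the domain of birthday, the static invariant of the first schema cannot be violated by the operation of the second.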

Z has been used with success in UK industry to specify real systems. The main problems are, for example, that the Z language is hard to use for the purposes of theorem proving and program refinement.

3.2.5 A Comparison of the Approaches

The similarities among the three approaches are that they are all nonprocedural and use sets of function definitions to specify the effect of operations in terms of known mathematical objects [24]. Each approach also has its own features. For instance, in the algebraic approach, it is difficult to get the equations right for complex systems; in the modelling approach, basic and well understood concepts (e.g., sets) are used, so that it is more natural to specify objects in terms of such well understood mathematical objects. State-machine and abstract model techniques, which both rely on explicit

state descriptions, can be transformed into each other; a set of functions defined algebraically (and therefore side-effect-free) can always be transformed one-for-one into functions in either of the other two that permit side effects, but a function with side effects may have to be split into several visible and hidden functions when the algebraic approach is taken.

It was interesting that at the Refinement conference in January 1992 [54], almost all the papers were on the modelling approach, while none was on the algebraic approach.

3.3 Program Transformation Systems

3.3.1 Refinement and Transformational Programming

The term refinement has been referred to earlier in this thesis. We now discuss refinement in more detail. We describe refinement as a technique to produce correct implementations from specifications [161,163]. From this, we know that specification and implementation are two essential elements in the refinement process. For most approaches to program development (whether formal or informal), a design stage is involved. Refinement takes the notion of a rigorous treatment of design a stage further. Each time a design decision is taken, a new version of the specification, incorporating this new information, is produced. We can now check that the new specification is acceptable with respect to the previous one, or more formally, that it "satisfies" it. Additionally, this new specification provides the basis for further refinement, so we can also ensure that our separate decisions interact correctly. This notion of refinement also gives us a framework in which to consider the enhancement of the functional constraints discussed in the requirements document. As we make decisions we can then check that this new information is consistent with satisfying the constraints, and finally we can check that the implementation does indeed satisfy them. Refinement can be carried out informally or formally. Figure 3.1 presented a general picture of formal program development in which programs were evolved from specifications in a gradual fashion via a series of refinement steps. Probably the most useful potential application of formal specifications is to the formal development of programs by gradual refinement from a high-level specification to a low-level "program" or "executable specification" [70,94,140]. Actually, some refinement steps are more or less routine. Such refinement steps can typically be described schematically as transformational rules.
The process of changing a program (specification) into a different program (specification) with the same semantics as the original program (specification) is called program transformation.

Any refinement obtained by instantiating a transformation rule will be correct. Rather than proving correctness separately for each instantiation, the rule itself can be proved correct and then applied as desired without further proof. Sometimes such a rule will be correct only provided certain conditions are met by the program fragments matching the schematic variables or by the context in which the rule is applied; in this case the proof obligation is reduced to checking that these conditions are satisfied. This leads to a method of program construction — transformational programming, i.e., constructing programs by successive application of transformation rules. Usually this process starts with a formal specification and ends with an executable program. Much recent work has focused on program transformation as a kind of programming paradigm in which the development from specification to implementation is a formal, mechanically supported process. Research on program transformation aims at developing appropriate formalisms and notations, building computer-based systems for handling the bookkeeping involved in applying transformation rules, compiling libraries of useful transformation rules, and developing strategies for conducting the transformation process automatically or semi-automatically. The long-range objective of this paradigm is to improve dramatically the construction, reliability, maintenance, and extensibility of software. An implemented system for supporting transformational programming is called a program transformation system. Those languages designed with some of the techniques used in expressing transformations and developments are

called transformation support languages. Researchers have built a number of systems for transformational programming.
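A transformation rule, its applicability condition, and the semantics-preservation requirement described above can be sketched in miniature. The rule below ("x * 2" rewritten to "x + x" over a tiny expression tree) and all its names are invented for the sketch.

```python
# Hypothetical sketch of a transformation rule: ('*', e, 2) ==> ('+', e, e),
# applied over a tiny expression tree, with an applicability condition
# checked before the rule fires.

def times_two_to_plus(expr):
    """Rewrite every subterm ('*', e, 2) to ('+', e, e); leave others alone."""
    if isinstance(expr, tuple):
        op, left, right = expr
        left, right = times_two_to_plus(left), times_two_to_plus(right)
        if op == "*" and right == 2:     # applicability condition
            return ("+", left, left)
        return (op, left, right)
    return expr

def evaluate(expr, env):
    """Reference semantics used to check that the rule preserves meaning."""
    if isinstance(expr, tuple):
        op, l, r = expr
        l, r = evaluate(l, env), evaluate(r, env)
        return l + r if op == "+" else l * r
    return env.get(expr, expr)

before = ("*", ("+", "x", 1), 2)
after = times_two_to_plus(before)
# Semantics preserved: both forms compute the same value.
assert evaluate(before, {"x": 4}) == evaluate(after, {"x": 4}) == 10
```

Proving once that the rule preserves evaluate for every expression discharges the correctness of all its instantiations, which is precisely the economy the text describes.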

3.3.2 Features of Transformation Systems

To compare the various systems [67,129,130], we consider the following aspects:

• Purpose

Generally speaking, transformation systems are built to experiment with the mechanically assisted development of a broad range of programs.

A first goal of program transformation is program synthesis: the generation of an equivalent, executable, and efficient program from a (formal) description of the problem. Program synthesis may start from specifications in (restricted) natural language, or from mathematical specifications. The resulting program is correct with respect to the specification.

The second goal is general support for program modification. This includes: optimisation of control structures, efficient implementation of data structures, and the adaptation of data structures and given programs to particular styles of programming (e.g., applicative, procedural, machine oriented).

A third goal is that of program adaptation to particular environments. For example, a program written in one language may need to be adapted to a related language with different primitives.

• Functions

1. Transformation data base

The system contains a facility for keeping the (predefined) collection of transformations for use by the end user.

2. User Guidance

Nearly all transformation systems are interactive. Even the "fully automatic" ones require an initial user input and rely (interactively) on the user to resolve unexpected events. The system's reaction to input may

include automatic checks on the "reasonableness" of given commands, as well as incremental interactive parsing using correction mechanisms.

3. History recording

Most systems also have some facility for documenting the development process — one of the promising aspects of the transformational approach. These facilities include internal preservation of the source program, of the final output, and of all intermediate versions. The documentation itself ranges from a simple sequential log of the terminal session (bookkeeping) to rather sophisticated database mechanisms.

4. Assessment of Programs

Assessment of programs can be supported in qualitatively different ways: the system may incorporate some execution facility, such as an interpreter or a compiler to some target level, or it may utilise aids for "testing", such as symbolic evaluators. Occasionally the system will also have tools for program analysis, either for aiding in the selection of transformation rules or simply for "measuring" the effect of some transformation.

• Working Mode

1. A "manual" system makes the user responsible for selecting and apply• ing every single transformation step. It is the simplest implementation and the system must provide some means for building up compact and powerful transformation rules. System checks application of use.

2. A fully automatic system enables the selection and application of appropriate rules to be determined completely by the system, using built-in heuristics, machine evaluation of different possibilities, or other strategic considerations.

3. A semi-automatic system works both autonomously for predefined subtasks and manually for problems it cannot solve alone.

• Type of transformation

Basically, there are two different methods for keeping transformations in the system: the catalogue approach and the generative set approach.

A catalogue of rules is a linearly or hierarchically structured collection of transformation rules relevant to a particular aspect of the development process. Catalogues may contain, for example, rules about programming knowledge, optimisations based on language features, or rules reflecting data domain knowledge. A user can select certain transformation rules from the catalogue and apply them.

By a generative set we mean a small set of powerful elementary transformations to be used as a basis for constructing new rules. A user can decide what transformation rules are to be constructed from the generative set.
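The catalogue approach can be illustrated with a small sketch (ours, not taken from any of the systems reviewed): rules are kept in a named collection and the user selects which one to apply. The rule names and the tuple representation of expressions are invented for illustration only.

```python
# Hypothetical sketch of a rule catalogue. Expressions are nested tuples,
# e.g. ('*', ('+', 'x', 0), 1) stands for (x + 0) * 1.

def simplify_add_zero(expr):
    """Rule: e + 0  ->  e"""
    if isinstance(expr, tuple) and expr[0] == '+' and expr[2] == 0:
        return expr[1]
    return expr

def simplify_mul_one(expr):
    """Rule: e * 1  ->  e"""
    if isinstance(expr, tuple) and expr[0] == '*' and expr[2] == 1:
        return expr[1]
    return expr

CATALOGUE = {'add-zero': simplify_add_zero, 'mul-one': simplify_mul_one}

def apply_rule(name, expr):
    """The user selects a rule from the catalogue; it is applied top-down."""
    rule = CATALOGUE[name]
    expr = rule(expr)
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(apply_rule(name, e) for e in expr[1:])
    return expr

print(apply_rule('add-zero', ('*', ('+', 'x', 0), 1)))  # ('*', 'x', 1)
print(apply_rule('mul-one', ('*', 'x', 1)))             # 'x'
```

A generative set would instead provide only a few primitive rewrites plus combinators (sequencing, repetition) from which rules such as these are built.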

Whether a transformation system is judged good ultimately depends on the extent to which it can fulfil its goal — transforming a specification into a running program. However, this is not the only purpose of this review; a more important aspect is to learn what can be used in undertaking reverse engineering.

3.3.3 Program Transformation Systems

In this section, the features of transformation systems listed in the last section are used to comment on existing transformation systems used in forward engineering, according to available information. Other information may include, e.g., the year that the work was done, the specification and programming languages used, the results, etc.

Optimising Compilers Program transformation techniques have been used for many years in optimising compilers, because inefficient programs can be transformed into efficient programs (e.g., loop induction, strength reduction, expression reordering, symbolic evaluation, constant propagation, loop jamming).
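As an illustration (ours, not taken from any particular compiler), two of the transformations named above can be shown as source-level rewrites; the function names are invented for the example.

```python
# Strength reduction: the multiplication i * 4 inside the loop is replaced
# by an accumulator updated with a cheaper addition.
def before(n):
    out = []
    for i in range(n):
        out.append(i * 4)     # multiply on every iteration
    return out

def after(n):
    out = []
    acc = 0
    for i in range(n):
        out.append(acc)       # same values, addition only
        acc += 4
    return out

# Constant propagation/folding: the expression 2 * 60 * 60 is evaluated
# once, at "compile time", rather than on every use.
SECONDS_PER_2H = 7200         # was: 2 * 60 * 60

assert before(5) == after(5) == [0, 4, 8, 12, 16]
```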

Burstall and Darlington's Work The work on program transformation by Burstall and Darlington was done in the mid-1970s [39,130]. Their system was based on a schema-driven method for transforming applicative recursive programs into imperative ones, with improved efficiency as the ultimate goal. The system worked largely automatically, according to a set of built-in rules, with only a small amount of user control. The rule set contained only seven simple rules and the system could only work on simple programs.
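The flavour of such recursive-to-imperative rewrites can be suggested with a familiar example (ours, not one of their seven rules): an applicative recursive definition is transformed into an equivalent, more efficient imperative one.

```python
# Applicative recursive definition.
def fact_recursive(n):
    return 1 if n == 0 else n * fact_recursive(n - 1)

# The same function after introducing an accumulator (making the recursion
# tail-recursive) and then turning the tail call into a loop.
def fact_imperative(n):
    acc = 1
    while n > 0:
        acc, n = acc * n, n - 1
    return acc

# The transformation preserves the input/output behaviour.
assert all(fact_recursive(n) == fact_imperative(n) for n in range(10))
```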

Balzer's Work Balzer built an implementation system for program transformation [14,15,16]. This system was designed to transform formal program specifications mechanically into efficient implementations under interactive user control. He expressed the problem in a formal specification language, GIST, which was operational (i.e., it had an executable semantics). He used this system to solve a small (but nontrivial) example, the "eight queens" problem. The result was that the optimisation and the conversion of the program into conventional form remained incomplete. His system depended too much on user guidance, and the specification was also not at a high level.

ZAP Feather's ZAP system [68] is based on the Burstall/Darlington system with a special emphasis on software development by supporting large-program transformation. The input/target language of the system is NPL (an applicative language for first-order recursion equations). The system provides the user with a means for expressing guidance. An overall transformation strategy is hand-expanded by the user into a set of transformation tactics such as combining, tupling, and generalisation. It is claimed that the ZAP system can deal with example programs ranging from "toy" to small but realistic ones. Unfortunately, the system has to operate partially informally and even entirely by hand.

DEDALUS System The DEDALUS system (DEDuctive Algorithm Ur-Synthesiser) by Manna and Waldinger was implemented in QLISP [111]. Its goal was to derive LISP programs automatically and deductively from high-level input-output specifications in a LISP-like representation of mathematical-logical notation.

The system incorporates an automatic theorem prover and includes a number of strategies designed to direct it away from rule applications unlikely to lead to success. The system is considered by its designers to be a laboratory tool rather than a practical tool. The examples treated by the DEDALUS system were toy examples, like computing the greatest common divisor of two numbers.

The DRACO System The DRACO System [130] is a general mechanism for software construction based on the paradigm of "reusable software". "Reusable" here means that the analysis and design of some library program can be reused, but not its code. DRACO is an interactive system that enables a user to refine a problem, stated in a high-level problem-domain-specific language, into an efficient LISP program. The DRACO ideas have been implemented in a prototype system running under TOPS-10 on a DEC PDP-10 computer. Results show that only small programs (tens of lines) can be created using this prototype; the designer considered that the main reason for this was the restricted memory size.

CIP-S CIP-S is the approach of the Project CIP (computer-aided, intuition-guided programming) [19], which is to develop along the idea of transformational programming within an integrated environment, including methodology, language, and system for the construction of "correct" software. The system uses a wide-spectrum language, CIP-L (introduced in Section 3.2.2). A prototype system has been implemented. The system is interactive and the development process is guided by the programmer, who has to choose appropriate transformation rules. The system is language-independent and is based on the algebraic view of language definition; any algebraically defined language is suited for manipulation, provided respective facilities for translating between external and internal representations are available.

It is claimed that the system not only allows the treatment of concrete programs, but also the formal derivation of new, complex rules within the system. The CIP project has developed several theories for program transformation, such as well-founded theories of nondeterminism, abstract data types, algebraic language definition, and correctness of transformation rules. The CIP-L language turned out to be successful both as an educational vehicle in teaching beginners and as a tool in developing software. It is also noted that the prototype served as the essential software tool in developing CIP-S itself. However, the prototype system can only deal with programs of a small scale and the proposed system (CIP-S) itself is under construction.

Other transformation-related systems include the "SETL System" [59], the "TAMPR System" [32], the "FOCUS System" [135], Morgan's work on the Refinement Calculus [121,122], the "Programmer's Apprentice" [158], etc.

3.3.4 Summary

There is widespread demand for safe, verified, and reliable software. This demand arises from economic considerations, ethical reasons, safety requirements, and strategic demands. Transformational programming can clearly make a valuable contribution toward this goal. It already covers several phases of the classic software engineering lifecycle and shows promise of covering the remaining ones. But, after nearly twenty years of research, existing transformation systems are still experimental and the problems they are capable of coping with are still more or less toy problems. Making practical use of transformation systems is no doubt the key problem to be solved in transformational programming.

3.4 Program Verification

Program verification is used when the program already exists and has been developed by informal development methods. It is contrasted with the correctness of software developed by transformational programming, which is guaranteed by the process itself (assuming that the transformations are correct).

3.4.1 Concept of Proof (Program Proving)

The concept of "correctness" is a relative notion. When we refer to a program being correct we mean relative to some given specification. Generally, the specification of a programming problem consists of a precondition describing the properties of the supplied data and a postcondition describing the desired effect of the computation. There may also be a state invariant defining valid states.

A proof of conditional or partial correctness assumes that the execution of a process terminates and concentrates on establishing that its specification is met. A complete proof also includes a proof of termination, e.g., by showing that some variable is decremented on each loop iteration down to a test against zero. When the word "proof" is used it can generally be understood in two different ways. An informal proof, the sort most commonly used by mathematicians, consists of an outline of, or a strategy for constructing, a formal proof. A formal proof is a sequence of statements, each of which is a well-established theorem or follows from earlier statements by a rule of inference or an axiom. A formal proof is conducted in an artificial (or "formal") language consisting entirely of signs and symbols; a mathematician's proof, on the other hand, will make significant use of natural language (such as English) as well as signs and symbols where they are considered appropriate. Both types of proof have their own characteristic type of complexity.
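These two proof obligations can be sketched in executable form. The following illustration (ours, assuming a simple division-by-repeated-subtraction program) checks at run time the loop invariant that carries the partial-correctness argument and the strictly decreasing variant that carries the termination argument:

```python
def divide(a, b):
    """Compute quotient and remainder of a / b by repeated subtraction."""
    assert a >= 0 and b > 0                 # precondition
    q, r = 0, a
    while r >= b:
        assert a == q * b + r               # invariant: partial correctness
        variant_before = r
        q, r = q + 1, r - b
        assert 0 <= r < variant_before      # variant decreases: termination
    assert a == q * b + r and 0 <= r < b    # postcondition
    return q, r

print(divide(17, 5))  # (3, 2)
```

A proof replaces these run-time checks by showing, once and for all, that the invariant is preserved and the variant strictly decreases on every iteration.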

There are two basic approaches to program verification, one using inference rules originally developed by C.A.R. Hoare [88] and the other using so-called "predicate transformers" developed by E.W. Dijkstra [60]. The two approaches are related, although different.

Proofs are a central part of the program development method. One property of a formal specification is that proofs can be written which clarify its consequences. In order for proofs to be useful, they must possess a number of properties. One of these requirements is that the proofs should be natural and that they should ensure certainty. It is difficult to be precise about what constitutes a natural proof. The concept of informal proof is to indicate how a proof could be constructed: the major steps are given in the knowledge that further details can be provided if these major steps are in doubt.
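Dijkstra's predicate-transformer approach can be sketched executably. In this illustrative model (ours, not from either original formulation), predicates are functions of a program state, and wp(S, Q) — the weakest precondition under which statement S establishes postcondition Q — is computed by substitution:

```python
def wp_assign(var, expr, Q):
    """wp(var := expr, Q) = Q with expr substituted for var."""
    return lambda s: Q({**s, var: expr(s)})

def wp_seq(wp1, wp2, Q):
    """wp(S1; S2, Q) = wp(S1, wp(S2, Q))."""
    return wp1(wp2(Q))

# Program: x := x + 1; y := x * 2, with postcondition y > 10.
Q = lambda s: s['y'] > 10
wp2 = lambda R: wp_assign('y', lambda s: s['x'] * 2, R)
wp1 = lambda R: wp_assign('x', lambda s: s['x'] + 1, R)
pre = wp_seq(wp1, wp2, Q)   # equivalent to 2 * (x + 1) > 10, i.e. x > 4

print(pre({'x': 5}), pre({'x': 4}))  # True False
```

A real verifier manipulates the predicates symbolically rather than testing them on states, but the transformer rules are the same.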

Another aspect of what constitutes a natural proof concerns the crucial distinction between the discovery and presentation of a proof. A proof is often found by working back from the goal; sub-goals are created and discharged until the sub-goals correspond to known facts. In order to show how the steps relate, it is normal to present an argument working forwards from the known facts towards the goal. This forward presentation is more natural to read. But when readers become writers, they must learn to discover proofs one way and present their steps in a different order.

It should be clear that the claim that something has been proved must eliminate doubt. Unfortunately, informal arguments cannot create certainty. In order to achieve the same level of certainty with a proof, it is necessary to reduce proof construction to a "game with symbols": each proof step must depend only on known (i.e., proven) theorems or axioms and be justified by one of a fixed set of inference rules. The inference rules themselves must require only the mechanical rearrangement of symbols. Such proofs are called formal proofs.

A formal proof uses the formal semantics of a programming language. The formal semantics of a programming language maps every syntactically correct language construct into a metalanguage that is based on a well-understood mathematical notation. Consequently, formal semantics can be specified as a set of translation rules from the domain of language constructs to the range of well-formed formulas of the formalism. The most important benefit of formal semantics is that it provides the basis for correctness proofs of implementations and for program correctness proofs. Formal semantics has the potential of providing mechanical support to correctness proofs. The only way for a computer to aid in verification of a language implementation or the correctness of a program is to start from a precise, formal language definition.

There are two kinds of formal semantics: axiomatic semantics and denotational semantics. Axiomatic semantics describes the meaning of each syntactically correct program by associating it with properties of variables (in terms of predicate calculus) that hold before execution starts and after the program halts. Axiomatic semantics is based on mathematical logic. In the axiomatic semantics approach [24] of program verification, the metalanguage used is a logic language, such as predicate calculus. Please refer to the example in the VDM section (Section 3.2.4).

Denotational semantics [4,10,80] defines the meaning of a program written in a language L by a mapping from the syntax of L to the functions denoted. The denotational semantics of the programming constructs of a programming language are defined by so-called semantic valuation functions. Semantic valuation functions map programming constructs to the values (numbers, truth values, functions, and so on) that they denote. These valuation functions are usually defined recursively: the value denoted by a construct is given in terms of the values of its constituent parts, and this emphasis on the values denoted by the constituent parts gives the approach its name. In the denotational semantics approach to program verification, the metalanguage used is that of the functional calculus (i.e., the lambda calculus). For example, the following is the denotational semantics of decimal numbers in a simple language:

Syntax:

<number> ::= <number><digit> | <digit>
<digit>  ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

We now define a valuation function V from the syntax to the functions denoted. We do this for each syntax rule:

V: Number -> Integer   (Number is the language defined by the syntax above)

V[<number><digit>] = 10 * V[<number>] + V[<digit>]
V[0] = 0
V[1] = 1
...
V[9] = 9

For each sentence in this simple language, the above valuation defines the meaning. For instance, for the number 724:

V[724] = V[<72><4>]
       = 10 * V[<72>] + V[<4>]
       = 10 * (10 * V[<7>] + V[<2>]) + V[<4>]
       = 10 * (10 * 7 + 2) + 4
       = 724

which gives us the answer we expect.

Formal verification is the application of reasoning expressed in a mathematical formalism, i.e., a formal system such as first-order predicate logic. Because formal reasoning about programs involves a large amount of tedious symbol manipulation, which is a perfect job for a machine, tools such as theorem provers and program verifiers have been built for this purpose.
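The valuation function V above can be transcribed directly into executable form (a sketch in Python, representing a number by its digit string and recursing on the two syntax rules):

```python
# V[0] = 0 ... V[9] = 9: the base rule for <digit>.
DIGIT = {str(d): d for d in range(10)}

def V(number):
    """Valuation function for <number> ::= <number><digit> | <digit>."""
    if len(number) == 1:                      # <number> ::= <digit>
        return DIGIT[number]
    # <number> ::= <number><digit>:  V = 10 * V[<number>] + V[<digit>]
    return 10 * V(number[:-1]) + DIGIT[number[-1]]

assert V("724") == 10 * (10 * 7 + 2) + 4 == 724
```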

3.4.2 Examples of Existing Program Verification Tools

Program verification tools give rise to a variety of different approaches to formal reasoning about a variety of different tasks, especially programs. It is difficult to compare program verification systems directly [107], because the problems they have been designed to tackle are often quite different. The important properties of existing systems are listed below and attention is drawn to the following aspects.

• Object language — The logical language in which propositions are expressed and reasoned about will be called the object language of the system. The class of object languages supported is usually a major factor in determining the usefulness of a program verification tool to formal reasoning tasks.

• Theories — When using a theorem prover systematically and over a long period of time, it is important to be able to build up "theories", going from simple properties of data structures to deeper and stronger results about their relationship.

A "theory" of a program verification tool is the fundamental principle on which the tool is based. A theory is specified by giving its "signature", a set of axioms and rules of inference. Usually, the basic specification of a theory module consists of: its name, the modules on which it is built, the new sorts it will use, the new operators it will use, and a collection of axioms and derived facts (theorems, lemmas, etc.). Taking the Boyer-Moore theorem prover as an example, its theory is to prove theorems by induction, mainly in the style of proofs in elementary number theory.

• Automated deduction and user interaction — Once the object language and a theory are fixed and a conjecture is stated, the search for a proof begins. If a machine is to be programmed to recognise the truth (i.e., provability) of certain conjectures, a test is needed which, when applied to a conjecture, returns one of the following answers: "established", "open" or "contradictory". The test should be consistent with the object theory. There are two types of systems: automated and interactive. In an automated system, the test is generated by the system; in an interactive system, on the other hand, the test is usually generated with user involvement. Ultimately, it is known that in principle there can be no way of determining whether or not an arbitrary conjecture is provable. It will be up to the user, with the machine's help, to discover a proof.

A number of program verification tools are listed below as examples:

The Stanford Pascal Verifier The Stanford Pascal Verifier (SPV) [107,118] is a program which basically checks the correctness of the proof of a Pascal program. It was written in a version of LISP for use on the PDP 10 range of computers. The SPV basically attempts to automate the inductive assertions method. The first action of the verifier is to generate sets of verification conditions which have to be satisfied. In the second phase the verifier employs a theorem prover which takes as input the verification conditions to be established and also a set of rules which can be used by the theorem prover. The SPV system is basically interactive.

Unless some precautions are taken the system will eventually run into the problem of being unable to decide on the equivalence of two equivalent arithmetic expressions (this problem is in general undecidable).

Verifier's Assistant The Designer/Verifier's Assistant is a system that parses programs and specifications and generates and proves verification conditions. In addition it can provide an understanding of the kinds of structures that can be changed and added and the ways in which these interact [123]. To do this the verifier employs the wide-spectrum language Gypsy [79] in connection with knowledge-based techniques. This system pays more attention to reasoning about changes at the implementation level in terms of pre- and post-assertions on program modules, rather than addressing the maintenance of specifications and development histories. It is by concentrating on these latter aspects that we might gain a better insight into the consequences of maintaining both changes to documentation (expressed as a specification) and implementation.

LCF (Logic for Computable Functions) There are several versions of LCF [107] (at Stanford, Edinburgh, Cambridge, etc.) and the Edinburgh version is probably the best-known and most used of them all. The fundamental principle of LCF is that new theorems can only be formed by applying inference rules to axioms and already-formed theorems. LCF is implemented as a cluster of abstract data types in the ML language: 'term' and 'form' for terms and formulae of the object language. Edinburgh LCF has virtually no proof management facilities: proof trees are not built; validations must be done by hand; and partial proofs are not stored. Multiple proof attempts are possible, but no help is given for organising them.

The Gypsy Verification Environment (GVE) GVE [107] is a highly integrated specification and verification environment, originally targeted at communications processing systems of 1000-2000 lines of code. It is one of the few development systems that can handle concurrency (which it does by message passing), and it is perhaps the only one that maintains dependencies between proofs, (parts of) specifications and (parts of) programs. A particularly strong point is the high degree of unification of specification and programming constructs in the Gypsy language. The user can invoke a proof strategy which will apply rules automatically according to built-in heuristics. Very simple proofs can be finished without user intervention. This facility is a particularly heavy user of resources, and suffers from most of the problems of fully automatic theorem provers without being particularly strong.

The Boyer-Moore Theorem Prover The Boyer-Moore Theorem Prover (BMTP) [30,31] takes programs written in pure LISP. Properties of these programs are also stated as expressions which are themselves written in LISP. The theorem prover then attempts to show that the program possesses the desired properties. It does this by applying simple heuristics and also structural induction - LISP programs tend to be heavily recursive. The theorem prover could be used as a verification tool to establish properties of programs about as complex as a sorting procedure, but not much more.

Though the Boyer-Moore Theorem Prover is viewed as one of the successful systems for mechanical program verification, there are still some problems. The user supplies a conjecture, which BMTP tries to prove from axioms and already-proven results, but without direct assistance from the user. The object language is very expressive, but its type structure is very limiting and probably not suited to many applications. It incorporates some very effective heuristics for inductive proofs; unfortunately they seem to be inextricably bound up with the object language, and it would probably be quite difficult to use them directly elsewhere. The Boyer-Moore Theorem Prover is a heavy user of resources, e.g., memory and CPU time.

The Boyer-Moore Theorem Prover has been used in the REFORM Project [167] (see Chapter 8).

Other program verification tools include [96,107] the "Interactive Proof Editor", "AFFIRM", "NuPRL", the "B Tool" [5,115], etc.

3.4.3 Summary

Generally speaking, the following points are weaknesses. For the languages, a wide range of symbols (especially mathematical symbols) should be available, and mixed operators should be allowed, so that programs at a higher level of abstraction (specifications) can be taken into the verification process. For the theories, they should also include many other things such as tactics, decision procedures, simplification methods, normal forms, and so on, but none of the above systems have done this. As for other aspects, the nature of finding proofs is experimental; therefore, speed of response, help facilities, comprehensibility (especially for a novice) and so on are also crucial factors to be stressed. This is also the problem of writing correct mechanical theorem provers.

3.5 An Overview of the Main Existing Reverse Engineering Approaches Used in Software Maintenance Projects

The review in the previous section addressed obtaining programs from specifications. At present, there is increasing interest in the reverse direction — obtaining specifications from programs (i.e., reverse engineering). Reverse engineering is often an early part of a software maintenance project. This area is being researched in many projects seeking a good method to achieve the goal of obtaining specifications from programs. This section will review several existing reverse engineering approaches in software maintenance projects. This will help to determine the advantages as well as disadvantages of these approaches and will help the further development of the REFORM project. The REFORM project is introduced very briefly in this section and will be described in detail in a later section.

Reverse engineering techniques are of two kinds: maintainer-driven design recovery and knowledge-based design recovery. In the first approach, the maintainer faces a set of tools without any assisting guidance, the tools relying only on syntactic-level code analysis; in the second approach, two kinds of assistance are provided: (1) assistance in determining the functional intent behind a piece of code (automatic meaning recovery), and (2) assistance in maintenance process guidance. In both techniques, the tools may or may not be integrated.

3.5.1 MACS

MACS (Maintenance Assistance Capability for Software) has as its objective the definition and implementation of a software maintenance assistance system [57,73]. The basic premise of MACS is "maintenance through understanding". The main design objective of MACS is to aid in acquiring, ordering and exploring a structure representing facts and assumptions about the application to be maintained. In particular, MACS offers aid in the following areas:

• comprehension of the application design and development process: the WHY of the application,

• understanding and capture of the existent: the WHAT of the application, and

• assistance in maintenance actions: the HOW of maintenance actions in the context of the application.

The MACS project is developing an integrated tool-set which is built on a common repository. The "What" tool consists of two major parts called the Change Management World (managing the changes during the maintenance process) and the Abstraction Recovery World (providing filtered or abstracted views of the code using Dimensional Design). The "Why" tool is called the Reasoning World (for understanding the domain and design decisions) with a Domain Knowledge Base as its main component. Another major part of the tool-set is called the Interconnection World (for understanding inter-world relations and code semantic understanding).

According to available literature [58], the major achievements of the first two years (by January 1992) have been the design of the MACS architecture and the complete realisation of the Change Management system, the Reasoning World and the "syntactical" Abstraction Recovery World. The MACS project still uses informal analysis, though the need for merging informal and formal approaches is identified as a long-term plan.

MACS uses a conventional Change Management System for configuration management. Its representation for abstraction is a diagrammatic representation called "Dimensional Design". This is an enhanced flow chart on which sequencing is represented vertically, the components of loops and conditions horizontally, and procedural abstraction at 45°. A similar notation is used for data structures. Therefore, the abstract representation is at a relatively low level.

Of more interest is the representation of the "Reasoning World". This amounts to a very flexible database in which the user can add information incrementally. For example, consider a situation in which an error is found. Once the module is located, the user can, via a graphical WIMP interface, add an extra node giving free-text information about the error, and then link it to the module. A new version of the module can also be created, and linked via the CMS.

Note that a great deal is left up to the user in entering information. The attraction of MACS lies in the power of the links; the user can navigate from a module to its source, to its dimensional design, to its version records, etc. This is the integration mechanism. The tools on their own provide very little semantic interpretation — that is left to the user. MACS is built on the IPSYS ECLIPSE system, which provides the repository mechanism and a common front-end user interface.

3.5.2 Reverse Engineering in REDO

REDO (Restructuring, Maintenance, Validation, and Documentation of Software Systems) is an ESPRIT II project, which started in 1989, and is concerned with "rejuvenating" existing applications into more maintainable forms by improving documentation, by restructuring code, and by validating the code against the original intentions. As one part of the REDO project, reverse engineering (reverse-engineering COBOL programs into Z specifications) was carried out at Oxford University [33,34,35,100,101,102,103]. There are three stages in their process:

• clean — Translation to the intermediate language UNIFORM, eliminating redundant language constructs (restricting the original language to a small subset of permissible constructs, because UNIFORM is more compact and COBOL semantics are woolly).

• specify — Using data-flow diagrams for guidance, associated variables are grouped together to create prototype objects, but as yet containing no list of associated operators. The code is also split into phases at this point. Equational descriptions of the functionality associated with these phases are obtained automatically and transcribed into the intermediate functional language, simplifying transforms being automatically applied in order to reduce the equational presentation to a normal form.

• simplify — The abstracted functional descriptions are incorporated into the outline objects as descriptions of their operations, thus filling in the semantics of the prototype objects identified at the first stage. A full specification (in the language Z or Z++) is then printed out using the object-oriented abstraction as a basis, along with associated textual documentation.

The project was to develop a set of tools to serve as a useful aid in the comprehension of raw code, and also to transform it into a concise mathematical representation which can support further development work. The strategy here is to perform abstraction first, and then perform transformation on the high-level language. This will no doubt increase the degree of difficulty in the second stage, because the original code might not be structured, and hence might not be understandable at all. So far, only a few toy examples have been done by hand (about 40% of the COBOL language has been treated [102]) and presented by this project, and support for providing automated tools for program comprehension and the generation of technical documentation from software systems as part of maintenance still needs to be done.

3.5.3 Sneed's Work

Sneed and Jandrasics use automated tools to support the retranslation of software code in COBOL back into an application specification by the process of reverse engineering [144]. Two steps are needed: to recover a program design from the source code, and to recover a program specification from the program design.

In the first step, the code of COBOL programs is translated into an intermediate design schema based on a set of normalised relational tables for the modules, data capsules, and interfaces extracted from the source programs (see Figure 3.7).

[Figure 3.7: Inverse Transformation of Software from Code to Specification. Program components (MAPs, DBDs, COPYs, modules, JCL) are retransformed (1) into a data design and program design, and then (2) into a system specification and application description.]

Secondly, two activities are carried out jointly in this step, i.e., data design recovery and program design recovery. The data design part contains five design elements: database structure design, file design, data communication design, data capsule design, and data constant design. The program design part also contains five parts: process structure design, component design, data flow design, module interface design, and module design. A set of transformation rules for mapping COBOL source code back into the design schema is obtained by inverting the rules used to generate COBOL programs from the design. The programs are modularised and restructured as a by-product of the reverse transformation process.

In the second step, the intermediate design representation is retransformed into a specification schema based on the entity/relationship model. The details of how the authors did this are not available.

Though the authors claimed "it is not only possible to retranslate programs into a program design but that it is also possible to retranslate a set of program designs into a system specification", the experiments that they carried out were mainly limited to a low level of abstraction and there is still work to do to reach a high level of abstraction. It appears that the authors did most of the work by hand and have not developed a full system to realise their ideas.

3.5.4 A CASE Tool for Reverse Engineering

Bachman introduced a CASE tool, DOCMAN, for reverse engineering COBOL programs [12]. The Re-Engineering Cycle chart (Figure 3.8) provides an architectural view of this CASE tool, which features both forward and reverse engineering. In particular, reverse engineering begins at the bottom left with the definition of existing applications and raises the applications to successively higher levels of abstraction. At the top, the design objects created by the reverse engineering steps are enhanced and validated to become the revised design objects used in the forward engineering process. At the bottom, a new applications system becomes an existing applications system at the moment that it goes into production. The following points are stressed by this philosophy:

• reverse engineering enables the CASE tool to extract business rules from old applications and use them as the basis for refurbishing and maintaining those applications,

• reverse engineering also involves the removal of optimisation mechanisms and implementation artifacts that were introduced in an earlier implementation of the application,

• it is impossible to reverse engineer a file, database definition, or program automatically, because some of the information essential to the task is not present in existing COBOL programs, and

• a reverse engineering product built as an expert system can work interactively with the professional user and identify the missing information, determine its nature, propose alternatives, and insert the user's choice where required to complete the process.

Figure 3.8: Re-Engineering Cycle

Because the CASE tool is a commercial product which is not available at Durham, several questions about its usability remain open.

3.5.5 TMM

A method was proposed in [9] for recovering abstractions and design decisions that were made during implementation. This method is called the Transformation-based Maintenance Model (TMM). The purpose of this system is to reimplement a system in order to adapt it to a new environment through reuse. The abstractions and design decisions of the software must be recovered before the software is reimplemented. The recovery work in the TMM paradigm is done by maintenance by abstraction (MBA). Apart from working on the assumption that the documentation of the program exists, the TMM will also work assuming that the specification and refinement history of the program are not available, but a systematic approach must be used to recapture implementation knowledge before the TMM can be applied. Unfortunately the abstraction recovery was carried out manually, and human experience plays a vital role. Tools need to be developed to aid this approach.

3.5.6 A Concept Recognition-Based Program Transformation System

This is an approach that applies a transformation paradigm to automate software maintenance activities [63]. The characteristic of this approach is its use of concept recognition, the understanding and abstraction of high-level programming and domain entities in programs, as the basis for transformations. Four understanding levels are defined: the text level, the syntactic level, the semantic level, and the concept level. The program transformation system depends on its program understanding capabilities up to the concept level. The key component is a concept library which contains the knowledge about programming and application domain concepts, and the knowledge about how these concepts are to be transformed. Concept recognition is done by pattern matching. This work is based on program plans (see also Section 7.2.2). A program transformation tool has been developed to support the migration of a large manufacturing control system written in COBOL.

At present, the maintainer has to write correct and complete transformations. In the experiment, the system contained only 60 concept recognition and transformation rules. The results show that the speed of the system needs to be improved. Another problem is that the system does not provide a facility for the maintainer to guide the transformation, but this is necessary because complete automation of maintenance modifications is not always possible. It was also found that the system still needs a browser to support high-level editing.

3.5.7 REFORM

REFORM - Reverse Engineering using FORmal Methods - is a joint project between the University of Durham, CSM Ltd. and IBM (UK) to develop a tool called the Maintainer's Assistant. The main objective of the tool is to develop a formal specification from old code. It will also reduce the costs of maintenance by the application of new technology, and increase quality, so producing improved customer satisfaction. The old code in this project is the IBM CICS. The aims of the Maintainer's Assistant are to provide a tool to assist the human maintainer, handling assembler and Z in an easy-to-use way. The Maintainer's Assistant system will be discussed in detail in later chapters. Other systems may be seen in [162].

3.6 Summary

The purpose of this chapter is to discuss the state of the art in the area of acquiring a design/specification from program code in two respects: the experience of forward engineering which is useful for reverse engineering (because acquiring a specification from an existing program covers many stages of software development), and the latest developments in reverse engineering.

Problems and lessons learned from forward engineering In the specification phase, formal specifications can be found in many applications and they have shown many advantages over informal specification languages. Some of the languages in the previous section have been relatively widely used, such as VDM and Z, but they still have some problems. For example, they cannot meet the needs of representing all levels of abstraction in acquiring a specification from program code; and the specifications written in them cannot easily be verified formally, because it is not easy to integrate them with program verification tools. Probably, a wide spectrum language is better than other types of language when undertaking reverse engineering.

In the development phase, existing transformation systems have shown their potential power. These systems suffer from various problems. For instance, CIP is one of the most representative transformation system projects. It started in 1975. By 1989, when [19] was published, the conclusion was still "... experiences ... strengthened our belief that transformational programming will become an important factor in software engineering." This implies that it will take a long time for the paradigm of transformational programming to become practical. In a general sense, the systems reviewed in the previous section almost all fall into at least one of the following categories: theoretical problems remain to be solved; only "toy" or comparatively simple example programs have been experimented with; and operations are carried out informally or even entirely by hand.

In the verification phase, we can identify a number of verification systems, but they are not built for the purpose of reverse engineering. The author has carried out experiments with the Boyer-Moore Theorem Prover [167] towards building a supporting tool for a program transformer for the REFORM project. Experience shows that the Boyer-Moore Theorem Prover is a powerful verification tool. Its disadvantages can also be seen: a user has to spend time trying to find the right intermediate lemmas for the prover, and the prover is a major consumer of resources. One idea is to use such a verification tool for reverse engineering at some stage of the research.

This suggests that the research of the thesis be carried out by using a formal or rigorous method: particularly, using a formal language to represent both specification and program; developing a transformation system to transform specifications (or programs) into equivalent specifications (or programs); and using program verification tools to obtain transformation rules (including transformations for crossing levels of abstraction).

Problems and lessons learned from reverse engineering From the reviewed systems, we know that a great effort is still needed to put the paradigm of reverse engineering into practical use. It is a particularly hard job to reverse an existing program back to its design or specification. For instance, one of the problems with the reviewed systems is that the availability and accuracy of the design information are both assumed. In practice, such information is typically obsolete or lacking in systems which have gone through years of maintenance. For such systems, source code is the only reliable source of information. Another problem is that these systems have no method for coping with crossing levels of abstraction that covers all abstraction levels.

The state of the art in reverse engineering may be summarised as follows. Most existing commercial tools are basically restructurers, and these operate at the same level of abstraction. Even module recovery tools, such as those in MACS or Sneed's work, operate at the syntactical level, e.g., grouping variables and the operations on them. Where genuine crossing of levels of abstraction occurs, this is done manually, e.g., in Sneed's system for COBOL, or in redocumentation systems such as DOCMAN. The recent Refinement conference [54] is also a reference for the state of the art. The work most relevant to this thesis is that of Breuer and Lano, and of Bachman. Work is also being done on the business use of reverse engineering, but this is only indirectly of relevance to this thesis.

It should be pointed out that reverse engineering is still an activity of high risk and high cost from the management's point of view. It has been argued that for large systems deriving formally correct designs or specifications from existing source code is impracticable [36], because the importance of a design or specification as a model of the application domain, as well as a description of the code itself, is ignored. This argument claimed that the reasons for unsuccessful reverse engineering include that the original design or specification might not exist at all, and that the implementer of the software did not observe the design or specification (if it existed). Nevertheless, these problems had been addressed by the project before the paper [36] was published. The aim of the project also includes finding out the possibility of dealing with large-scale software. The author of this thesis would see acquiring designs and specifications for data-intensive programs as redesigning the original programs rather than seeking the (possible) original designs.

Chapter 4

Proposed Research Problem

As we have seen in previous chapters, acquiring a program design or specification from program code is important and significant in reverse engineering. The thesis takes this topic as its subject.

After studying the subject, problems related to this issue are identified in this chapter. The general area chosen is that of crossing levels of data abstraction to extract a data design from existing code. The thesis focuses on data design recovery for data-intensive programs, those whose computational complexity is low, but whose data complexity is high. Many COBOL programs are of this type. Program design often starts with data, and many data-intensive programs exist now in the form of COBOL programs. The Entity-Relationship Attribute Diagram is a good form of data design for a data-intensive program; for example, Entity-Relationship Attribute Diagrams are used as the tool for representing data designs in the Structured Systems Analysis and Design Method (SSADM).

The aim of the research is to tackle not just "toy" program code, but also real program code, including heavily maintained industry-scale program code. The code is of modest scale, e.g., a few hundred lines or even up to a few thousand lines. Coping with a large-scale program needs further research, which has to be studied in the next step and is beyond the scope of the thesis. Real-time and concurrent programs are not considered, because the theoretical and foundation


work is still in progress. The features of data-intensive programs are first introduced in this chapter.

4.1 Features of Data-Intensive Programs

Design processes (SSADM in particular) in forward engineering are briefly re• viewed in this section in order to understand the reverse direction and to under• stand features of data-intensive programs.

4.1.1 Software Design Process

Design is the process of applying various techniques and principles for the purpose of defining a device, a process or a system in sufficient detail to permit its physical realisation. It is the first step in the development phase for any engineering product or system [132].

Usually the design phase starts once software requirements have been established, regardless of the software engineering paradigm applied. From the technical point of view, design comprises three activities: data design, architectural design and procedural design. The main goal of data design is to select logical representations of data objects (data structures) identified during the requirements definition phase. Data design is the most important of the three design activities for some classes of program, because well-designed data can lead to better program structure, modularity, and reduced procedural complexity, no matter which design technique is used. The main goal of architectural design is to develop a modular program structure and represent the control relationships between modules. Furthermore, architectural design melds program structure and data structure and defines interfaces enabling data to flow throughout the program. The main goal of procedural design is to define algorithmic details after data and program structure have been established.

Effective software design is best accomplished by using a consistent design method. A vast number of design methods have been developed and used in different applications during the last four decades. Essentially, most of these methodologies can be grouped into one of three categories [132,146]:

• Data structure-oriented design. This method transforms a representation of data structure (information structure) into a representation of software. The idea behind it is that input data, internally stored information and output data may each have a unique structure, and these structures can be used as a foundation for the development of software. In addition there is an intimate relationship between software and data: the original concept behind the stored program computer is that programs could be viewed as data and data interpreted as programs. It has been shown that data structure has an important impact on the complexity and efficiency of algorithms designed to process information. The approach may be successfully applied in applications that have a well-defined, hierarchical structure of information, e.g., business information systems applications and systems applications.

• Data flow-oriented design. This method provides a systematic approach for the derivation of program structure. The idea behind it is that information may be represented as a continuous flow that undergoes a series of processes as it evolves from input to output. The data flow diagram is used as a graphical tool to depict information flow. The approach is particularly useful when information is processed sequentially and no formal hierarchical data structure exists.

• Object-oriented design. This method creates a representation of the real-world problem domain and maps it into a solution domain that is software. It results in a design that interconnects data objects and processing operations in a way that modularises information and processing rather than processing alone. This approach has been developed more recently than the other design methods.

Compared with data flow-oriented design, data structure-oriented design is more suitable for data-intensive programs, because data structure-oriented design is more powerful in coping with complicated data structures. Data-intensive programs currently in need of maintenance were usually developed before the object-oriented design method existed. In summary, quite a number of the data-intensive programs to be maintained today were developed by the data structure-oriented design method.

In the next section, a widely used data structure-oriented design method is described.

4.1.2 Structured Systems Analysis and Design Method

Structured Systems Analysis and Design Method (SSADM) [11,53,61] is one of a family of systems development methods which led the methods field in Britain during the 1980s. The method was accepted by the UK government's Central Computer and Telecommunications Agency (CCTA) and became mandatory for systems analysis and design in the UK in January 1983. It is constantly being updated, and version 3 was released in July 1986. The importance of software design lies in helping to produce "quality" software. This explains why SSADM, which gives good quality to the software products developed with it, is one of the most mature and widely used structured methods in the UK even though it requires a significant investment in training.

SSADM prescribes how a systems development effort should be conducted. The prescription is adjusted to suit individual needs. It breaks down a project into phases which are then divided into stages. Each stage is subdivided into steps. Each step has a list of tasks, inputs and outputs. SSADM provides structural and procedural standards.

SSADM consists of three phases: feasibility, analysis and design. The feasibility phase is optional. The analysis and design phases are divided into three stages each. The three stages of the analysis phase and the three stages of the design phase are:

1. Analysis of system operations and current problems: to investigate the current system.

2. Specification of requirements: to redraw the current system view built up in stage 1 to extract what the system does without any indication of how this is achieved; to complete Business System Operations; and to build up and check a detailed specification of the required system.

3. Selection of technical options: to cost out the purchase of new computer equipment, etc., if required and to weigh the benefits against the costs to give the user some help in choosing the final solution, e.g., selecting the final system hardware.

4. Detailed data design: to build up the logical data design so that all the required data will be included.

5. Detailed process design: to expand the definition developed in stage 2 to a very high level of detail so that the constructor can be given all the detail necessary to build the system.

6. Physical design control: to convert the complete logical design — both data and processing — into a design that will run on the target environment.

SSADM is a data-driven approach. Within SSADM several different views of data are employed. The analysis and design of processes is a part of SSADM, but the context within which these are done is determined by the data. SSADM takes three basic views of an information system:

• Logical Data Structures — showing what information is stored and how it is interrelated;

• Data Flow Diagrams — showing how information is passed around;

• Entity Life Histories — showing how information is changed during its lifetime.

The first view of the data structure is developed as a model of the organisation's information base using the logical data structuring technique (LDST). This technique tries to capture a picture of the underlying and stable information on which the organisation and its information system are based.

The method of data modelling is based on an entity relationship model [47]. The model recognises two different classes of relations: entity relations and relationship relations. An entity relation holds data on all entities of the same type, and has one tuple per entity, including a key to identify the particular entity. All other fields should be functionally dependent on this key. A relationship relation links the keys of two or more entity relations. It may also have attributes that are functionally dependent on this relationship [81]. The major, real-world entities in which the organisation is interested are represented on a diagram. The relationships between entities are examined to ensure that all, and only, the useful ones are included in the model. The nature of the relationships is also explored and details included. This technique is a top-down approach to data modelling and relies upon the modeller's perception of the information being modelled. It is quite simple, and moderate-sized diagrams can be quickly produced. These can be intelligible to users, who may thereby contribute their understanding to the development process.

A second view of the data is represented by data flow diagrams (DFDs). These show the data which flow into, out of, and around an information system, as well as the processes which transform it, the entities which are external to the system but which communicate with it, and the stores of data within a system.

A third view of the data is represented by the use of entity life histories (ELHs) and their associated ELH matrices. LDST takes a static view of the data; DFDs look at the movement of data and the dependency of processes upon certain flows; ELHs complement these perspectives by looking at how entities change over time.

Each of these views is developed through systems analysis (stages 1-3) and logical design (stages 4 and 5) before conversion to an executable physical design.
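The distinction just drawn between entity relations and relationship relations can be sketched in a few lines of Python; the table contents and attribute names here are invented for illustration and are not taken from SSADM.

```python
# Entity relations: one tuple per entity, identified by a primary key;
# every other field is functionally dependent on that key.
customers = {                       # entity relation CUSTOMER
    "C1": {"name": "Smith Ltd"},
    "C2": {"name": "Jones plc"},
}
parts = {                           # entity relation PART
    "P1": {"description": "bolt"},
}

# A relationship relation links the keys of two entity relations and may
# carry attributes functionally dependent on the relationship itself.
supplies = [                        # relationship relation over CUSTOMER and PART
    {"customer": "C1", "part": "P1", "quantity": 100},
]

# The keys held in the relationship relation must refer to existing entities.
for row in supplies:
    assert row["customer"] in customers and row["part"] in parts
```

The attribute `quantity` belongs to the relationship, not to either entity, which is exactly the case the text describes of attributes functionally dependent on the relationship.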
In each part of SSADM all three of these views are used and interrelated. There are several steps in one stage. The task belonging to each of the three views can be done in one or more steps in a stage, or may not appear in a given stage at all. The convention for naming a step is to use a three-digit number, the first digit indicating the stage in which the step occurs. For instance, step 110 is in stage 1 and step 260 is in stage 2.

Figure 4.1: SSADM Stages

The first view of data corresponds to one of the three design activities, namely data design. Since data design is the emphasis of the research, only those steps relating to the first view (Logical Data Structures) are discussed here. The steps directly related to Logical Data Structures at the logical design phase are step 125, step 240, step 410 and step 420.

In step 125, Investigating system data structure, the Logical Data Structure Diagram and its supporting documentation are produced. Development of the Logical Data Structure involves identifying entities, identifying direct relationships between entities, creating a diagram representing the entities and their relationships, producing supporting documentation to the diagram, validating against the processing requirements and validating with the user.

In step 240, Creating required system data structure, the Current System Logical Data Structure and supporting documentation are expanded to define the required system data structure. This involves defining new requirements which the Logical Data Structure must support by the chosen Business System Option and the Problems/Requirements List, amending the Problems/Requirements List to describe any solutions adopted, validating the Required System Logical Data Structure against the required system processing, and extending the Entity/Data Store Cross Reference.

In step 410, Relational data analysis, a set of normalised relations which minimise redundancy of data and avoid consistency problems is produced.

In step 420, Composite Logical Data Design, the Logical Data Structure and the relations derived from relational data analysis are combined to form the Composite Logical Data Design. This involves representing the relations as a data structure diagram to aid comparison with the Logical Data Structure, merging the two diagrams and resolving differences with reference to the processing requirements and to the user, extending the Entity Description to show the full data content defined by the relations, completing any remaining documentation of the system data and consolidating all volumetric information on the data.

4.1.3 Features of Data-Intensive Programs

Data-intensive programs and computation-intensive programs are comparative notions; there is no clear distinction between the two sorts of program. Data-intensive programs are programs written in data-intensive programming languages, which provide complex data structuring mechanisms and high-level composite operations to manipulate them. Computation-intensive programs are programs written in computation-intensive languages, which provide ways to express computations using relatively simple operations on elementary objects [74]. COBOL is a typical data-intensive programming language.

Since examples of data-intensive programs are needed in the research, COBOL has been selected as the data-intensive language. The features of COBOL programs are studied as an example of data-intensive programs in general. In this text, all examples of data-intensive programs are in COBOL. It is believed that the generality of the research will not be limited by this assumption. On the other hand, there are claimed to be 800 billion lines of COBOL programs in existence in the world [104], and the results of the research can easily be applied to maintaining COBOL software.

The COBOL language was first developed in 1959. The CODASYL committee (Conference on Data Systems Languages) produced the initial specification of COBOL in 1960, and a revised version appeared in 1961. The first ANSI (American National Standards Institute) specification of the COBOL language was published in 1968. Later standards were the ANSI 1974 and ANSI 1985 Standards. COBOL offers the following advantages within the standard language [92], which are related to the research:

1. Uniform treatment of all data as records.

2. Extensive capabilities for defining and handling files.

3. Incorporation of many functions which in other contexts would be regarded as the province of system utilities.

4. The ability to construct large programs from independently compiled modules which communicate with each other by passing parameters or by using common files.

The COBOL language used in this research is not restricted to any particular dialect of COBOL, and it covers the features of the ANSI COBOL Standard 1985. More importantly, this research will benefit not only COBOL programs but also other data-intensive programs written in other languages. However, programs (such as those written in COBOL) with built-in calls to database management packages will not be addressed in this thesis; this surely is a good area for future research.

Programs written in COBOL have characteristics which are different from those of typical computation-intensive programs, and these are important constraints in reverse engineering such systems, e.g.:

• Important data is represented in the form of records and operations on data are therefore heavily record based.

• COBOL programs are often designed using Entity-Relationship Attribute Diagrams, rather than process-based design methods.

• COBOL allows the programmer to specify that two different records (with different structures) may share the same memory location. This is known as the aliasing problem and is found in many COBOL programs.

• COBOL programs usually have external calls to the operating system and database management system.

• COBOL programs may use many foreign keys to represent complex data structures which in other languages would use pointers.
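The aliasing problem mentioned above arises from COBOL's REDEFINES clause, under which two record layouts share the same storage. The effect can be imitated in Python by reinterpreting one byte string under two layouts with the standard `struct` module; the field contents here are invented purely for illustration.

```python
import struct

# One 8-byte storage area, as two COBOL records under REDEFINES would share.
storage = struct.pack("4s4s", b"1994", b"0131")   # e.g. a packed date field

# Layout 1: two 4-character fields (say, year and month-day).
year, month_day = struct.unpack("4s4s", storage)

# Layout 2: the same bytes viewed as eight single characters.
digits = struct.unpack("8c", storage)

assert year == b"1994" and month_day == b"0131"
assert b"".join(digits) == b"19940131"
```

Both "records" describe the same memory, so a change made through one layout is visible through the other; this is precisely what makes data design recovery from such programs difficult.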

4.2 Representing Data Designs Using Entity-Relationship Attribute Diagrams

It can be seen that SSADM's first view of data, i.e., Logical Data Structures, may be used in reverse engineering, i.e., to represent the data design by Logical Data Structures when extracting a data design from existing code. How Logical Data Structures can be used is a problem to be solved in the thesis.

4.2.1 Entity Models

Entity models provide a system view of the data structures and data relationships within the system [47,53,55,114]. All systems possess an underlying generic entity model which remains fairly static in time. The entity model reflects the logic of the system data, not the physical implementation. Entity models provide an excellent graphical representation of the generic data structures and relationships. They provide a clear view of the logical struc• ture of data within the boundary of interest and allow the analyst to model the data without considering its physical form. Entity modelling provides a system view independent of current processing; it is a system-wide view not a functionally decomposed view. An entity is something, real or abstract, about which we store data [114 . The name of each type entity type should be a noun, sometimes with a modifier word. An entity type may be thought of as having the properties of a noun. An entity has various attributes that we wish to record. An entity type is a named class of entities that have the same set of attribute types. An entity instance is one specific occurrence of an entity type. We can describe data in terms of entity types and attributes by using Entity- Relationship Attribute Diagrams. On an Entity-Relationship Attribute Diagram the boxes are interconnected by Hnks that represent associations between entity types. In Figure 4.2 there are four entities: CUSTOMER, PART, CONTRACT, and CONTRACT-ITEM. This diagram shows that a customer can have multiple contracts. A contract is for one customer and can be for more than one contract Chapter 4. Proposed Research Problem 79

item. There are zero, one, or many contracts for each part. A contract item relates to one contract and zero or one part.

[Figure 4.2: An Entity-Relationship Attribute Diagram, showing the entities CUSTOMER, PART, CONTRACT and CONTRACT-ITEM. Key: one A is associated with one B; one A is associated with one or more B's; one A is associated with zero or one B; one A is associated with zero, one or more B's.]
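The cardinalities described around Figure 4.2 can be sketched in executable form. The following Python fragment is purely illustrative (it is not part of the thesis or its tooling); the class and field names are hypothetical stand-ins for the entities in the figure:

```python
# Illustrative sketch of the Figure 4.2 entity model; entity names are
# from the figure, the classes and fields are hypothetical.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Customer:
    name: str
    contracts: list = field(default_factory=list)  # one customer, many contracts

@dataclass
class Part:
    part_no: str

@dataclass
class Contract:
    customer: "Customer"                           # each contract has exactly one customer
    items: list = field(default_factory=list)      # one or more contract items

@dataclass
class ContractItem:
    contract: "Contract"                           # exactly one contract
    part: Optional[Part] = None                    # zero or one part

c = Customer("Smith & Co")
k1, k2 = Contract(c), Contract(c)                  # a customer can have multiple contracts
c.contracts += [k1, k2]
k1.items.append(ContractItem(k1, Part("P-100")))
k1.items.append(ContractItem(k1))                  # item with no part yet (zero or one)
print(len(c.contracts), len(k1.items))
```

The one-to-many links are held as lists on the 'one' side, while the mandatory-one links are plain fields, mirroring the crow's-foot notation.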

4.2.2 Entity-Relationship Attribute Diagrams in SSADM

Entity models are sometimes called logical data structures; as before, we use SSADM for our examples. The Entity-Relationship Attribute Diagrams used in SSADM are revised versions of the generally used Entity-Relationship Attribute Diagrams shown in Figure 4.2.

Entities, Data Items and Identifiers

An entity is something of significance to the system about which information is held [11]. An entity type represents a number of entity occurrences. By convention, entity is always used to refer to the entity type and entity occurrence to a specific occurrence. A data item is the smallest discrete component of the system information that is meaningful (in other approaches a data item as defined here is usually called an attribute). Each entity is made up of a number of data items. A data item which can be used to identify uniquely each entity occurrence is called the primary key. An entity is represented as a box in a Logical Data Structure Diagram.

Relations

A relation is a group of non-repeating data items identified by a unique key [11]. Mathematically, a relation is defined as a set of tuples which is a subset of the Cartesian product of a fixed number of domains [81].

Relationships

A relationship is a logical association between two entities on a data structure that is important to the system. Relationships are normally described as verbs.

The degree of relationships Between two entities A and B there are four possible degrees of relationship:

1. One A can be related to many B's.

2. Many A's can be related to one B.

3. Many A's can be related to many B's.

4. One A can be related to one B.

1 and 2 are examples of one-to-many relationships. It is assumed that both one-to-one and many-to-many relationships rarely exist. So only one-to-many relationships are drawn on Logical Data Structure Diagrams.

Exclusive relationships This is when the existence of one relationship precludes the existence of another (see Case B in Figure 4.3).

Recursive relationships This is when entity occurrences have direct relationships with other entity occurrences of the same type (see Case C in Figure 4.3).

Logical Data Structure Diagrams

Case A in Figure 4.3 shows a one-to-many relationship. The line with the crow's foot describes the relationship. The crow's foot is always shown at the 'many' end. The entity at the 'one' end is often referred to as the master entity and the entity at the 'many' end as the detail entity. The components of the Logical Data Structure are entities and relationships. The Logical Data Structure deals with entity and relationship types only, rather than their occurrences. Relationships relate one entity to another and indicate access from one entity occurrence to all the related ones. The Logical Data Structure Diagram is supported by entity descriptions and sometimes relationship descriptions.

4.3 Reverse Engineering Through Data Abstraction

4.3.1 Abstraction Techniques in Programming

Abstraction techniques are widely used in "forward engineering" [49,51,56,109]. There are two advantages to using abstraction: firstly, the abstractions allow large systems to be broken into smaller parts with logical interfaces based upon the data being handled. These interfaces stand alone as the specification of the system with

the actual implementation being hidden and flexible; secondly, the abstractions can be defined in a rigorous mathematical fashion, which means that the data type itself is a well defined mathematical system. A systematic development of a body of knowledge is thus made possible.

[Figure 4.3: Relationships in Entity-Relationship Attribute Diagrams, showing (A) a one-to-many relationship, (B) an exclusive relationship and (C) a recursive relationship.]

Abstraction is the process of ignoring certain details in order to simplify the problem, and so allows the specification, design and implementation of a system to proceed in a step-wise fashion.

There are three stages in this process. The first requirement in designing a program in forward engineering is to concentrate on relevant features of the system, and to ignore factors which are believed irrelevant. The next stage in program design is to decide the manner in which the abstracted information is to be represented in the computer. Finally there comes the task of programming the computer to carry out those manipulations of the representation of the data that correspond to the manipulations in the real world in which we are interested.

Three basic abstractions have been used in programming [109,110,141]:

Procedural abstraction combines the methods of abstraction by parameterisation and specification in a way that allows us to abstract a single operation or event. A procedure provides a mapping from input arguments to output arguments; in other words, it is a mapping from a set of input arguments to a set of output results. Desirable properties of a procedure include simplicity and generality.

Data abstraction allows us to extend the base type level with new types of data. Data abstraction is the most important method in program design. Choosing the right data structures is crucial to achieving an efficient program. In the absence of data abstraction, data structures must be defined too early, i.e., they must be specified before the implementations of the modules which use them can be designed.

Iteration abstraction, or iterator for short, is a generalisation of the iteration methods available in most programming languages. Iterators permit users to iterate over arbitrary types of data in a convenient and efficient way. In other words, iterators are a mechanism that solves the problem of the adequacy of data types that are collections of objects.
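As an illustration of iteration abstraction (not taken from the thesis itself), a Python generator lets a client iterate over a collection without knowing its representation; the IntSet class below is a hypothetical example:

```python
# Illustrative only: a generator as an "iterator" abstraction, letting a
# client loop over a collection without seeing its representation.
class IntSet:
    def __init__(self):
        self._rep = []            # representation hidden from clients

    def insert(self, x):
        if x not in self._rep:
            self._rep.append(x)

    def elements(self):           # the iterator: yields members one at a time
        for x in self._rep:
            yield x

s = IntSet()
for n in (3, 1, 3, 2):
    s.insert(n)
print(sorted(s.elements()))
```

The client code depends only on the `elements` operation; the internal list could be replaced by a tree or bit vector without any change to callers.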

4.3.2 Data Abstraction

Data is essential to programming. One of the most important objectives of programming is to process data, or to achieve a certain goal through processing data. The program development process can be described in terms of data. Data abstraction has the following important aspects [50]:

• Data abstractions separate the use of a data type from the implementation of a data type.

• Data abstractions simplify issues of correctness.

• Data abstractions permit the exchange of (correct) implementations. Performance is ideally the only criterion for choosing an implementation.

• Data abstraction is a software design technique that promotes modularity and independent development of data abstraction implementations and the application program.

4.3.3 Data Type

A data type defines a set of valid values and the operations on these values. Another way to state this is that a data type is a language mechanism to enforce authentication and security [28,124]. A summary of some of the important points about data types was given by C. A. R. Hoare [89]:

1. A type determines the set of values which may be assumed by a variable or expression.

2. Every value belongs to one type only.

3. The type of a value denoted by any constant, variable, or expression may be deduced from its form or context, without any knowledge of its value as computed at run-time.


4. Each operator expects operands of fixed type, and delivers a result of some fixed type. Where the same symbol is applied to several different types, this symbol may be regarded as ambiguous, denoting several different actual operators. The resolution of such systematic ambiguity can always be made at compile-time.

5. The properties of the values of a type and of the primitive operations defined over them are specified by means of a set of axioms.

6. Type information is used in a high-level language both to prevent or detect meaningless construction in a program, and to determine the method of representing and manipulating data on a computer.

7. Types can be constructed from a number of primitive types, i.e., by constructors. These include Cartesian products, discriminated unions, sets, functions, sequences and recursive structures.

Each type has a range of basic operations associated with it. Usually these are the operations provided by the basic hardware of the computer system and they will apply directly to the basic types [151]. Further operations will be defined in terms of this basic set. More generally, the operations associated with a type are:

• Assignment and test for equality will be required for both primitive and structured types.

• Transfer or conversion functions are required to convert values of one type to another.

• Constructors are necessary to denote the construction of a new type from component types.

• Selectors are required to access the component values of a structured type.

All types are constructed from further types, which ultimately must either be primitive (probably supported directly by the computer's hardware) or be defined by the programmer using type constructors. The constructors (fundamental data types) for types are as follows [89,151]:

• Unstructured data types — All structured data must be built up from unstructured components, belonging to a primitive or unstructured type. Some of these unstructured types (for example, reals and integers) may be taken as given by a programming language. Although these primitive types are theoretically adequate for all purposes, there are strong practical reasons for encouraging a programmer to define his or her own unstructured types. This can be done by an enumeration, i.e., indicating the set of values that can be taken by an unstructured type. For instance, type Day is defined as {MON, TUE, WED, THU, FRI, SAT, SUN}. Other examples include the boolean type (with only two values, false and true) and the char type (with all characters in the ASCII table, ranging from ASCII value 32 to 126).
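For illustration only, the Day enumeration above can be written directly with Python's enum module (the thesis itself uses an Ada-like notation):

```python
from enum import Enum

# The Day type from the text as an enumeration; the set of values
# is exactly the one given above.
Day = Enum("Day", ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"])

print(len(Day), Day.MON.name)
```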

• The Cartesian Product — The Cartesian Product is a data structuring method which gives the space of possible values of a composite type. Such structures usually have a fixed size and are called records or structures in programming languages, where their components can be named, e.g.:

type DATE is
record D: DAYS; M: MONTHS; Y: YEARS;
end record;

It must be possible to refer to the individual components of a Cartesian product.

• The Discriminated Union — A discriminated union is a type which is the union of two or more sets of values, each of which may have components in common. It is usually specified by listing the components that the sets have in common, followed by the components which differ:

type PERSON is
record
NAME: STRINGS; SEX: (M or F);
IDENTIFICATION:
case STUDENT: STUDENT CARD NUMBER;
EMPLOYED: NI NUMBER
end case;
end record;
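A hedged sketch of the PERSON type as a tagged union in Python; the class names are hypothetical, and `isinstance` dispatch stands in for the case discrimination above:

```python
from dataclasses import dataclass
from typing import Union

# Illustrative tagged-union version of the PERSON type: common
# components, plus a variant part discriminated by its type.
@dataclass
class StudentId:
    card_number: str

@dataclass
class EmployeeId:
    ni_number: str

@dataclass
class Person:
    name: str
    sex: str                                    # 'M' or 'F'
    identification: Union[StudentId, EmployeeId]

p = Person("Jones", "F", StudentId("SC-42"))
# Dispatch on the discriminant, as a case statement would:
kind = "student" if isinstance(p.identification, StudentId) else "employed"
print(kind)
```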

• The Array — The array is available in almost every language. Viewed abstractly, it is a mapping between the subscript values and the elements of the array.

• The Power Set — The powerset of a given set is defined as the set of all subsets of that set; a powerset type is a type whose values are sets of values selected from some other type, known as the base of the powerset.

• The Sequence — The cardinality of a sequence cannot be decided at compile time, because a sequence of values may be indefinitely long and, more practically, may vary as the program executes. A sequence can be regarded as an arbitrary number of items of a given type placed in a particular order. Such sequences include strings, stacks, queues and so on. The sequence is the abstract notion and there are various representations of this abstract notion.

• Sparse Data Structures — If the set potentially contains a very large number of elements, or if the range of possible subscript values is very large, then the data type is said to be sparse if only a small proportion of the possible values are present. A particular example is an array which represents a dictionary and is indexed by character strings corresponding to words. This array can be declared as:

type DICTIONARY = sparse array WORD of DEFINITION.

Sparse data structures can be represented by keeping tables to map the index values into either main store addresses or positions in a file. Sparse data structures, in the general sense, are not recognised in programming languages.
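The DICTIONARY example can be illustrated with a hash table, a common way of keeping the index-to-value table just described; this Python sketch is illustrative and not part of the thesis:

```python
# The DICTIONARY example represented sparsely: only the words actually
# defined occupy storage; the rest of the (huge) index space does not.
dictionary = {}                    # word -> definition

dictionary["abstraction"] = "the process of ignoring certain details"
dictionary["entity"] = "something about which we store data"

print(len(dictionary))
print(dictionary.get("zymurgy"))   # absent index: no storage, no error
```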

• Pointers — A pointer can be thought of as the name of the place where an object is kept; usually it will be implemented as an address. Pointers are usually used to build data structures whose size is, in general, unknown at compile time.

4.3.4 Abstract Data Types

The major conceptual idea of abstract data types is to separate the use of a type from the representation and implementation of a type. The use of a type should depend only on the set of values and operations. It should not depend on either its representation or its implementation.

Data abstraction allows the description of abstract data types. Usually, a data abstraction consists of "objects" and "operations". To implement the data abstraction, we implement the operations in terms of the chosen representation, and we must reimplement the operations if we change the representation. However, we do not need to reimplement the program which uses this abstraction, because the program depends only on the operations and not on the representation.

It is important to understand this. For example, a stack can be viewed as an abstract data type. A stack is defined in terms of a representation (such as an array) and several operations (such as NEW, PUSH, POP, READ and EMPTY).

If these operations are met in a program, they can be abstracted to a "stack".
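A minimal sketch of that stack abstraction (illustrative Python, not the thesis's WSL): clients see only the operations, so the list representation could be replaced without changing them.

```python
# A stack as an abstract data type: clients use only NEW/PUSH/POP/READ/
# EMPTY, so the list representation below could be swapped for another
# (e.g., a fixed array with a top index) without changing client code.
class Stack:
    def __init__(self):               # NEW
        self._rep = []                # representation: a Python list

    def push(self, x):                # PUSH
        self._rep.append(x)

    def pop(self):                    # POP: remove and return the top
        return self._rep.pop()

    def read(self):                   # READ: top element, not removed
        return self._rep[-1]

    def empty(self):                  # EMPTY
        return not self._rep

s = Stack()
s.push(1); s.push(2)
print(s.read(), s.pop(), s.pop(), s.empty())
```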

4.3.5 Abstraction Levels of Data and Software

The ANSI-SPARC Standard has established three distinct abstraction levels in viewing data. These are the physical level, logical level and conceptual level [1].

Data at the physical level At the physical level, data is viewed as a set of records connected through pointer arrays, inverted lists, etc., depending on the type of physical implementation, e.g., hierarchical, network, or relational. The forms of data are:

• raw data — consisting of bits, bytes and words.

• scalar types — such as natural numbers, integers, reals, characters and boolean variables, together with pointer variables (representing memory addresses).

• data structures — referring to compound structures which are directly supported by programming languages, such as arrays, heterogeneous structures (e.g., the record in COBOL), sets, lists, simple user definable types and persistent structures such as files (random, sequential, etc.). Associated with the data structures are the operations used to manipulate them.

Data at the logical level At the logical level, data is viewed as tables of normalised relations or tuples of data elements linked by keys, which are independent of any particular physical implementation, e.g., the data manipulation language.

The basic form of data here is data items. Data items of an abstract class, as opposed to values of an abstract data type, encapsulate the definition of an internal state and are associated with a set of accessing operations, which may update and/or query the object state. The assumption is that the classes of abstract items will be instantiated as independent subsystems in different contexts. The item classes include stacks, queues, sets, bags, etc. An abstract data type is at this level.

Data at the conceptual level At the conceptual level, data is viewed as a network of entities with attributes and relationships among one another, e.g., the entity/relationship model. At this level, data exists in the form of application objects and is closer to concepts of the real world than to the software.

Abstraction levels of software In principle, the three data levels also apply to software [51,144,174]. At the physical level, software exists as a set of discrete units of code of various types — modules, maps, data descriptions, access paths, and command procedures — combined by the link editors and loaders of the operating system. At the logical level, software exists in the form of a meta language, which describes the processing units — modules, data capsules, and interfaces — of any particular implementation language. At the conceptual level, software, like data, can be viewed as a set of abstract entities, such as data objects, data elements, processes and relationships among one another. At the conceptual level, software models some real world application.

The software at the physical and logical levels is typically executable, and hence can be said to be at the code level, whilst the software at the conceptual level is typically not executable and can be viewed as the design and specification.

4.4 Definition of Proposed Research Problem

In seeking solutions to the problems described in Chapter 1 (Section 1.3), Chapter 2 and Chapter 3 reviewed the work (done in software engineering, and in reverse engineering in particular) related to this thesis, and the previous sections in this chapter addressed features of data intensive programs, Entity-Relationship Attribute Diagrams and data abstraction techniques. This enables the research problems proposed in Section 1.3 to be defined more precisely as follows:

• Can data intensive programs be reverse engineered to Entity-Relationship Attribute Diagrams? What are the difficulties? How do we use code and data structure in the sources?

• How do we cope with source code techniques such as

1. foreign keys

2. aliasing

3. input/output

4. abstract data types?

• Is it possible to reverse engineer data intensive programs which have been heavily maintained and hence whose structure has become heavily degraded?

• What is the method to extract program data designs (represented in Entity-Relationship Attribute Diagrams) from the existing data intensive code?

• With what size of code can this method cope?

• What information do we throw away when crossing abstraction levels?

• What can be automated and what can be done by humans?

• Suppose that this method is demonstrated by extending the Maintainer's Assistant. What changes have to be made to WSL? What extensions have to be made to the transformation library? What other supporting components should be implemented?

• How do we know what we have done is correct? How do we measure our success?

It is claimed that answers to the above questions make a contribution to research in computer science.

Because this research aims at developing a method (and a tool) to extract program data designs from existing data intensive code, it should be pointed out that the method should only be used in certain circumstances, for example where maintenance to a relatively self-contained module needs carrying out. This means that the method would not be very helpful when a minor change (e.g., a change to one line of code) is needed. The reason is that an Entity-Relationship Attribute Diagram can only be extracted from a block of code (containing both control and data structures) rather than from just one or two lines of code.

Chapter 5

Working Environment and Design Recovery Method

As the research described in this thesis is part of the REFORM project, the background of REFORM and the working environment are introduced in this chapter.

5.1 Working Environment

The REFORM project started in July 1989 [44,156]. As mentioned in Chapter 3, the aim of the project is to build a prototype tool — the Maintainer's Assistant — which will take existing software written in a low-level procedural language (in particular, IBM CICS code written in IBM-370 assembler) and, through a process of successive transformations, turn it into an equivalent high-level abstract specification expressed in a non-procedural abstract specification language (in particular, Z). The theoretical foundation for the project was established by the work carried out at Oxford and Durham by M. Ward [155]. Naturally, as the process of applying program transformations cannot be totally automated, the Maintainer's Assistant is an interactive tool including an interactive interface.


5.1.1 Ward's Work and its Application in the REFORM Project

The REFORM Project has its roots in Ward's work [155], in which he developed methods of proving refinements and transformations of programs. Although he used the popular approach of defining a core "kernel" language with denotational semantics, and permitting definitional extensions in terms of the basic constructs, he did not use a purely applicative kernel; instead, the concept of states is included, using a specification statement which also allows specifications expressed in first order logic as part of the language (thus providing a genuine wide spectrum language).

In contrast to other work, Ward used infinitary first order logic (an extension of first order logic which allows infinitely long formulae) both to express the weakest preconditions of programs [60] and to define assertions and guards in the kernel language. Engeler [64] was the first to use infinitary logic to describe properties of programs, and Back [13] used such a logic to express the weakest precondition of a program as a logic formula, but his kernel language was limited to simple iterative programs. Ward used a different kernel language which includes recursion and guards, and he showed that the introduction of infinitary logic as part of the language (rather than just the metalanguage of weakest preconditions), together with a combination of proof methods using both denotational semantics and weakest preconditions, is a powerful theoretical tool which allows some general transformations and representation theorems to be proved.

In Ward's approach [155], it is possible to prove that two versions of a program are equivalent. Programs are defined to be equivalent if they have the same semantic function. Hence equivalent programs are identical in terms of their input-output behaviour, although they may have different running times and use different internal data structures. A refinement of a program, or specification, is another program which will terminate on each initial state for which the first program terminates, and will terminate in one of the possible final states for the first program. In other words, a refinement of a specification is an acceptable implementation of the specification, and a refinement of a program is an acceptable

substitute for the program.

Here is a very brief look at Ward's approach to proving the equivalence of two programs in terms of Dijkstra's weakest precondition. For a given program S and a condition R on the final state, the weakest precondition WP(S, R) is the weakest condition on the initial state such that the program will terminate in a state satisfying condition R. It is possible to express the weakest precondition of any program or specification as a single formula in infinitary logic. The value of weakest preconditions lies in the fact that two programs are equivalent if and only if they have equivalent weakest preconditions [152,153,154,155].

Ward's work can be used not only in program development but also in software maintenance, which has traditionally been overlooked by the people who were building transformation systems. The aim of the REFORM project is to develop a computer-based, semi-automated transformation system, founded on Ward's approach, for use in software maintenance and especially reverse engineering.

5.1.2 The Wide Spectrum Language

In the process (within the REFORM project) of acquiring a specification from the program code, a notation (or a language) is needed to represent the program and specification at all intermediate steps, especially as objects (program or specification) are changed from one form to another. As we have seen, a wide spectrum language is suitable for this, so a wide spectrum language named WSL has been defined by Ward, which incorporates a variety of constructs, from low-level machine-oriented constructs up to high-level specification ones.

WSL [37,155] consists of two types of construct: WSL constructs and Meta-WSL constructs. WSL constructs include statements, functions, expressions, logic and arithmetic operators, tests, etc., for representing both program code and program specifications; Meta-WSL constructs include Meta-WSL statements, Meta-WSL functions, Meta-WSL patterns, Meta-WSL conditions, etc., for representing program transformations. Both types of construct originated from the kernel language.

The syntax and semantics of expressions and formulae are discussed first; they are used to define the kernel WSL and the first level WSL in this chapter, and will be used to extend WSL and to prove program transformations later in this thesis.

Syntax of Expressions

Numeric operators: e1 + e2, e1 − e2, e1/e2, e1 * e2, e1 ** e2, e1 mod e2, e1 div e2, frac(e1), abs(e1), sgn(e1), max(e1, e2, ...), min(e1, e2, ...), with the usual meanings.

Sequences: s = (a1, a2, ..., an) is a sequence; the ith element is denoted s[i], and s[i .. j] is the subsequence (s[i], s[i+1], ..., s[j]), where s[i .. j] = () (the empty sequence) if i > j. The length of sequence s is denoted ℓ(s), so s[ℓ(s)] is the last element of s. s[i ..] is used as an abbreviation for s[i .. ℓ(s)]. reverse(s) = (an, an−1, ..., a2, a1), and butlast(s) is s[1 .. ℓ(s) − 1].

Sequence Concatenation: s1 ++ s2 = (s1[1], ..., s1[ℓ(s1)], s2[1], ..., s2[ℓ(s2)]). The append function, append(s1, s2, ..., sn), is the same as s1 ++ s2 ++ ... ++ sn.

Subsequences: The assignment s[i .. j] := t[k .. l], where j − i = l − k, assigns s the value (s[1], ..., s[i−1], t[k], ..., t[l], s[j+1], ..., s[ℓ(s)]).

Stacks: Sequences are also used to implement stacks, for which the following notation is used. For a sequence s and variable x: x ⇐ s means x := s[1]; s := s[2 ..], which pops an element of the stack into variable x. To push the value of the expression e onto stack s, s ⇐ e is used to represent s := (e) ++ s.
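These pop and push definitions read directly as operations on sequences. The following Python is purely illustrative (lists standing in for WSL sequences, with indexing shifted by one since WSL's s[1] is Python's s[0]):

```python
# The pop and push of the text, written out on Python lists standing in
# for WSL sequences (WSL indexes from 1, Python from 0).
def pop(s):
    # x <= s  means  x := s[1]; s := s[2 ..]
    return s[0], s[1:]

def push(s, e):
    # s <= e  means  s := (e) ++ s
    return [e] + s

s = [10, 20]
s = push(s, 30)
x, s = pop(s)
print(x, s)
```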

Sets: The usual set operations ∪ (union), ∩ (intersection), − (set difference), ⊆ (subset), ∈ (element) and P (powerset) are used. { x ∈ A | P(x) } is the set of all elements in A which satisfy predicate P. For the sequence s, set(s) is the set of elements of the sequence, i.e., set(s) = { s[i] | 1 ≤ i ≤ ℓ(s) }.

Relations and Functions: A relation is a (finite or infinite) set of pairs, a subset of A × B, where A is the domain and B the range. A relation f is a function if ∀x, y1, y2.(((x, y1) ∈ f ∧ (x, y2) ∈ f) ⇒ y1 = y2). In this case f(x) = y is written when (x, y) ∈ f.

Map: The map operator * returns the sequence obtained by applying a given function to each element of a given sequence: f * (a1, a2, ..., an) = (f(a1), f(a2), ..., f(an)). The map operation can also be applied to a set in the same way [26,27].

Reduce: The reduce operator / applies an associative binary operator to a list and returns the resulting value: ⊕/(a1, a2, ..., an) = a1 ⊕ a2 ⊕ ... ⊕ an. So for example, if s is a list of integers then +/s is the sum of all the integers in the list.
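The map (*) and reduce (/) operators correspond to familiar constructs in modern languages; an illustrative Python rendering (not from the thesis):

```python
from functools import reduce
import operator

# The map (*) and reduce (/) operators of the text, in Python:
# square * (1, 2, 3, 4) and (+)/(1, 2, 3, 4).
s = [1, 2, 3, 4]
mapped = [x * x for x in s]        # square * s
summed = reduce(operator.add, s)   # + / s
print(mapped, summed)
```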

Syntax of Formulae

In the following, Q, Q1, Q2, etc. represent arbitrary formulae and e1, e2, etc. arbitrary expressions:

Relations: e1 = e2, e1 ≠ e2, e1 < e2, e1 ≤ e2, e1 > e2, e1 ≥ e2, even?(e1), odd?(e1);

Logical Operators: ¬Q (not Q), Q1 ∨ Q2, Q1 ∧ Q2;

Quantifiers: ∀v.Q, ∃v.Q.

Syntax of Statements

The kernel language in the REFORM approach has two primitive statements: the atomic specification and the guard statement. The atomic specification [155] is written x/y.Q, where Q is a formula of first order logic and x and y are sequences of variables. Its effect is to add the variables in x to the state space, assign new values to them such that Q is satisfied, remove the variables in y from the state space and terminate. If there is no assignment to the variables in x which satisfies Q then the atomic specification does not terminate. The guard statement is written [P], where P is a formula of first order logic. The statement [P] always terminates; it enforces P to be true at this point in the program; it has the effect of restricting previous nondeterminism to those cases which leave P true at this point. If this cannot be ensured then the set of possible final states is empty, and therefore all possible final states will satisfy any desired condition.

The kernel language is constructed from these two primitive statements, a set of statement variables (these are symbols which will be used to represent the Chapter 5. Working Environment and Design Recovery Method 97

recursive calls of recursive statements) and the following three compounds:

1. Sequential Composition: (S1; S2) — First S1 is executed and then S2.

2. Choice: (S1 ⊓ S2) — One of the statements S1 or S2 is chosen for execution.

3. Recursive Procedure: (μX.S1) — Within the body of S1, occurrences of the statement variable X represent recursive calls to the procedure.

Semantics of Kernel WSL

In order to interpret statements as programs, the initial and final state spaces of the statements (i.e., the initial set of variables (input variables) and the final set of variables (output variables)) need to be known. These must be related properly according to the syntax of the statement. For a statement S and finite non-empty sets of variables V and W, the relation S: V → W is defined to be true when V and W are suitable input and output state spaces for S. Thus:

1. x/y.Q: V → W iff W = (V ∪ x) − y (where x is the set of variables in the sequence x),

2. [P]: V → W iff V = W and V contains all the variables in P,

3. (S1; S2): V → W iff ∃V′.(S1: V → V′ ∧ S2: V′ → W),

4. (S1 ⊓ S2): V → W iff S1: V → W and S2: V → W,

5. (μX.S1): V → W iff V = W and S1: V → V.

For example, three fundamental statements can be defined immediately:

abort =DF ()/().false
null =DF [false]
skip =DF ()/().true

For any finite, non-empty set V of variables: abort: V → V, null: V → V and skip: V → V.

A weakest precondition (WP) for kernel language statements is defined as a formula of infinitary logic. WP is a function which takes a statement (a syntactic object) and a formula from L (another syntactic object) and returns another formula in L. The WPs for the five statements in the kernel language are defined as follows:

1. WP(x/y.Q, R) =DF (∃x.Q ∧ ∀x.(Q ⇒ R))

2. WP([P], R) =DF (P ⇒ R)

3. WP(S1; S2, R) =DF WP(S1, WP(S2, R))

4. WP(S1 ⊓ S2, R) =DF WP(S1, R) ∧ WP(S2, R)

5. WP((μX.S), R) =DF ⋁_{n<ω} WP((μX.S)^n, R)

For the three fundamental statements in the previous example, the WPs are: WP(abort, R) = false, WP(skip, R) = R and WP(null, R) = true.
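The WP laws for sequential composition and choice can be checked on a toy semantic model, where a program is a function from each initial state to a termination flag and a set of possible final states. This Python sketch is entirely illustrative and is not Ward's formal development:

```python
# A toy check of two WP laws on a finite state space {0,1,2,3}.
# A "program" maps each initial state to (terminates, set_of_final_states);
# WP(S, R) is the set of initial states from which S surely terminates
# in a state satisfying R.
STATES = range(4)

def wp(S, R):
    return {s for s in STATES if S(s)[0] and all(R(t) for t in S(s)[1])}

def seq(S1, S2):                  # (S1; S2)
    def S(s):
        ok1, mids = S1(s)
        if not ok1 or not all(S2(m)[0] for m in mids):
            return (False, set())
        return (True, {t for m in mids for t in S2(m)[1]})
    return S

def choice(S1, S2):               # (S1 |_| S2), nondeterministic choice
    return lambda s: (S1(s)[0] and S2(s)[0], S1(s)[1] | S2(s)[1])

inc = lambda s: (True, {min(s + 1, 3)})   # two deterministic steps
dbl = lambda s: (True, {min(2 * s, 3)})
R = lambda s: s >= 2

# Law 3: WP(S1; S2, R) = WP(S1, WP(S2, R))
law3 = wp(seq(inc, dbl), R) == wp(inc, lambda s: s in wp(dbl, R))
# Law 4: WP(S1 |_| S2, R) = WP(S1, R) /\ WP(S2, R)
law4 = wp(choice(inc, dbl), R) == (wp(inc, R) & wp(dbl, R))
print(law3, law4)
```

Here nondeterministic choice terminates only when both branches do, matching the total-correctness reading of WP in the text.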

The First Level Language

The kernel language just described is particularly elegant and tractable, but is too primitive to form a useful wide spectrum language for the transformational development of programs. For this purpose the language needs to be extended by defining new constructs in terms of the existing ones using "definitional transformations". A series of new "language levels" is built up, with the language at each level being defined in terms of the previous level; the kernel language is the "level zero" language which forms the foundation for all the others. Each new language level automatically inherits the transformations proved at previous levels, and these form the basis of a new transformation catalogue. Transformations of a new language construct are proved by appealing to the definitional transformation of the construct and carrying out the actual manipulation in the previous-level language. This technique has proved extremely powerful in the development of a practical transformation system such as the Maintainer's Assistant.

The first set of language extensions is as follows.

1. Sequential Composition: The sequencing operator is associative, so the brackets can be eliminated:

S1; S2; S3; ... ; Sn =DF (...((S1; S2); S3); ... ; Sn)

2. Deterministic Choice: Guards can be used to turn a nondeterministic choice into a deterministic choice:

if B then S1 else S2 fi =DF (([B]; S1) ⊓ ([¬B]; S2))

3. Assertion: An assertion is a partial skip statement; it aborts if the condition is false but does nothing if the condition is true. It can be defined using an atomic specification which changes no variables:

{B} =DF ()/().B

4. Assignment: A general assignment can be expressed using a pair of atomic specifications:

x := x′.Q =DF x′/().Q; x/x′.(x = x′)

5. Simple Assignment: If Q is of the form x″ = t, where t is a list of terms and x″ is a list of new variables, then:

x := t =DF x″/().(x″ = t); x/x″.(x = x″)

6. Deterministic Iteration: A while loop is defined using a new recursive procedure X which does not occur free in S:

while B do S od =DF (μX.(([B]; S; X) ⊓ [¬B]))

7. Initialised Local Variables:

var x := t: S end =DF x/().(x = t); S; ()/x.true

8. Counted Iteration:

for i := b to f step s do S od =DF var i := b: while i ≤ f do S; i := i + s od end

9. Procedure Call:

proc X = S. =DF (μX.S)

10. Block with Local Procedure:

begin S1 where proc X = S2. end =DF S1[(μX.S2)/X]

11. Comments:

comment: "any text string" =DF skip

These go to make up the "first level" language. Subsequent extensions will be defined in terms of the first level language. For the purposes of this thesis only a subset of the first level language is described.
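A definitional transformation is essentially a rewrite on the program tree. The following sketch (invented tuple representation; not the REFORM implementation) expands the first-level if and while constructs into kernel-style guard, choice and recursion nodes, following definitions 2 and 6 above:

```python
# Expand first-level constructs into (a caricature of) the kernel.
def expand(stmt):
    kind = stmt[0]
    if kind == "if":               # if B then S1 else S2 fi
        _, B, S1, S2 = stmt        #   = ([B]; S1) ⊓ ([¬B]; S2)
        return ("choice",
                ("seq", ("guard", B), expand(S1)),
                ("seq", ("guard", ("not", B)), expand(S2)))
    if kind == "while":            # while B do S od
        _, B, S = stmt             #   = μX.(([B]; S; X) ⊓ [¬B])
        return ("mu", "X",
                ("choice",
                 ("seq", ("guard", B),
                         ("seq", expand(S), ("call", "X"))),
                 ("guard", ("not", B))))
    return stmt                    # already kernel-level

k = expand(("if", "B", ("skip",), ("abort",)))
assert k == ("choice",
             ("seq", ("guard", "B"), ("skip",)),
             ("seq", ("guard", ("not", "B")), ("abort",)))
```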

Meta-WSL

In order to express program transformations (as distinct from programs), a language is needed. In the Maintainer's Assistant, it was decided to extend WSL with suitable statements and expressions for writing transformations, rather than to design a completely new language for this purpose. This language is called Meta-WSL [37,38], reflecting the fact that it is both an extension of WSL and designed to manipulate WSL. This contrasts with the CIP project, which uses several languages [18] for formulating program schemes, transformation algorithms and applicability tests.

It is assumed that the current program being transformed is stored as a tree in some global variable, so that all a transformation needs to do is modify the contents of this variable. WSL is a general-purpose language, so in principle it could itself be used to write transformations operating on the variable which holds the program tree. The term "program tree" here refers to the internal representation of WSL programs in LISP. The main reason for designing Meta-WSL is that pure WSL lacks statements and expressions for manipulating the program tree efficiently, whereas Meta-WSL allows transformations to be written easily and efficiently.

In addition to incorporating WSL, Meta-WSL includes the following main extensions to WSL:

Program Editing Statements These include "(@Del)" (to delete the current item), "(@Change_To Thing)" (to change the current item to "Thing", the new item which replaces the old one), etc.

Pattern Matching and Template Filling This includes a function which matches a section of a WSL program tree against a given pattern and returns the result in a table, and a function which takes a pattern and a table and replaces the tokens in the pattern with values from the table. For example, "([_Match_] Type Pattern Table)" (to match the pattern against the current item when its generic type is "Type" and put the results in "Table"), "([_Fill_In_] Type Pattern Table)" (to fill in the current item with the contents of the table where the pattern matched), etc.
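The match/fill pair can be sketched in a few lines. This is an illustrative analogue only (tuple trees, "?name" pattern variables and the function names are all invented; Meta-WSL's actual facilities are richer):

```python
def match(pattern, tree, table):
    """Match a WSL-like tree against a pattern; '?name' binds a subtree."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        table[pattern[1:]] = tree
        return True
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        return (len(pattern) == len(tree) and
                all(match(p, t, table) for p, t in zip(pattern, tree)))
    return pattern == tree

def fill_in(pattern, table):
    """Rebuild a tree from a pattern and a table of bindings."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        return table[pattern[1:]]
    if isinstance(pattern, tuple):
        return tuple(fill_in(p, table) for p in pattern)
    return pattern

t = {}
assert match(("assign", "?var", "?val"), ("assign", "x", 1), t)
assert t == {"var": "x", "val": 1}
# Fill a different template from the same table:
assert fill_in(("assign", "?val", "?var"), t) == ("assign", 1, "x")
```

A transformation is then "match against the old shape, fill in the new shape", which is exactly the pattern/table discipline described above.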

Movement Statements These include statements for moving to different parts of the program tree. Though the section of code on which the transformation is to be performed is first selected, the transformation may need temporarily to select another section of the program in order to do its work. For instance, "(@Up)" (to move to the father node in the tree), "(@Down)" (to move down to the first branch node of the tree), "(@<<)" (to move to the left-hand node of the tree) and "(@>>)" (to move to the right-hand node of the tree), etc.

Movement Applicability Testing Functions These are functions which test the applicability of a particular form of movement within a program tree, since a specific movement within a tree is not always possible. Examples of these functions include "([_Up?_])", "([_Down?_])", "([_<<?_])" and "([_>>?_])", which test whether the corresponding movement can be made.
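Together, the movement statements and their applicability tests behave like a guarded cursor over the tree. The sketch below is an invented analogue (class and method names are not Meta-WSL), where each movement asserts its own applicability test first:

```python
class Cursor:
    """Illustrative cursor over a nested-tuple program tree, mimicking
    Meta-WSL movement statements and their applicability tests."""
    def __init__(self, tree):
        self.tree, self.path = tree, []   # path of child indices from root
    def _node(self, path):
        n = self.tree
        for i in path:
            n = n[i]
        return n
    def current(self):
        return self._node(self.path)
    # Applicability tests (cf. "([_Up?_])" etc.):
    def up_ok(self):   return bool(self.path)
    def down_ok(self):
        c = self.current()
        return isinstance(c, tuple) and len(c) > 1
    def left_ok(self):  return bool(self.path) and self.path[-1] > 1
    def right_ok(self):
        return bool(self.path) and \
               self.path[-1] + 1 < len(self._node(self.path[:-1]))
    # Movements (cf. "(@Up)" etc.), guarded by the tests:
    def up(self):    assert self.up_ok();    self.path.pop()
    def down(self):  assert self.down_ok();  self.path.append(1)
    def left(self):  assert self.left_ok();  self.path[-1] -= 1
    def right(self): assert self.right_ok(); self.path[-1] += 1

c = Cursor(("seq", ("assign", "x", 1), ("assign", "y", 2)))
assert not c.up_ok()                      # already at the root
c.down(); c.right()
assert c.current() == ("assign", "y", 2)
```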

Repetition Statements It is often necessary within a transformation to test a condition or perform some operation at every subnode (and sub-subnode, and so on) of the selected program item. The repetition statements allow this to be done easily, such as "(@Exit_When)" (to exit the current loop when some condition is met) and "(@No_Deeper)" (to go no deeper than the current node when some condition is met).
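A pruned traversal of this kind can be sketched as a generator; the `no_deeper` predicate plays the role of "(@No_Deeper)" (the representation and names are invented for illustration):

```python
def each_subnode(tree, no_deeper=lambda n: False):
    """Visit every subnode of a nested-tuple tree, pruning the descent
    wherever no_deeper holds (cf. the "(@No_Deeper)" statement)."""
    yield tree
    if isinstance(tree, tuple) and not no_deeper(tree):
        for child in tree[1:]:
            yield from each_subnode(child, no_deeper)

prog = ("seq", ("while", "B", ("assign", "x", 1)), ("assign", "y", 2))
# Collect assignments, but do not descend into loops:
outside_loops = [n for n in each_subnode(prog, lambda n: n[0] == "while")
                 if isinstance(n, tuple) and n[0] == "assign"]
assert outside_loops == [("assign", "y", 2)]
```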

Other Statements and Functions These include:

• statements for setting state flags — "(@Pass)" and "(@Fail)", etc.;

• a statement for calling other transformations — "(Trans Name)";

• functions for other code checking — "([_S_Type?_])", "([_P_Type?_])", etc.;

• a function for testing the applicability of a transformation — "([_Trans?_] Name)"; etc.

5.1.3 Advantages of Using Program Transformation and WSL

The REFORM approach used program transformation and a wide spectrum language because there are a number of benefits from using them. The benefits of using program transformations are:

• Increased reliability: bugs and inconsistencies are easier to spot.

• Formal links between specification and code can be maintained.

• Maintenance can be carried out at the specification level.

• Large restructuring changes can be made to the program with the confidence

that the functionality is unchanged.

• Programs can be incrementally improved — instead of being incrementally

degraded.

• Data structures and the implementation of abstract data types can be

changed easily.

Apart from the general advantages of a wide spectrum language, the benefits of using a wide spectrum language in the Maintainer's Assistant are:

• There is flexibility for extending the system to build a wide set of transformations. WSL can be extended by applying definitional transformations. This is particularly useful when new WSL constructs are needed for writing new program transformations.

• WSL is an intermediate language. The advantage of using an intermediate language is that the system for acquiring a specification from program code needs to be developed only once. Programs written in any language can be translated into WSL, as long as a WSL translator for that particular language has been built.

5.1.4 The Original Design of the Maintainer's Assistant

In the original design of the Maintainer's Assistant, the system supports the transformation of existing source code into a specification in three phases [156,164].

In phase 1, a "Source-to-WSL" translator takes the assembler (or other language) source and translates it into equivalent WSL. The maintainer undertakes all operations through the Browser. The Browser then checks the program and uses the Program Slicer [6,43,160] to chop the program into smaller programs of manageable size. The maintainer may conduct the process more than once until satisfied that (s)he has split the code in such a way that it is ready for transformation. Eventually, this code is saved to the database in order to assemble specifications of those code modules (refer to Figure 5.1).

In the second phase, the maintainer takes one piece of code out of the database to work with. The Browser allows the maintainer to look at and alter the code under strict conditions, and the maintainer can also select transformations to apply to the code. The program transformer works in an interactive mode. It presents WSL on screen in pretty-printed format and searches a catalogue of proven transformations to find those applicable to any selected piece of code. These are displayed in the user interface's window system. When the Program Transformer is working, it also depends on the General Simplifier, the Program Structure Database and the Knowledge Base System (not yet implemented) [142], to which it sends requests. The maintainer can apply these transformations, or get help from the Knowledge Base as to which transformations are applicable. Once a transformation is selected it is automatically applied.

These transformations can be used to simplify code and expose errors. Finally, the code is transformed to a form at a higher level of abstraction, which can be translated into specifications in Z, and the code is saved back to the DataBase (Figure 5.2).

[Figure 5.1 shows the first phase: source code in assembler passes through the Source-to-WSL Translator into the internal representation of WSL code, which is stored in the DataBase; the maintainer works through the Front End (X-Windows or PC menu) interface and the Program Slicer. A legend shared by Figures 5.1, 5.2 and 5.3 distinguishes system components, data representations, and data or control flows.]

Figure 5.1: From Source Code to a Program in Low-Level WSL

The third phase comes when all the source code in the DataBase has been transformed. A Program Integrator is called to assemble the code or specifications into a single program in high-level WSL. A WSL-to-Z translator will then translate this highly abstracted WSL specification into a specification in Z (Figure 5.3).

5.1.5 The State of the Maintainer's Assistant by 1991

By the summer of 1991, the Maintainer's Assistant¹ consisted of:

• an Assembler to WSL translator,

• a History/Future Database,

• a Structure Editor,

• a Browser,

¹The Program Structure Database and the General Simplifier were investigated by the author; the other components of the Maintainer's Assistant were studied by the other three members of the REFORM research team.

[Figure 5.2 shows the second phase: the internal representation of WSL code in the DataBase is manipulated by the Program Transformer through the Browser and Structure Editor, supported by the History/Future Database, the General Simplifier, the Program Structure Database and the Knowledge Base System, with the maintainer working through the Front End (X-Windows or PC menu).]

Figure 5.2: From a Program in Low-Level WSL to a Program in High-Level WSL

[Figure 5.3 shows the third phase: programs in high-level WSL held in the DataBase are assembled by the Program Integrator and translated by the WSL-to-Z Translator into a specification in Z, with the maintainer browsing via the Front End (X-Windows or PC menu).]

Figure 5.3: From a Program in High-Level WSL to Z Specification

• a Front End,

• a Program Structure Database,

• a General Simplifier, and

• a Program Transformer and a library of over 400 transformations.

IBM-Assembler-to-WSL Translator An IBM-assembler to WSL translator was implemented. The translation process uses existing compiler-writing technology.

History/Future Database This is included to allow the maintainer to go

back to an older version of the program (s)he has transformed. It is usual for the maintainer to move forwards and backwards several times through a sequence of transformations in order to reach an optimal version of the program. Two commands "Undo" and "Redo" are provided. "Undo" is used to retrieve the previous version of the program, before the last program editing or transforming operation. The "Redo" command undoes the last "Undo" command.
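The "Undo"/"Redo" behaviour amounts to two stacks of program versions. The following sketch (names invented; the real database holds full WSL trees plus derived data) illustrates the mechanism:

```python
class HistoryFuture:
    """Sketch of the History/Future Database: undo retrieves the previous
    program version; redo reverses the last undo."""
    def __init__(self, program):
        self.current, self.history, self.future = program, [], []
    def record(self, new_program):
        """Called after each editing or transforming operation."""
        self.history.append(self.current)
        self.current, self.future = new_program, []
    def undo(self):
        self.future.append(self.current)
        self.current = self.history.pop()
    def redo(self):
        self.history.append(self.current)
        self.current = self.future.pop()

db = HistoryFuture("v1")
db.record("v2"); db.record("v3")
db.undo(); db.undo()
assert db.current == "v1"       # back two versions
db.redo()
assert db.current == "v2"       # "Redo" undoes the last "Undo"
```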

Structure Editor The Structure Editor is usually used as a last resort to remove errors (in the code) found by the maintainer. The maintainer can select an edit command from the Front End (Figure 5.4). For example, if the "Change" button is clicked, the change menu is displayed. It allows the maintainer to change the currently selected item in the program to an item (s)he specifies, or to a default item of the same generic type.

Browser and X-Window Front End The Browser and the X-Window Front End are implemented together as a graphical user interface to the other subsystems of the Maintainer's Assistant, using the X-Windows System. It provides all the commands necessary to use the other Maintainer's Assistant programs via buttons and pop-up menus, and uses several windows to display the output from the system and to receive text input from the user. In particular it provides a browser to display the program being transformed by the Transformer, and has facilities not provided by the transformer, such as pretty-printing the program and a mechanism to fold or unfold sections of code. It displays a frame made up of three windows (Figure 5.4). The first window is a box containing several buttons and labels. By clicking these buttons the user can invoke various commands, change options, and pop up other windows. The second window (the application window) is an interface to the transformer's command-driven user interface, and the third window (the display window) is used to display the program being transformed by the user. A manual page for the front end is available for the novice user.

Program Structure Database The Program Transformer often needs to know the properties of the current program item to be transformed. These properties may be used several times during the transformation process. The Program Structure Database facility computes the properties of the program item and saves them in the database, so that they need not be recalculated each time they are required.

The Program Structure Database is a dynamic database mainly serving the Program Transformer, which accesses it via the Database Manager. When the Program Transformer is transforming a section of program code, queries about the program are sent to the Database Manager. When a query is made for the first time, the Database Manager goes through the program structure and calculates the answer to that query. For instance, the Program Transformer may ask which variables are used in this section of the program. The result is both sent to the Program Transformer and saved in the database. Any extra information produced by the calculation, which may be used to answer other queries, is also recorded in the database. When the question is asked again, the Database Manager simply checks the database and returns the stored result.
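The compute-once, answer-many behaviour is essentially memoisation of tree queries. The sketch below (invented representation and names; the query here collects only assignment targets, a simplification of "variables used") illustrates the idea:

```python
# Sketch of the Program Structure Database idea: answer a query by
# walking the program tree once, then cache the answer for reuse
# until the program changes (a new tree gets a new cache key).
cache = {}

def assigned_variables(tree):
    key = ("assigned-vars", id(tree))
    if key not in cache:                 # first query: compute ...
        vs = set()
        if isinstance(tree, tuple):
            if tree[0] == "assign":
                vs.add(tree[1])          # the assignment target
            for child in tree[1:]:
                vs |= assigned_variables(child)
        cache[key] = vs                  # ... and record the answer
    return cache[key]                    # later queries: just look up

prog = ("seq", ("assign", "x", 1), ("assign", "y", 2))
assert assigned_variables(prog) == {"x", "y"}
assert ("assigned-vars", id(prog)) in cache   # the answer was saved
```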

When the Program Transformer changes the old program into a new version of the program it is necessary to create a new version of the database. The old data corresponding to the previous program is saved in the History/Future Database.

Furthermore, the Program Transformer may also have modified (or edited) the data (questions and answers) in the database since it performed the last

[Figure 5.4 shows the X-Windows Front End: a panel of command buttons (Quit, Load, Save, Undo, Redo, pretty-printing, editing and transformation commands), an application window, and a display window showing the WSL program, together with its internal LISP representation, being transformed.]

Figure 5.4: The X-Windows Front End

transformation, and the data related to the changed program will not be suitable for answering the same questions a second time. Thus, the database manager needs to manage such alterations in the database. LISP makes this easy to do.

The database manager will also serve the Knowledge Base System when this

is implemented.

General Simplifier The Program Transformer often needs to carry out symbolic calculations in mathematics and logic. This is implemented by sending queries to the General Simplifier, which can help the Transformer by, for example, evaluating and simplifying the conditions appearing in conditional statements in the program.
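The flavour of such symbolic simplification can be shown with two toy rules (this is an invented illustration, not the General Simplifier's rule set): dropping `true` conjuncts and cancelling double negation.

```python
# Minimal symbolic simplifier for boolean condition trees.
def simplify(e):
    if isinstance(e, tuple):
        e = (e[0],) + tuple(simplify(a) for a in e[1:])
        op = e[0]
        if op == "and":
            args = [a for a in e[1:] if a is not True]   # B ∧ true = B
            if False in args:                            # B ∧ false = false
                return False
            if not args:
                return True
            if len(args) == 1:
                return args[0]
            return ("and",) + tuple(args)
        if op == "not":
            if isinstance(e[1], bool):
                return not e[1]
            if isinstance(e[1], tuple) and e[1][0] == "not":
                return e[1][1]                           # ¬¬B = B
    return e

assert simplify(("and", True, ("not", ("not", "B")))) == "B"
assert simplify(("and", "B", False)) is False
```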

The Program Transformer and the Transformation Library The program transformer [37] works in an interactive mode. It goes through a given piece of code in WSL and presents a catalogue of transformations applicable to that piece of code. These are displayed in the user interface's window system. The maintainer can instruct the system by clicking the mouse on a chosen WSL construct, and then clicking on a transformation class. Proven transformations are stored in a library. Once a transformation is selected it is automatically applied. These transformations can be used to simplify code and expose errors. There are about 400 transformations in the library.

5.2 Review of the Maintainer's Assistant

The prototype of the Maintainer's Assistant (described in 5.1.5) is a team effort. The author contributed to the design of the Maintainer's Assistant and to the implementation of the Program Structure Database and the General Simplifier. Experiments were carried out with a number of program examples using the Maintainer's Assistant. Up to the summer of 1991 (two years into the project), the following points were noticed by the author:

• Almost all program transformations in the transformation library based on Ward's work were mainly for dealing with functional abstraction (or control abstraction) — most transformations operated on the control structures of a program, while few operated on data structures. In other words, the system was only suitable for operating on computation-intensive programs, not data-intensive programs. The program transformer could only deal with the construction of well-structured code.

• Obtaining a specification expressed in Z is a long-term goal of the REFORM project. Most of the program transformations can only be used for restructuring programs at the code level, i.e., the programs before and after a transformation is applied are at the same abstraction level.

• Most of the program transformations implemented at that time could only be used for restructuring programs at relatively low levels of abstraction.

• No representations of types, complex data structures or data designs yet existed in WSL.

• A new application area of the tool was identified: acquiring data designs from data-intensive programs written in, e.g., COBOL. After seeing the demonstration of the prototype of the Maintainer's Assistant, many industrialists were disappointed that the tool was unable to deal with COBOL programs, though they confirmed the potential capability of the Maintainer's Assistant.

These facts urged a new research direction to be set up within the REFORM project, i.e., acquiring data designs from data-intensive programs. In particular, the new research direction started with data-intensive programs, employed program transformation techniques emphasising data abstraction, and aimed to end up with data designs. It was decided by the project leaders to start this research while the original research direction was still going on.

5.3 Recovering Data Designs

The identification of the new research direction raised a number of questions to be solved, such as the method for tackling the problem, the theory of a new approach, and a new tool to implement the method developed.

5.3.1 Combining Code Analysis with Data Abstraction

The motivation for acquiring a data design from data-intensive code is the same as that for obtaining a Z specification from assembler code — software can best be understood, altered and enhanced at the conceptual level rather than at the code level, where the maintainer's view is often obstructed by implementation details. This means that crossing levels of data abstraction is needed to move from code to a data design.

One of the characteristics of data intensive third generation languages is that high level data designs often translate at the implementation level to constructs in both the code and data. For example, a reference in the data design between two data structures is typically implemented in COBOL by a foreign key, i.e., an integer index from one to the other. The relation between the two data structures can only be discovered by examination of the data and the code, not the data alone. Existing reverse engineering techniques have difficulty handling this. It seemed to us that formal transformation offered potential to solve this problem.

Data abstraction is widely used in forward engineering. The use of data abstraction in reverse engineering is at an early stage. It is proposed in this thesis that the data abstraction process be carried out with the help of code analysis, because code analysis can collect information needed for data abstraction.

5.3.2 Using Program Transformations and WSL

It is considered that the approach using program transformations is also a suitable method for acquiring data designs, because performing data abstraction operations also needs the properties of program transformations, such as the preservation of semantics and suitability for tools.

Although a transformational system for acquiring data designs has to cope with different kinds of abstraction from both the systems for forward engineering and the then-existing transformation system in REFORM, all program transformation systems have one thing in common: program transformation changes the syntax but not the semantics of programs, in software development and maintenance alike. Therefore, a wide spectrum language is also used to build a program transformation system for acquiring data designs, just as a wide spectrum language was used by some forward engineering transformational systems (e.g., the CIP project [18]) and by the Maintainer's Assistant.

5.3.3 Analysis of the Problems with Data-Intensive Programs

WSL currently has declarations which introduce the name of an identifier without its type. Therefore, variables are not typed, but all values in WSL have a type which belongs to a distinct set of values. This means that a WSL variable can at different times hold values of different types. Adding types is essential to avoid losing important attributes of the source program, such as logical connections between data; therefore, data structuring facilities such as records are needed. COBOL is built on a low-level model of storage, involving the explicit layout of data in memory, the size of data in characters, etc. A challenging problem for reverse engineering is the use of aliasing to use memory for several purposes. Since COBOL treats all significant data as records, defining "records" in WSL for modelling

COBOL records is a clear requirement.

The external calls to the underlying operating system and the embedded database can be modelled as external procedure calls and external functions; WSL already has mechanisms for dealing with external calls. The foreign key problem can be dealt with by program transformations: these transformations analyse the code involving foreign keys, so that relations between modules using foreign keys can be found. An example will be presented in section 7.8.
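The foreign-key idea can be caricatured as a scan over keyed file accesses: when the code reads one record file using a key field belonging to a different record, the data design contains a relationship between the two entities. The sketch below is heavily simplified and uses an invented statement representation (all record and field names are hypothetical); the real analysis is done by program transformations on WSL, not by this kind of ad hoc scan.

```python
def foreign_key_relations(statements):
    """statements: ("read", target_record, (owning_record, key_field))
    tuples, one per keyed file access found in the code."""
    rels = set()
    for stmt in statements:
        if stmt[0] == "read":
            _, target, (record, _field) = stmt
            if record != target:            # key belongs to another record:
                rels.add((record, target))  # record --refers-to--> target
    return rels

code = [("read", "ORDER",    ("ORDER", "ORDER-KEY")),     # own-key access
        ("read", "CUSTOMER", ("ORDER", "CUSTOMER-NO"))]   # foreign key
assert foreign_key_relations(code) == {("ORDER", "CUSTOMER")}
```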

Entity-Relationship Attribute Diagrams are based on entity models [47,53,55,114]. Entity models provide a system view of the data structures and data relationships within the system. All systems possess an underlying generic entity model which remains fairly static in time. The entity model reflects the logic of the system data, not the physical implementation. Entity models provide an excellent graphical representation of the generic data structures and relationships. Therefore, Entity-Relationship Attribute Diagrams are a suitable form for representing the data designs of data-intensive programs, and WSL needed to be extended to include Entity-Relationship Attribute Diagrams.

5.3.4 A Design Recovery Method

A method for data design recovery is proposed in this section and illustrated in Figure 5.5. The method consists of the following major steps:

1. Translating a data-intensive program (in COBOL in this case) into an intermediate language (the wide spectrum language WSL in this case), automatically.

2. Applying program transformations to the program in the intermediate language at the code level, under human control, to obtain entities and relationships, also in the same intermediate language but at the conceptual level.

3. Interpreting the entities and relationships in the intermediate language into entities and relationships in some language dedicated to a tool for displaying or printing an Entity-Relationship Attribute Diagram.
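The three steps above can be sketched as a pipeline of placeholder functions. Every name here is invented and each stage is a stub standing in for a whole tool (translator, transformation engine, diagram generator); the sketch only shows how the stages compose:

```python
def translate_to_wsl(cobol_text):
    return ("wsl-code", cobol_text)            # step 1: COBOL -> WSL (code level)

def transform(wsl, transformations):
    for t in transformations:                  # step 2: apply transformations,
        wsl = t(wsl)                           #   code level -> conceptual level
    return wsl

def to_era_diagram(wsl):
    return ("era-diagram", wsl[1])             # step 3: WSL entities -> ERA diagram

lift = lambda w: ("wsl-entities", w[1])        # stand-in for a real transformation
result = to_era_diagram(transform(translate_to_wsl("COBOL SRC"), [lift]))
assert result == ("era-diagram", "COBOL SRC")
```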

It is stressed that the main reason for using a wide spectrum language (WSL in this project) is that the language can represent both COBOL programs and Entity-Relationship Attribute Diagrams. Program transformations map programs in WSL to programs in WSL, and the transformations themselves are written in the language; therefore, program transformations need to be built only once, no matter what language the source code was written in.

[Figure 5.5 shows the method: at the code level, a data-intensive (COBOL) program is translated into code and data in WSL; program transformations raise this to entities and relationships in WSL at the conceptual level; further processing then yields an Entity-Relationship Attribute Diagram.]

Figure 5.5: A Data Design Recovery Method

5.3.5 Enhancement Design of the Maintainer's Assistant for Acquiring Data Designs from Data-intensive Programs

To implement a tool applying the above method, the problems to be solved can be clearly summarised:

• Design and implementation of the representation for data-intensive programs: after assessing the features of data-intensive programs, the aim is to design and implement the representational form of the programs in an intermediate language, on which the programs can be operated by tools. COBOL programs are used and discussed as typical data-intensive programs.

• Design and implementation of the representation for Entity-Relationship Attribute Diagrams: after assessing the features of Entity-Relationship Attribute Diagrams, the aim is to design and implement the representational form of Entity-Relationship Attribute Diagrams in an intermediate language, on which they can be operated by tools.

• Development of techniques for crossing levels of data abstraction: the application of data abstraction techniques in reverse engineering is explored, even though data abstraction has so far been widely used only in forward engineering; the aim is to develop techniques for crossing levels of data abstraction, which is extremely important for obtaining data designs from the programs.

• Design and implementation of program transformations for data design recovery: the aim is to invent and implement program transformations for manipulating both data-intensive programs (represented in an intermediate language) and Entity-Relationship Attribute Diagrams (represented in the same intermediate language).

After the working environment had been examined, it was found that the Maintainer's Assistant needed substantial enhancement. The enhancement includes three main parts:

1. Extension of WSL

WSL constructs are needed at both the code level and the conceptual level,

e.g., constructs for representing COBOL programs at the code level and

Entity-Relationship Attribute Diagrams at the conceptual level.

2. Extension of Program Transformation Library

Program transformations are also needed to manipulate code and data at

all levels, particularly for crossing levels of data abstraction.

3. Extension of Structure Database and Design of Metrics Facility

New database queries are required by code analysis and transformation implementation.

The objectives of using metrics in REFORM are to help the user to select transformations (to help develop heuristics), to measure the progress made in optimising the program code, and to measure the resulting quality of the program being transformed.

Existing WSL constructs and program transformations can be directly used in conjunction with newly defined WSL constructs and newly developed transformations.

Chapter 6

Extending and Using WSL

Wide spectrum languages and program transformation techniques in general were discussed in detail in previous chapters. However, when a real program transformation system is built (as will be described in Chapter 8), many practical decisions have to be made. Therefore, the extension and use of WSL, the wide spectrum language used in REFORM, are introduced in this chapter; the definition of the program transformations needed for the extension of the Maintainer's Assistant will be introduced in the next chapter.

6.1 Introduction to the WSL Extension

WSL has as its theoretical foundation a kernel language with five statements. Any other WSL construct is an extension either of the kernel language or of existing WSL constructs which are themselves ultimately derived from the kernel. This principle is observed in the research in order that proofs are not invalidated. For example, the first level language was developed observing this principle (Figure 6.1). The first level WSL is used for representing programs (which are not data-intensive) and the first level Meta-WSL is used for implementing program transformations.

The second level language needs designing, and this will be discussed in detail later in this chapter. The second level WSL is needed to provide more features, mainly for representing data-intensive programs. The second level Meta-WSL is needed for implementing supporting tools, such as the Program Structure Database, the General Simplifier and the Metric Facility.

[Figure 6.1 shows the WSL language levels: the kernel at the bottom; above it the first level language and first level Meta-WSL; then the second level language and second level Meta-WSL; and at the top program-specification WSL and Meta-WSL.]

Figure 6.1: WSL Language Levels and Their Usage

In extending WSL precisely, completely and unambiguously for representing data-intensive programs and data designs, it is essential that the syntax and semantics of such an extension should be well defined.

The specification of the semantics uses the existing WSL kernel and first level language, in line with the aforementioned principle. The specification of the syntax is achieved with a context-free grammar (so called because the well-formedness of each phrase is independent of its context), using BNF notation¹.

¹A formal definition of the syntax of a programming language is usually called a grammar. A

Apart from the formal specification of syntax and semantics, an informal semantics specification, possible operations on a newly defined WSL construct and an internal format are also defined, i.e., five jobs need carrying out:

1. Formal specification of syntax — this is written in BNF notation. This is also the format in PASCAL-like form to be displayed in the tool interface.

2. Informal semantics specification — this is written in English and can be used as a comment on the newly defined component by the implementer and the user of the language.

3. Formal semantics specification — this is written in existing WSL whose semantics is denotational semantics.

4. Operations — these are written both in English and WSL to describe available operations on the newly defined construct.

5. Internal format (LISP format) — this is the same LISP form in which WSL is represented and is used for writing program transformations.

6.2 Extension of WSL

As introduced in section 4.3.3, fundamental data types in programming languages include unstructured data type, Cartesian product, discriminated union, array, set, sequence, sparse data structure and pointer. We shall extend WSL to have all these types to meet the needs of dealing with data-intensive programs. However, it must be ensured that adding a type does not invalidate existing proofs of equivalence and transformations.

grammar consists of a set of definitions (termed rules or productions) which specify the sequences of characters (or lexical items) that form allowable programs in the language being defined. A formal grammar is just a grammar specified using a strictly defined notation, and the best-known notation is BNF [8,131,159]. In the original version of BNF, nonterminal symbols were written with angle brackets to distinguish them clearly from terminal symbols. In this thesis, a distinctive font, such as Program, is used for this purpose. In addition, EBNF, or Extended Backus-Naur Form [159], is also used in this thesis.

WSL uses the concept of the most fundamental data types to define its kernel, but most of these data types were not supported by the WSL language. WSL provides only the unstructured data type and array for representing programs to be manipulated in the first level language. More data types, such as sequence, and their operations should be added to the language. Also, though set was already defined in the first level language, more operations such as "insert" are needed. Since COBOL does not support sparse data structures, recursive data structures and pointer types (COBOL programs do have these data types in effect, but programmers get round them by using foreign keys, etc.), these data types are not considered for acquiring data designs in this thesis.

COBOL records and "redefine" structures can be viewed as Cartesian products and discriminated unions respectively. They will be defined first in this section together with the "file" structure. Then WSL components for supporting the data types set and sequence will be defined, together with two other commonly used data types, stack and queue. A structure for user-defined abstract data types will also be defined. At the end of this section structures for representing Entity-Relationship Attribute Diagrams will be introduced into WSL.

6.2.1 Representing Records and Files

Records

As discussed earlier in this thesis, COBOL (and also Assembler) is built on a low level model of storage, which includes how data is laid out in memory, size of data in characters, etc.

The particular problem of aliasing at memory level means that, to solve it, low level data information needs to be kept in the WSL translation of COBOL (there is a similar problem in FORTRAN with COMMON and EQUIVALENCE statements).

Since COBOL treats all significant data as records, defining "records" in WSL for modelling COBOL records will receive close attention.

Unless WSL is extended (WSL has only untyped variables), the only way to model COBOL records is to use simple variables in existing WSL, i.e., COBOL records are translated into simple variables. It would be very difficult to derive Entity-Relationship Attribute Diagrams from these simple variables, because useful information contained in the COBOL records would be lost when these records are first translated into WSL simple variables. For example, the definition of a COBOL record includes the type of the record (character, integer, etc.), the length of the record (number of bytes in memory) and the relationship between the record and its parent record. This information, which is not represented by simple WSL variables, is vital for deriving Entity-Relationship Attribute Diagrams and should not be thrown away (or at least not at this early stage of reverse engineering).

Therefore, extending the existing WSL to represent COBOL records is necessary. The new construct in WSL is also called a record.

The five-step process for defining "records" in WSL is illustrated in detail in this section. Owing to the length limitation of the thesis, other new WSL components will not be presented individually in the main text. The full specification of the syntax of the WSL extension developed in this research can be found in Appendix A and the full semantics specification in Appendix B.

A WSL record is defined as follows:

1. Formal specification of syntax

Record-Def ::= record Rec-Identifier [ Integer-Literal of Type-Char-Literal ] end;
             | record Rec-Identifier with Records-Def end;

Records-Def ::= Record-Def | Record-Def Records-Def

Rec-Identifier ::= Identifier

Type-Char-Literal ::= char | int

Note: please see Appendix A for the definitions of Identifier and Integer-Literal.

2. Informal specification of semantics

This is the WSL declaration for a record variable whose components are variables capable of selective updating. A record can be declared either with type and length in terms of a sequence of bytes in memory or with other records ("subrecords") as its components. A record has its own name (identifier). The type of such a record is associated with the Cartesian product of the domains with which the component records ("subrecords") are associated. For example, the expression rec1.rec2 may then be type checked by requiring that the type of rec1 be a record type having rec2 as a record name. Then the type of the whole expression is the corresponding type of the lowest level record, i.e., rec2 in this case. A record can be recursively defined.

3. Semantics specification

A record is defined in terms of sequence in the kernel language, e.g.:

record x with
    record i [6 of int]
    record j [6 of int]
    record k [6 of char]
end;

=DF x ∈ { (i, j, k) | i ∈ I ∧ j ∈ J ∧ k ∈ K }
=DF x ∈ I × J × K

where "6 of int" means an integer of 6 digit positions and "6 of char" means a string of 6 characters; I, J, K are sets of values, i.e., I, J ∈ { 0 .. 999999 } and K ∈ S × S × S × S × S × S, where S = { 'A', .., 'Z', 'a', .., 'z' }.

If another record is declared as:

record y with
    record l [6 of int]
    record m [6 of int]
    record n [6 of char]
end;

the semantics of x := y is defined as:

x := y
=DF x.i := y.l ∧ x.j := y.m ∧ x.k := y.n
=DF (i, j, k) := (l, m, n)

4. Operations of records

Two types of operations are available on records. When two records have an identical structure, one record can be assigned to another, e.g., as shown in the above semantics definition. Secondly, an expression can be assigned to a record whose type should be the same as that of the expression.

5. Internal format

For example, if the display format of a record is:

record Rec-Identifier [ Integer-Literal of Type-Char-Literal ] end;

the internal format will be:

(record rec-identifier length-literal type-char-literal)

When a record is defined in WSL it can be used to represent a COBOL record. For instance, the WSL record

record student-info with
    record student-id-no [8 of int]
    record name with
        record first-name [9 of char]
        record last-name [11 of char]
    end
    record address with
        record street [15 of char]
        record city [10 of char]
        record county [15 of char]
        record postal-code [8 of char]
    end
    record phone [10 of int]
end;

represents a segment of a COBOL program:

01  STUDENT-INFO.
    05  STUDENT-ID-NO   PIC 9(8).
    05  NAME.
        10  FIRST-NAME  PIC X(9).
        10  LAST-NAME   PIC X(11).
    05  ADDRESS.
        10  STREET      PIC X(15).
        10  CITY        PIC X(10).
        10  COUNTY      PIC X(15).
        10  POSTAL-CODE PIC X(8).
    05  PHONE           PIC 9(10).
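As an illustration of how such a record can be modelled, the following Python sketch (hypothetical names; the thesis works in WSL, not Python) represents the student-info record as a tree of typed, length-annotated components and computes the number of bytes the record occupies:

```python
# Illustrative model of WSL-style records (hypothetical class, not the
# thesis's WSL). An elementary record has a type ('int' or 'char') and a
# length in bytes; a group record is the Cartesian product of its sub-records.

class Record:
    def __init__(self, name, length=None, type_=None, subrecords=None):
        self.name = name
        self.length = length            # byte length for elementary records
        self.type = type_               # 'int' or 'char'
        self.subrecords = subrecords or []

    def byte_length(self):
        # A group record occupies the sum of its components' bytes,
        # mirroring COBOL's flat storage layout.
        if self.subrecords:
            return sum(r.byte_length() for r in self.subrecords)
        return self.length

student_info = Record("student-info", subrecords=[
    Record("student-id-no", 8, "int"),
    Record("name", subrecords=[
        Record("first-name", 9, "char"),
        Record("last-name", 11, "char"),
    ]),
    Record("address", subrecords=[
        Record("street", 15, "char"),
        Record("city", 10, "char"),
        Record("county", 15, "char"),
        Record("postal-code", 8, "char"),
    ]),
    Record("phone", 10, "int"),
])

print(student_info.byte_length())  # → 86
```

Keeping the type and length of every component, rather than flattening the record into untyped variables, is exactly the information the thesis argues must survive translation.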

Aliased Records

Another reason that simple WSL variables cannot cope with representing "structures" such as COBOL records, and that "record" has to be defined in WSL, is the aliasing problem. There are two purposes in using aliases. The first purpose is to share storage space. In the early days of computing, when memories were very expensive, it was a "skill" for several variables to use the same piece of storage at separate times. The second purpose was to control storage scope. In early languages, which did not have block structure, there was no way to use scope to control storage allocation, so the only "skill" was to use aliasing, where an alias is used to write a value of one type and read it out as a value of another type. This "skill" brings enormous difficulties to reverse engineering today.

For example, the following is a COBOL program with aliased records:

01  X.
    03  FOO    PIC X(6).
    03  BAR    PIC X(6).
01  Y REDEFINES X.
    03  TOP    PIC X(4).
    03  MIDDLE PIC X(4).
    03  BOTTOM PIC X(4).

When the value of x.foo is changed, the values of both y.top and y.middle are also changed.

Aliasing is surely something to be removed in reverse engineering. There might be many solutions, but one method is proposed in this research. To get rid of aliasing, the first step may still have to include representing the aliased records in WSL. Apart from defining "records" in WSL, a WSL construct called "redefine" is also needed for representing the aliased COBOL records (in mathematical terms, record aliasing can be viewed as a discriminated union) and its WSL external format is:

redefine record-name1 with record-name2;

Then the above COBOL program with aliased records can be translated into²:

²Whether variables are aliased in COBOL can be detected by the COBOL keyword "REDEFINES". However, the detection of aliased variables in other data-intensive programming languages, such as pointers in C, may need some further study.

record x with
    record foo [6 of char]
    record bar [6 of char]
end;

record y with
    record top [4 of char]
    record middle [4 of char]
    record bottom [4 of char]
end;

redefine x with y;

In terms of the WSL kernel language, x occupies a sequence of twelve bytes of memory, i.e., (x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12), where xi ∈ {0..255}. This means x.foo × x.bar = (x1, x2, x3, x4, x5, x6) × (x7, x8, x9, x10, x11, x12).

Similarly, we have y = (y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11, y12), and y.top × y.middle × y.bottom = (y1, y2, y3, y4) × (y5, y6, y7, y8) × (y9, y10, y11, y12).

However, the "redefine" statement specifies that the record x and the record y use the same segment of the underlying memory, i.e., (x1, ..., x12) and (y1, ..., y12) represent the same memory. If a new sequence m is given to represent this memory, we have:

x = y = (m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12); x.foo × x.bar = (m1, m2, m3, m4, m5, m6) × (m7, m8, m9, m10, m11, m12); and y.top × y.middle × y.bottom = (m1, m2, m3, m4) × (m5, m6, m7, m8) × (m9, m10, m11, m12) (Figure 6.2).

There are two issues to address in solving the aliasing problem: it is necessary to determine which records are aliased; and, more challenging, it is necessary to determine a mapping between the different records, based on the memory used by each component of the record. The former can usually be determined from the declarations and the latter can be done by defining a function (or procedure) which maps from a record to a sequence of bytes (the representation of that record in memory), and from a sequence of bytes to a record. These functions (or procedures) need to know the structure of the record in terms of the number of bytes occupied by each component. Thus a "write" to aliased memory is described by

[Figure 6.2: Storage Model — the records x (foo, bar) and y (top, middle, bottom) are mapped by destruct/struct onto the same underlying memory sequence (m1, ..., m12).]

a function (or procedure) which maps the COBOL data structure to low level memory; a "read" is represented by a function (or procedure) which describes the mapping in the reverse direction. In our system, these functions (or procedures) are explicitly inserted, in preparation for later simplification using transformations. Because the use of these functions (or procedures) involves accessing the underlying storage (which can be viewed as a hidden state), the concept of an abstract data type is employed here.

If the following statement occurs in the procedure division of the original COBOL program shown previously in this section,

MOVE "PASSED" TO FOO.

we will translate it into five WSL statements (the last four statements are added to address the aliasing problem):

x.foo := "passed";
redefine-write(x.foo, "passed");
y.top := redefine-read(y.top);
y.middle := redefine-read(y.middle);
y.bottom := redefine-read(y.bottom);

where:

proc redefine-write(rec, rec-value) =
    adt_proc_call redefine-record.destruct(rec, rec-value, state-variable).

funct redefine-read(rec) =
    adt_funct_call redefine-record.struct(rec, state-variable).

and

adt redefine-record (state-variable)(parameters) =DF
    proc destruct(rec, rec-value, state-variable) =
        var (sub-state-variable) :
            sub-state-variable := truncate(rec, state-variable);
            sub-state-variable := if int-type?(rec) → byte(rec-value)
                                  □ char-type?(rec) → ascii * (break(rec-value))
                                  fi
        end;

    funct struct(rec, state-variable) =DF
        if int-type?(rec) → integer(truncate(rec, state-variable))
        □ char-type?(rec) → chr(truncate(rec, state-variable))
        fi

tda.

The procedure redefine-write and the function redefine-read call the corresponding "abstract data type" procedure redefine-record.destruct and function redefine-record.struct, and the variable state-variable is the hidden state in the abstract data type.

The procedure destruct maps a record to a sequence of n bytes (where n is the length of the record). The function struct takes a record and a sequence of bytes and updates the record so that its destructured representation is the sequence of bytes³.

In the definition of the abstract data type, truncate(rec, state-variable) returns the segment of the underlying state which corresponds to the record rec, where the length of the segment is the number of bytes of the record rec; int-type?(rec) returns true when rec has a type of "integer"; char-type?(rec) returns true when rec has a type of "char"; byte(rec) returns a sequence of n integers giving the n-byte representation of rec (n is the length of rec); break(s) returns all components of s; ascii(s) returns the ASCII value of the character s; integer(rec) returns an integer assembled from the individual memory bytes; chr(rec) returns a character string converted from the individual memory bytes⁴.

In the above example,

redefine-write(x.foo, "passed");
⇒ adt_proc_call redefine-record.destruct(x.foo, "passed", m);
⇒ sub-state-variable := truncate(x.foo, m);
⇒ sub-state-variable := (m1, m2, m3, m4, m5, m6);
⇒ (m1, m2, m3, m4, m5, m6) := ascii * (break("passed"));
⇒ (m1, m2, m3, m4, m5, m6) := ascii * (('p', 'a', 's', 's', 'e', 'd'));
⇒ (m1, m2, m3, m4, m5, m6) := (112, 97, 115, 115, 101, 100);

y.top := redefine-read(y.top);
⇒ y.top := adt_funct_call redefine-record.struct(y.top, m);
⇒ y.top := chr(truncate(y.top, m));
⇒ y.top := chr((m1, m2, m3, m4));
⇒ y.top := chr((112, 97, 115, 115));
⇒ y.top := "pass";

³The procedure destruct and the function struct here have only dealt with integer and character types; it would be necessary to extend the definitions of the procedure and the function when additional types such as reals (as found in C or FORTRAN) are introduced to WSL.

⁴It should be recognised that aliasing has to take account of the underlying machine representation of the data. The definitions of the functions in this paragraph are all based on the assumption that the "byte" of the underlying memory is represented in ASCII code. Different function definitions would be needed with other types of representation for the underlying memory model.

y.middle can be worked out in a similar way. Since y.bottom is not affected by the change of x.foo, the statement y.bottom := redefine-read(y.bottom); will be made redundant by transformations. The above example showed how an alias used for the first purpose (storage sharing) was dealt with. In fact, the variable usage was disjoint, and although very complicated WSL was generated from the alias, much of that WSL could be removed as redundant (i.e., the aliased variables could be treated separately). The full machinery is only needed when aliasing is used for the second purpose (storage allocation control), which is a more difficult use to handle (an example of this case will be addressed in one of the case studies in a later chapter).
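The destruct/struct idea can be simulated in a few lines of Python (an illustrative sketch with hypothetical names such as LAYOUT, redefine_write and redefine_read; the thesis expresses this machinery in WSL): both records are views onto one shared byte sequence, so a write through x is visible through y:

```python
# Illustrative simulation of the destruct/struct mapping described above.
# A shared byte sequence models the memory that the aliased records x and y
# both occupy; writing through one record changes what the other reads.

memory = bytearray(b" " * 12)  # m1 .. m12, initially blank

# Each field is described by its (offset, length) within the shared memory.
LAYOUT = {
    "x.foo": (0, 6), "x.bar": (6, 6),
    "y.top": (0, 4), "y.middle": (4, 4), "y.bottom": (8, 4),
}

def redefine_write(field, value):
    # "destruct": map a record value onto its slice of the underlying memory.
    offset, length = LAYOUT[field]
    memory[offset:offset + length] = value.encode("ascii").ljust(length)

def redefine_read(field):
    # "struct": rebuild a record value from its slice of the underlying memory.
    offset, length = LAYOUT[field]
    return memory[offset:offset + length].decode("ascii")

redefine_write("x.foo", "passed")
print(redefine_read("y.top"))     # → pass  (first four bytes of x.foo)
print(redefine_read("y.middle"))  # last two bytes of x.foo plus padding
```

As in the worked example, y.bottom's slice never overlaps x.foo's, which is why the corresponding redefine-read becomes redundant under transformation.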

Files

A file is defined as a sequence of records, and its external WSL format is:

File-Def ::= file File-Identifier with Records end;
Records ::= Record-Def | Record-Def Records

The operations of files will be discussed in a later section.

6.2.2 Representing Basic Data Types and User-Defined Abstract Data Types

Basic Data Types

The main concern in defining the new components described in this section is still to prevent loss of information at an early stage of reverse engineering. For instance, sequence, queue and stack types can be modelled in theory by array, which exists in WSL. But if data of these types in data intensive programs (written in programming languages other than COBOL) are all translated directly to data of array type, the properties of the original data may be lost immediately. The idea used in this thesis is to define new WSL constructs supporting types such as sequence, queue and stack first, and then build program transformations to handle data abstraction. More new components defined will be listed in the following subsections, where it is implicit that the reason is to prevent information loss.

New components for supporting the basic data types set, sequence, queue and stack are defined for the second level WSL with their usual meanings. Detailed definitions can be seen in Appendix A. In this section, only the new components supporting the queue data type are defined. A WSL queue and the operations on queues are defined as:

• Formal specification of syntax

Command ::= init-q Q-Variable;
          | q-append Q-Variable Expression;

Expression ::= q-concat(Q-Variable1 Q-Variable2)
             | q-rem-first(Q-Variable)
             | q-length(Q-Variable)

• Informal specification of semantics

A queue is a sequence in which component selection and deletion are restricted to one end and insertion is restricted to the other end. The concatenation of one queue to another will form a third queue.

• Semantics specification

Suppose s, s1 and s2 are sequences, p, p1 and p2 variables, e an expression, and x and y also variables:

init_q p = p := s
q_append p e = p := (s[1], s[2], ..., s[n], e)
x := q_concat(p1 p2) =DF x := (s1[1], s1[2], ..., s1[n], s2[1], s2[2], ..., s2[n])
x := q_rem_first(p) = x := s[1] ∧ p := (s[2], s[3], ..., s[n])
y := q_length(p) = y := ℓ(s)

• Internal format

The internal formats of the above five constructs will be:

(Init_Q P)
(Q_Append P E)
(Assign (X (Q_Concat P1 P2)))
(Assign (X (Q_Rem_First P)))
(Assign (Y (Q_Length P)))
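The five queue operations can be sketched in Python to make their sequence semantics concrete (an illustration only; the function names mirror the WSL constructs above, and the thesis defines them in WSL in terms of kernel sequences):

```python
# A minimal sketch of the five WSL queue operations over plain sequences.

def init_q():
    return []                    # init-q: a fresh (empty) sequence

def q_append(p, e):
    return p + [e]               # insertion restricted to one end

def q_concat(p1, p2):
    return p1 + p2               # concatenation forms a third queue

def q_rem_first(p):
    return p[0], p[1:]           # selection/deletion restricted to the other end

def q_length(p):
    return len(p)

q = init_q()
q = q_append(q, "a")
q = q_append(q, "b")
first, q = q_rem_first(q)
print(first, q_length(q_concat(q, ["c", "d"])))  # → a 3
```

Modelling queues directly, rather than through arrays, preserves exactly the restricted-access property that later abstraction transformations rely on.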

User-Defined Abstract Data Type

As introduced earlier, an abstract data type consists of "objects" and "operations". Objects are usually implemented as variables and operations are implemented as procedures and functions. In reverse engineering, an abstract data type may be formed by looking for a closure of a group of variables and a group of procedures (or functions). Whether or not a closure was originally intended as an abstract data type, if an abstract data type is obtained from this closure in the code, it is helpful in viewing the code at a higher abstraction level. Crossing levels of data abstraction by looking for user-defined data types is a novel contribution of this thesis. The way of implementing this idea is first to provide a structure in WSL. Five constructs are defined for the definition of a user-defined data type, user-defined data type procedure call and user-defined data type function call. The keywords for these constructs are: user-adt, user-adt-funct, user-adt-proc, user-adt-funct-call and user-adt-proc-call (refer to Appendix A for details). The application of these constructs will be described in the following chapters. A few previous projects have addressed recognising user-defined abstract data types and Canfora's work [45] is a typical example. The method proposed

in [45] is based on user-defined data types and exploits the relationships existing between these types and the procedure-like components that use them in their headings (i.e., declared formal parameters and/or return values). [45] also proposed a logic-based approach to the definition of both the candidature criterion and the model to apply it. According to such an approach, the model to apply the candidature criteria consists of a set of direct relations which summarise and describe the meaningful relationships among the components of a software system. A candidature criterion consists of summary relations obtained by combining direct relations in expressions. The application of a logic-based candidature criterion requires (i) a repository to collect the direct relations produced by static code analysis; (ii) a query language to express the abstractions to be looked for; and (iii) a formalism to link the direct relations to form the summary relations which will enable the above queries to be answered. Canfora's algorithm is much more complex than the one proposed in this thesis and, therefore, his algorithm has not been implemented in the thesis (however, it could easily be included if necessary).

6.2.3 Representing Entity-Relationship Attribute Diagrams

Entities and their relationships are objects at high abstraction levels. To cross levels of abstraction and to represent Entity-Relationship Attribute Diagrams, entities and their relationships have to be defined in WSL. Statements representing entities and their relationships are called specification statements because they represent program designs, which indicate what programs do without saying how they do it. Specification statements cannot be executed and therefore there are usually no operations on them within the language. Their semantics can usually be defined as pairs, triples, quadruples, etc. However, they can be operated on by program transformations. For example, many of the simple statements may be interchanged directly with these specification statements. Also, specification statements can be mixed freely with other statements because of the wide spectrum nature of WSL. Three specification statements are defined in this research.

A Construct for Relating Two Data Objects

In the process of abstraction, some data objects may be in conceptual form and others may still be in code form. If these two kinds of data objects need to occur in the same program statement, a WSL construct for relating these data objects is needed. This construct is defined as relate. The semantics of the construct is a pair. For example, the following WSL statement represents that the first record (or entity) is related to the second record (or entity):

Relate-Def ::= relate Rec-Identifier1/Ent-Identifier1 to Rec-Identifier2/Ent-Identifier2;

In terms of the WSL kernel, this means that these two objects form a pair.

Definitions of Entity and Relationship

Entities and entity relationships are defined as:

Entity-Def ::= entity Ent-Identifier end;
             | entity Ent-Identifier with Attributes end;
Attributes ::= attr Attribute-Identifier Attributes
             | attr Attribute-Identifier

Relationship-Def ::= relationship entity Ent-Identifier1 has Relation-Degree1 Relationship-Name relation with Relation-Degree2 entity Ent-Identifier2;
=DF ( ( Ent-Identifier1, Relation-Degree1 ), ( Ent-Identifier2, Relation-Degree2 ) ) ∈ Relationship-Name

i.e., a relationship is a mathematical "relation" of two sequences in which the first element of each sequence is the object and the second element is the number of times it appears.

The corresponding diagrams are shown in Figure 6.3 for the following examples of Entity-Relationship Attribute diagrams in WSL display format:

(A) entity E1 with
        attr A1
        attr A2
        attr A3
        attr A4
    end;

(B) entity E2 end;

(C) paragraph
        entity E3 end;
        entity E4 end;
        relationship entity E3 has one R1 relation with many entity E4;
    end;

(D) paragraph
        entity E5 end;
        entity E6 end;
        relationship entity E5 has one R2 relation with one entity E6;
    end;

(E) paragraph
        entity E7 with
            attr A5
            attr A6
        end;
        relationship entity E7 has one R3 relation with one entity E7;
    end;

(F) paragraph
        entity E8 end;
        entity E9 end;
        entity E10 end;
        relationship entity E8 has one R4 relation with {many entity E9} or {many entity E10};
    end;
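The pair-based semantics of entities and relationships can be illustrated with a small Python model (a hypothetical representation for illustration; the thesis encodes these as WSL specification statements):

```python
# A small model of the specification statements above: an entity is a name
# plus attributes, and a relationship is a mathematical relation over
# (entity, degree) pairs, mirroring the =DF definition in the text.

entities = {
    "E1": ["A1", "A2", "A3", "A4"],   # entity with attributes, example (A)
    "E3": [], "E4": [],               # entities without attributes
}

relationships = {
    # one-to-many relationship R1 between E3 and E4, as in example (C)
    "R1": {(("E3", "one"), ("E4", "many"))},
}

def degree_of(rel_name, entity):
    # Look up the degree with which an entity participates in a relationship.
    for (e1, d1), (e2, d2) in relationships[rel_name]:
        if entity == e1:
            return d1
        if entity == e2:
            return d2
    return None

print(degree_of("R1", "E4"))  # → many
```

Because a relationship is just a set of pairs, the usual relational operations (membership tests, joins over shared entities) apply directly when diagrams are derived.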

6.3 Extension of Meta-WSL

As introduced in the previous chapter, the Maintainer's Assistant consists of several supporting tools. When the new research direction — recovering data designs — was decided, the way of implementing those supporting tools was also reviewed. Because the Program Structure Database, the General Simplifier and the Metric Facility are most relevant to data abstraction, they are discussed in the thesis. It was found necessary that each single service provided by the supporting tools, such as a database query or a metric measure, should be a Meta-WSL procedure or function. This is because each Meta-WSL construct can be defined by existing WSL constructs, and this approach makes the prototype and future extensions of the prototype theoretically well-founded. Therefore, the services provided by these three supporting tools are defined in terms of Meta-WSL. The detailed Meta-WSL extensions for the three supporting tools are systematically defined in Appendix C. Following this definition, the design and the implementation of these three tools will be discussed in detail in Chapter 8.

[Figure 6.3: Entities and Relationships in WSL — (A) an entity with attributes; (B) an entity without attributes; (C) a one-to-many relationship; (D) a one-to-one relationship; (E) a recursive relationship; (F) a mutually exclusive relationship.]

6.4 Embedding WSL in COMMON LISP

One of the advantages of formally defining the semantics of a language (using denotational semantics) is that the effect of a program written in the language can be obtained by analysing the semantics of the program rather than actually executing the program. This suggests that the language used to represent programs to be transformed in a transformation system can have some non-executable statements. The WSL used in the Maintainer's Assistant is such a language, which can have non-executable statements for representing objects at a high abstraction level, such as Entity-Relationship Attribute Diagrams. In essence, the effect of applying program transformations to programs represented in WSL is to change the syntax of those programs in WSL.

It is crucial to understand how WSL is represented in the Maintainer's Assistant. WSL is embedded in COMMON LISP. WSL has two forms, an external form and an internal form. The external form uses familiar notations which are commonly used in programming languages such as ALGOL and PASCAL. The internal form uses syntax trees on which transformations are easily performed. The external form is more user-friendly than the internal form. The interface of the Maintainer's Assistant converts between the external and the internal forms.

In a syntax tree, each node (and its corresponding subtree) represents a single syntactic object. The branches of that node are the components of the object. For instance, a portion of the tree for an assign statement is shown in Figure 6.4. In fact, the syntax tree shown in the diagram is a simplified internal form. An internal form of WSL also includes the database tables, embedded comments, and some information relating to the type of each item. The figure shows an "Assign" statement represented as a tree. The "Assign" statement has as its component one "Assignment", which has as its components a variable and the expression assigned to it.

This piece of code would itself be part of a larger structure. At the top level, a program is a single "Statements" object; that is, it consists of a sequence of statements. In order to enable the LISP program to work on these trees, they are represented internally as nested lists, so that the code above would be stored as

[Figure 6.4: Diagrammatic Form of a WSL Syntax Tree — a Sequence of Statements contains the node (Assign (X (+ A B))), whose Assignment component (X (+ A B)) branches into the variable X and the expression (+ A B).]

(Assign (X (+ A B))). The main advantages of adopting the tree-based approach include:

• It is relatively easy to construct an interpreter to execute most WSL code within LISP;

• We can then "move" through the program (left, right, up and down) in order to select the piece of code, that is a leaf or branch of the tree, that we wish to transform;

• The transformations work mostly on syntactic objects, and these can be easily manipulated as branches or leaves within the tree structure.

From the representation of WSL programs, it can be seen that applying a program transformation to a part of a program is equivalent to replacing that part with something which is syntactically different but semantically the same. In particular, the core activity is to search for a node in the syntax tree which meets certain conditions, and to replace this node with another node according to the rule defined in the transformation.
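This search-and-replace activity on nested lists can be sketched in Python (an illustrative toy transformation, x := x + 0 → x := x, which is hypothetical and not one of the system's actual rules):

```python
# Sketch of the core activity described above: walk a nested-list syntax
# tree, find nodes matching a condition, and replace them by a semantically
# equal form. The sample rule (adding 0 changes nothing) is hypothetical.

def transform(node):
    if isinstance(node, list):
        node = [transform(child) for child in node]
        # Condition: an addition whose second operand is the literal 0.
        if len(node) == 3 and node[0] == "+" and node[2] == 0:
            return node[1]  # replacement: syntactically different, semantically the same
    return node

program = ["Assign", ["X", ["+", "X", 0]]]
print(transform(program))  # → ['Assign', ['X', 'X']]
```

Because the tree is a plain nested list, the replacement step really is just substituting one branch for another, exactly as the text describes.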

6.5 Translating Data Intensive Programs to WSL

Although the discussion in this section employs COBOL as the data-intensive programming language, the results of the discussion should be applicable to other data-intensive programming languages.

6.5.1 Consideration and Decision

As a COBOL to WSL translator is not yet available, translating COBOL programs to WSL has been done manually in the research described in this thesis. A set of general translation rules was first established based on the features of the two languages, including a mapping table between some COBOL constructs and their equivalent WSL constructs. For instance, translating the COBOL verb MOVE to a WSL assignment statement is one such rule. Translation is carried out entirely according to the semantics of the programs. For example, a COBOL record can be initialised in its declaration by a VALUE construct:

01 YEAR PIC X(4) VALUE "1994".

Since WSL records cannot be initialised in a similar way, the above COBOL statement will be translated into two WSL statements:

record year [4 of char] end;
year := "1994";

As manual translation is an informal process, it is impossible to prove that the translation is correct. Therefore, the approach used in the research is to make the translation rules as simple as possible, taking pains to capture all the effects of each COBOL statement. Once these are captured in WSL (a formal language), program transformation can be used to eliminate any redundancies which this simple approach to translation may have introduced. Accurate translation of every COBOL construct may not be practical or even desirable: getting the last few percent of the language to translate accurately may increase the amount of effort by an order of magnitude. In this research, occurrences of those constructs which are rarely used are treated as special cases, e.g., the COBOL verb ALTER.
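Two of the simple rules just mentioned can be sketched as follows. The helper names are hypothetical — in the thesis this translation step was carried out manually, not by a tool:

```python
# Sketch of two translation rules from the mapping table (illustrative
# helper names; the thesis performs this translation manually).

def translate_move(source, target):
    """Rule: the COBOL verb MOVE becomes a WSL assignment."""
    return f"{target.lower()} := {source.lower()};"

def translate_value_clause(name, pic_len, value):
    """Rule: an item initialised with VALUE becomes TWO WSL statements,
    since WSL records cannot be initialised in their declaration."""
    return [f"record {name.lower()} [{pic_len} of char] end;",
            f'{name.lower()} := "{value}";']

print(translate_move("CUSTOMER-RECORD", "BACKUP-RECORD"))
# backup-record := customer-record;
print(translate_value_clause("YEAR", 4, "1994"))
# ['record year [4 of char] end;', 'year := "1994";']
```

Keeping each rule this small is what makes it plausible to capture all the effects of a COBOL statement; any redundancy the simple rules introduce is later removed by program transformation.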

A typical COBOL program has four divisions[3,91,92]:

1. Identification Division. This division identifies a program and its origin.

2. Environment Division. This division may contain a Configuration section that defines the computer environment and an Input-Output section that defines the input/output devices.

3. Data Division. This division contains sections that define data and areas that a program references.

4. Procedure Division. This division contains all the executable statements that perform the program's logic and processing.

The Identification Division and the Environment Division are not directly translated into WSL, but the information in these two divisions might be used in translating the other two divisions. The Data Division and the Procedure Division are directly reflected in the WSL programs.

6.5.2 An Example of Translating A COBOL Program into WSL

The example program used in this section was taken from a COBOL textbook [92] and its COBOL source code is as follows:

************************************************************
*                                                          *
*   THIS PROGRAM SEQUENTIALLY ACCESSES TWO SEQUENTIAL      *
*   FILES, ONE IN INPUT MODE AND ONE IN OUTPUT MODE.       *
*                                                          *
************************************************************
IDENTIFICATION DIVISION.
PROGRAM-ID. COPY-CUSTOMER-LIST.

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT CUSTOMER-LIST ASSIGN TO XYZ
        ORGANISATION SEQUENTIAL
        ACCESS MODE IS SEQUENTIAL.
    SELECT CUSTOMER-LIST-BACKUP ASSIGN TO WXY
        ORGANISATION SEQUENTIAL
        ACCESS MODE IS SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD CUSTOMER-LIST.
01 CUSTOMER-RECORD.
   02 NAME     PIC X(20).
   02 ADDRESS  PIC X(50).
   02 PHONENUM PIC X(20).

FD CUSTOMER-LIST-BACKUP.
01 BACKUP-RECORD.
   02 B-NAME     PIC X(20).
   02 B-ADDRESS  PIC X(50).
   02 B-PHONENUM PIC X(20).

WORKING-STORAGE SECTION.
01 EOF PIC X.

PROCEDURE DIVISION.
MAIN.
    OPEN INPUT CUSTOMER-LIST
         OUTPUT CUSTOMER-LIST-BACKUP
    PERFORM, WITH TEST AFTER, UNTIL EOF = "T"
        READ CUSTOMER-LIST NEXT;
            AT END MOVE "T" TO EOF
            NOT AT END
                MOVE "F" TO EOF
                MOVE CUSTOMER-RECORD TO BACKUP-RECORD
                WRITE BACKUP-RECORD;
    END-PERFORM
*   THE STOP RUN STATEMENT CLOSES THE FILES
    STOP RUN.

Organisation: Sequential; Access Method: Sequential

  Operation   |        Open Mode
              |   I     O    I/O    EX
  ------------+------------------------
  Read        |   X          X
  Read Next   |   X          X
  Write       |         X           X
  Rewrite     |              X

( X indicates where an operation is available.)

Figure 6.5: COBOL Sequential Files

The identification division is translated into a comment statement. Information in the environment division is used when the data division and the procedure division are translated, i.e., it records that the files in the code are sequential files. In the data division, COBOL records and files are translated into WSL records and files. COBOL files sequentially organised and sequentially accessed (see Figure 6.5) are used to illustrate the method developed in this thesis. The same principle can be applied to COBOL indexed and random access files by modelling them using arrays, which are available in WSL. The following file operations are translated into WSL as external procedures (denoted by !p, the WSL function existing in the first level language to call an external procedure for which it is known definitely which variables will be changed) or external functions (denoted by !f, the WSL function existing in the first level language to call a named external function):

COBOL Constructs    WSL Constructs
OPEN                !p open_file
CLOSE               !p close_file
READ                !p read_file
READ NEXT           !p readnext
WRITE               !p write_file
REWRITE             !p rewrite
EOF?                !f eof?

In the procedure division, a PERFORM statement is translated into a while statement in WSL, an IF statement into an if statement in WSL, an OPEN statement into !p open-file and a MOVE into an assignment. Therefore, this program is translated into:

segment
  comment: "program-id: copy-customer-list";
  file customer-list with
    record customer-record with
      record name [20 of char]
      record address [50 of char]
      record phonenum [20 of char]
    end;
  end;
  file customer-list-backup with
    record backup-record with
      record b-name [20 of char]
      record b-address [50 of char]
      record b-phonenum [20 of char]
    end;
  end;
  record eof [1 of char] end;
  !p open-file (i var customer-list);
  !p open-file (o var customer-list-backup);
  while (eof ≠ "T") do
    if non_empty? (!f eof? (customer-list))
    then eof := "T"
    else eof := "F";
         !p read-file (customer-record var customer-list);
         backup-record := customer-record;
         !p write-file (backup-record var customer-list-backup)
    fi
  od;
end;

It is worth noting that in some languages, such as PASCAL, variables can be declared in two ways:

(1) type r = record
      x, y: int
    end;
    var i, j: r;

(2) var i: record
      x, y: int
    end;
    var j: record
      x, y: int
    end;

and these two declarations are equivalent. However, WSL does not support the first approach. If this case occurs, the second approach is used to translate the source code into WSL.

6.6 Program Transformation Writing

Transformations are all written in Meta-WSL (which contains WSL). A new transformation is added into the Transformation Library by a function called "ADD-TRANS" (also in Meta-WSL). The "ADD-TRANS" function takes eleven parameters. For example, if a transformation is needed to swap positions between two records, i.e., from

record name [20 of char] end;
record address [50 of char] end;

to

record address [50 of char] end;
record name [20 of char] end;

the transformation is as follows:

(Add_trans 'Record 'Any 'Swap-with-next-record 'Global 'Always '(Rewrite)
  "To swap a record with next record."
  '(" ")
  Nil
  '((Cond ((And ([_S_Type?_] Record) ([_>>?_]))
           (@>>)
           (Cond (([_S_Type?_] Record) (@Pass))
                 ((Else) (@Fail))))
          ((Else) (@Fail))))
  '((@Del) (@Undel_after)))

The eleven parameters include:

1. the general type on which the transformation operates (in this case, "Record");

2. the specific type on which the transformation operates (in this case, "Record");

3. the name of the transformation (in this case, "Swap-with-next-record");

4. the scope of the transformation, either "Global" or "Local" (in this case, "Global");

5. an indicator which determines when the transformation would appear on a menu (in this case, "Always");

6. the list of menus on which the transformation appears (in this case, "Rewrite");

7. the documentation for the transformation;

8. the prompt for any information which may need to be entered by the user (in this case, (" "));

9. the type of any information which may need to be entered by the user (in this case, "Nil");

10. the code in "Meta-WSL" for testing the transformation's applicability; and

11. the code in "Meta-WSL" to perform the changes required to effect the transformation.

The last two parameters are the most important because they require the writing of some "Meta-WSL" code.

The 10th parameter,

'((Cond ((And ([_S_Type?_] Record) ([_>>?_]))
         (@>>)
         (Cond (([_S_Type?_] Record) (@Pass))
               ((Else) (@Fail))))
        ((Else) (@Fail)))),

means: if the currently selected item is a "Record" and there is another item after it (i.e., the current item is not the last item of a sequence of items), follow-on tests are carried out; otherwise the transformation is not applicable to the item. In the follow-on testing steps, the next item is selected by moving forward one step (i.e., (@>>)). If the newly selected item is also a "Record", the applicability of the transformation is confirmed by the "(@Pass)" statement. If the newly selected item is not a "Record", the applicability of the transformation is denied by the "(@Fail)" statement.
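The intended effect of this applicability test (together with the delete/undelete edit performed by the 11th parameter) can be sketched in Python on a list of sibling nodes. This is an illustrative model, not Meta-WSL, and the helper names are hypothetical:

```python
# Sketch of the "Swap-with-next-record" test and edit. The list models
# a sequence of sibling nodes; `pos` is the currently selected item.

def node_type(node):
    return node[0]

def applicable(siblings, pos):
    # current item must be a Record ...
    if node_type(siblings[pos]) != "Record":
        return False                      # (@Fail)
    # ... there must be a next item ...
    if pos + 1 >= len(siblings):
        return False                      # (@Fail)
    # ... and the next item must also be a Record.
    return node_type(siblings[pos + 1]) == "Record"   # (@Pass)/(@Fail)

def swap_with_next(siblings, pos):
    # (@Del): remove the current item; the next item takes its place.
    node = siblings.pop(pos)
    # (@Undel_after): re-insert the deleted item after the new current item.
    siblings.insert(pos + 1, node)

seq = [["Record", "name"], ["Record", "address"], ["Skip"]]
assert applicable(seq, 0)
swap_with_next(seq, 0)
print(seq)  # [['Record', 'address'], ['Record', 'name'], ['Skip']]
```

Note how the swap is expressed purely as a delete followed by an undelete-after, mirroring the two Meta-WSL operations.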

The 11th parameter,

'((@Del) (@Undel_after)),

means: delete the current item (so that the next item becomes the current item) and undelete it after the current item (i.e., the deleted item becomes the next item), so that the two records are swapped.

Chapter 7

Program Transformations and Data Abstraction

In this chapter, the use of program transformations and data abstraction tech• niques, and the definition of new program transformations are explained.

7.1 Introduction

This chapter starts to address the addition of program transformations for data abstraction into the prototype system by discussing a number of questions that may concern reverse engineers. These questions include:

• Which process should be used for crossing levels of data abstraction, top-down or bottom-up?

• What role does human knowledge play?

• What types of abstraction are needed for data abstraction?

• When should back-tracking be used?

• How can program transformations be defined formally?

• How can program transformations be proved?


Answers to the above questions would certainly be contributions to the research field, but the application of the answers to handling data-intensive programs is more important. Perhaps the main novel contribution of this research is in abstracting from both the low level code and data constructs; exploring what information should be thrown away in moving from code and data to higher level abstractions, represented especially in Entity-Relationship Attribute Diagrams; and handling commonly encountered problems, such as foreign keys and aliasing. These are implemented by program transformations for manipulating data-intensive programs.

There are classes of low level to high level data abstractions: from a record to an entity; from a segment of unstructured code to a user-defined data type; from code and data to high level data; from aliased memory to separate variables; etc. Each category may consist of a number of transformations. Transformations developed in this research are introduced under these categories.

Transformations from different categories may be used to solve one problem in one data-intensive program. For example, there may be a foreign key in one segment of code. The solution is to combine the analysis of code and data. It may need a number of transformations from the class "from a record to an entity", a number of transformations from the class "from code and data to high level data", etc.

The organisation of the program transformations developed in this thesis, and the addition of newly defined transformations to the prototype, will be introduced in the next chapter.

7.2 Influence of Forward Engineering on Reverse Engineering

7.2.1 Crossing Levels of Data Abstraction

As well as their use in forward engineering, abstraction techniques are of importance in reverse engineering. In reverse engineering, abstraction is the process of identifying the important qualities or properties of the phenomenon being modelled. Abstractions omit irrelevant details, describing only those details that are relevant to the problem at hand, e.g., understanding the design.

Usually, program code and its design are not at the same level of abstraction — the design is more abstract than the code. It is necessary to reduce the amount of complexity that must be considered at any one time, so that a certain number of abstraction levels may exist during the specification extraction process.

Each "layer" can be considered as a program in a language provided by a virtual computer, implemented by the layer below. At the lowest layer, we have a real machine.

The process of acquiring a program design or specification from program code can cross levels of abstraction in three different ways. The first way is to refine a high-level hypothesis of the program design which the program code might embody down to the code (see case (A) in Figure 7.1), where P_code stands for the program at the code level, PD_conceptual for the program design at the conceptual level, and P1, P2, ..., Pn stand for intermediate forms between P_code and PD_conceptual. The advantage of this is that the technique of refining specifications is relatively well developed, but the disadvantage is that it is very difficult to guide the refinement towards the given program code. When program code is obtained from the program design, it is hard to prove that the obtained program is equivalent to the originally given program code; in general this is an undecidable problem.

The second way is to move from the program code towards the specification (see case (B) in Figure 7.1). We do not need to prove the equivalence of the obtained program design and the suspected program design, but attention must be paid to strategic direction because the obtained design may not be the best one. In general, there are an indefinite number of designs which a given piece of code satisfies. Also, because the reverse steps are usually difficult, necessary guidance must be provided by the user to keep the process in the correct direction.

The third way is to move from both ends — to abstract the program and to refine the design — to meet in the middle. This also has the same problems as the first and the second method (refer to case (C) in Figure 7.1).

However, it seems clear that satisfactory abstraction cannot be obtained without a user who is an expert both in software engineering and in the application domain. For example, we may wish to reverse engineer a compiler from its source code. Writing down the program design of the compiler, and working top down, has all the difficulties of verifying an existing program. In contrast, an expert compiler writer, starting from the source code, will look for the lexical and syntax analysis, access to the symbol table, etc., and use domain knowledge informally but effectively in guiding the process. Therefore, instead of using the third method (viz. both ends towards the middle) it may be appropriate to use the second method.

P_code ← P1 ← P2 ← ... ← Pn ← PD_conceptual

(A) Refinement from a Program Design Towards a Program Code

P_code → P1 → P2 → ... → Pn → PD_conceptual

(B) Stepwise Abstraction of a Program Towards a Program Design

P_code → P1 → ... → Pi ← ... ← Pn ← PD_conceptual

(C) Meeting of Program Abstraction and Program Design Refinement

Figure 7.1: Three Ways of Crossing Levels of Abstraction

The "bottom-up" process is used in this thesis. The aim of the process is the acquisition of the design of the source code. The design derived may not be equivalent to the program design which was originally used even if it existed.

Furthermore, the original program design no longer exists after heavy modification. Our research pays attention to these points. In any case, the derived design will be helpful both to software maintenance (maintenance carried out at the conceptual level) and to software reuse (components of the software at the conceptual level can potentially be used in another context).

7.2.2 Role of Human Knowledge

To obtain a model of acquiring a program design or specification from program code, program understanding techniques, cognitive models and personal experience are decisive factors.

Approaches to program comprehension are summarised by [136], in which three program comprehension problems are discussed.

• Theories of program comprehension:

  1. examining the entire program and working out the interactions between various modules,

  2. understanding the program by syntactic and semantic knowledge,

  3. setting a hypothesis of a mapping between the problem domain and the programming domain,

  4. using both top-down and bottom-up strategies at the same time.

• Code reading: The crudest method of understanding a program is code reading. Factors affecting code reading are:

  1. the design method employed in the implementation of the program,

  2. the style of writing the program, for example, using meaningful variable names, indentation, comments, etc.

• Program analysis: Static and dynamic analysis — to obtain useful information, such as cross-reference listings, call graphs, slicing, symbolic execution, etc.

Soloway and Ehrlich claim [145] that expert programmers have and use two types of programming knowledge: programming plans, which are generic program fragments that represent stereotypic action sequences in programming, and rules of programming discourse, which capture the conventions in programming and govern the composition of the plans into programs.

The personal experience (of more than ten years) of the author in programming supports the above arguments, i.e., that programs are composed from programming plans that have been modified to fit the needs of the specific problem and that the composition of those plans is governed by rules of programming discourse. Programming knowledge also plays a powerful role in program comprehension [77]. Usually, advanced programmers have strong expectations of what programs should look like, and programming knowledge is the basis of a programmer's expectations.

Human knowledge plays an important role throughout the whole process of acquiring a data design from code. This fact is crucial both to the researcher (tool builder) and to the maintainer (tool user). To the reverse engineer, human knowledge assists in the following aspects:

1. Modularisation of source code. The first step in dealing with real software is to modularise the software into manageable-sized modules which ought to be functionally independent. This is done by program reading, i.e., the maintainer reads the source code and divides it into smaller modules according to the information found in the source code, e.g., the division identifiers in COBOL.

2. Searching for and naming abstract data types. An abstract data type is an important concept in data abstraction. An important method of crossing levels of data abstraction in this thesis is to gather information in the source code and to form an abstract data type. It is the maintainer who guides the Maintainer's Assistant in searching for an abstract data type and names the obtained abstract data type. The name of an abstract data type ought to be given according to the information gathered from the code. Also, the name of an abstract data type affects further abstraction from the abstract data type. Though tools can help in this case, e.g., the work of Sneed [144], the role of the human is decisive.

3. Searching for and naming entity relationships. In extracting relationships of entities from code, it is again the maintainer who directs the search for relationships between entities. This includes questions of where to look for, and how to name, relationships.

4. Selecting other transformations. Apart from aiding the selection of transformations for abstract data types and entity relationships, the maintainer's knowledge assists in selecting all other transformations. Selecting transformations relies on a combination of the following:

   • any potentially useful information visible in the code, e.g., meaningful variable names, comments, indentation, procedure and function names, etc.;

   • program syntax components, e.g., the controlled variable of a loop, an assignment statement, etc.;

   • the user's hypothesis, made according to software engineering knowledge and domain knowledge. The user's hypothesis can be continuously updated as the process of applying transformations goes on;

   • help information from the Maintainer's Assistant. There is a built-in manual facility for all the transformations in the library. Also, help information will be prompted by the program transformer when necessary.

Examples of the above four points will be illustrated later in this thesis.

To the reverse engineering researcher, the problem of how to accommodate the use of human knowledge in the tool (the Maintainer's Assistant in this research) has to be solved. In fact, the use of a program transformer covers the aspect of static program analysis. Other aspects, such as presenting useful information (e.g., comments in a program) and providing the user with a facility for naming entity relationships, will be discussed in Section 8.1, "Design of the Prototype".

7.3 Acquiring Data Designs from Program Code Using Program Transformation

7.3.1 Different Types of Transformation and Abstraction Levels

To understand the process of acquiring a data design from code, it is necessary to define clearly three types of program transformation and their relations with program abstraction levels (Figure 7.2).

To simplify the illustration, it is assumed that only one representation of the program design or program exists at each abstraction level, i.e., Program I may have more than one semantically equivalent form but only one form is presented in the diagram.

There are three types of transformation: equivalence transformation, refinement transformation and abstraction transformation. When an equivalence transformation is applied to a program (e.g., Program I) the program derived (Program II) has the same semantics as the original. An equivalence transformation is represented by ⇔. Usually, at the code level, programs are represented in a concrete programming language with defined syntax and semantics so that these programs can be analysed by the program transformer. Therefore, in the Maintainer's Assistant, the program transformations in the transformation library are all equivalence transformations, and are applicable only to programs at the code level. At the conceptual level, "programs" (in fact, program specifications or program designs) are represented in a conceptual form, such as Entity-Relationship Attribute Diagrams. At this level, equivalence transformations may also exist and can be used to transform a program design from one form to another equivalent form.

    Conceptual Level:  Program Design I  <==>  Program Design II
                            |                        ^
                       Refinement               Abstraction
                       Transformation           Transformation
                            v                        |
    Code Level:        Program I        <==>    Program II

Figure 7.2: Three Types of Program Transformation

Since the aim of the research is to acquire a data design from program code, the representation of a "program" at the conceptual level with which we are concerned is restricted to program designs. A refinement transformation is used to transform a program design (Program Design I, in this case) into a program (Program I, in this case).

This type of transformation is unidirectional — from an object at the conceptual level to an object at the code level. It means that after a refinement transformation is applied, the program derived is the semantic refinement of the program design, but not necessarily vice versa.

An abstraction transformation is used to transform a program (Program II, in this case) back to a program design (Program Design II, in this case). This type of transformation can also only be applied to a program. In other words, though Program Design II is obtained by applying an abstraction transformation to Program II, there does not have to be a transformation available to transform Program Design II into Program II.

Although all three types of transformation are needed in acquiring program designs (the need for refinement transformations is discussed in the next section, "Back Tracking"), most attention has been paid to abstraction transformations and equivalence transformations. Refinement transformations have been studied by Ward [155] and are therefore not a focus of this thesis; for the same reason, the equivalence transformations discussed in this thesis mainly concern data abstraction and are supplementary to Ward's work.

7.3.2 "Back Tracking"

The diagram in Figure 7.2 also supports an argument proposed earlier in this thesis: that the program design extracted from a source program can be different from the original program design of the source program. This is because extra information was added by the implementor when the source code was first implemented according to the original program design, and information can be lost when the program design is abstracted out from the source program. The use of a program transformation approach best preserves information when the source is manipulated, since equivalence transformations will not change the semantics of the source program. For example, Program I and Program II have the same semantics, so this approach keeps the information loss to a minimum.

It should be noticed that the scenario described in Figure 7.2 is an ideal case. In practice, there may be several choices at each stage of the transformation process. Some of these choices may not lead in the right direction and the maintainer may need to retreat to some previous stage and start again. Here the "back tracking" technique has to be employed (Figure 7.3).

Figure 7.3 shows that it takes a number of steps to transform Program I into Program Design I. In this figure, circle 1 represents Program I and circle 13 represents Program Design I; other circles represent the intermediate programs; and lines with arrows represent the application of transformations. For some intermediate programs, there is more than one transformation available, e.g., the program represented by circle 2. Assuming the route 2-3-4-5 is selected, the user will soon find out that no suitable transformation is available at 5 to reach 13. The user has to undo a few transformations to go back to 2. The whole diagram indicates all the steps possibly taken by a maintainer, i.e., 1-2-3-4-5-4-3-2-6-7-8-7-9-10-11-10-12-13. Though the best route is 1-2-6-7-9-10-12-13, such a route can seldom be found immediately. Program understanding approaches can aid this process.

[Figure 7.3 shows a graph of numbered circles leading from Program I (circle 1) to Program Design I (circle 13); arrows between circles represent the application of transformations, including the detours that are later undone.]

Figure 7.3: "Back Tracking"
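The undo-and-retry navigation just described can be sketched as a simple history stack. The interface below is hypothetical; the Maintainer's Assistant keeps a comparable transformation history internally:

```python
# Minimal sketch of back-tracking during interactive transformation:
# each applied transformation pushes the previous program state, and
# "undo" pops states back off. (Illustrative only.)

class TransformSession:
    def __init__(self, program):
        self.program = program
        self.history = []            # previous states, for back-tracking

    def apply(self, transformation):
        self.history.append(self.program)
        self.program = transformation(self.program)

    def undo(self, steps=1):
        for _ in range(steps):
            self.program = self.history.pop()

session = TransformSession("P1")
session.apply(lambda p: p + "->P2")
session.apply(lambda p: p + "->P3")   # this route turns out to be a dead end
session.undo(2)                       # back-track to the starting point
print(session.program)  # P1
```

Because every transformation is semantics-preserving, undoing simply restores an earlier but equivalent form of the program, so no information is lost by exploring a dead end.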

7.3.3 Formal Definition of Three Types of Transformation

Three types of program transformation are defined in this subsection. (The "refinement" part of the subsection is based on Ward [155]; the "abstraction" part was extended by the author.)

Meanings of Program

A program S is a piece of formal text, i.e., a sequence of formal symbols. There are two ways in which the meaning of these texts can be given [155]:

1. Given a structure M for the logical language L from which the programs are constructed (a structure for a logical language L consists of a set of values, plus a mapping between the constant symbols, function symbols and relation symbols of L and elements, functions and relations on the set of values), and an initial state space (from which a suitable final state space can be constructed), the program can be interpreted as a function f (a state transformation) which maps each initial state s to the set of possible final states for s. Therefore, a program can be interpreted as a function from structures to state transformations;

2. Given any formula R (which represents a condition on the final state), a formula WP(S, R) can be constructed. This formula is usually called the weakest precondition of S on R: it is the weakest condition on the initial state such that the program S is guaranteed to terminate in a state satisfying R.

Because of these two ways of interpreting programs, two corresponding notions of refinement arise: semantic refinement and proof-theoretic refinement.

Semantic Refinement

A state is a collection of variables (the state space) with values assigned to them; thus a state is a function which maps from a (finite, non-empty) set V of variables to a set H of values. There is a special extra state ⊥ which is used to represent nontermination or error conditions. A state transformation f maps each initial state s in one state space to the set of possible final states f(s), which may be in a different state space. For convenience, if ⊥ is in f(s) then so is every other state, and f(⊥) is required to be the set of all states (including ⊥). However, the set of final states is not allowed to be empty.

Semantic refinement is defined in terms of these state transformations. A state transformation f is a refinement of a state transformation g if they have the same initial and final state spaces and f(s) ⊆ g(s) for every initial state s. Note that if ⊥ ∈ g(s) for some s, then f(s) can be anything at all, i.e., an "undefined" program can be correctly refined to do anything. If f is a refinement of g (i.e., g is refined by f), this is denoted by g ≤ f. If the interpretation of statement S1 under the structure M is refined by the interpretation of statement S2 under the same structure, then this is written S1 ≤_M S2. If this holds for every model (a model for a set of sentences — formulae with no free variables — is a structure for the language such that each of the sentences is interpreted as true) M of a set Δ of sentences of L, then this is written Δ ⊨ S1 ≤ S2.
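The subset condition defining semantic refinement can be checked directly on a finite toy model. The sketch below is illustrative only: states are simplified to plain integers, ⊥ to a marker string, and the state spaces are finite:

```python
# Toy model of semantic refinement: a state transformation maps each
# initial state to the SET of possible final states, and f refines g
# iff f(s) is a subset of g(s) for every initial state s.

BOTTOM = "bottom"                        # stands for the special state
INITIAL_STATES = [0, 1, 2]
ALL_FINAL = set(range(10)) | {BOTTOM}    # the whole final state space

def refines(f, g):
    """f is a refinement of g: f(s) <= g(s) (set inclusion) for all s."""
    return all(f(s) <= g(s) for s in INITIAL_STATES)

# g is nondeterministic (it may leave s alone or increment it);
# f resolves that nondeterminism one way.
g = lambda s: {s, s + 1}
f = lambda s: {s + 1}
print(refines(f, g), refines(g, f))   # True False

# If bottom is a possible outcome of g, g(s) is the whole state space,
# so ANY f refines it: an "undefined" program can be refined to anything.
g_undefined = lambda s: ALL_FINAL
print(refines(f, g_undefined))        # True
```

The check mirrors the definition exactly: shrinking the set of possible outcomes (removing nondeterminism or undefinedness) is what refinement means.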

Proof-Theoretic Refinement

Given two statements S1 and S2, and a formula R, two formulae WP(S1, R) and WP(S2, R) can be constructed. If there exists a proof of the formula WP(S1, R) ⇒ WP(S2, R) using the set Δ as assumptions, then this is represented as Δ ⊢ WP(S1, R) ⇒ WP(S2, R). For S2 to be a refinement of S1, this result has to hold for every postcondition R. To avoid the need for quantification over formulae, and to remain in first order logic, the language L can be extended by adding a new relation symbol G(w) where w is a list of all the free variables in S1 and S2. If it can be proved that Δ ⊢ WP(S1, G(w)) ⇒ WP(S2, G(w)) in the extended language L', then the proof makes no assumption about G(w) and therefore remains valid when G(w) is replaced by any other formula. In this case this is written: Δ ⊢ S1 ≤ S2.

A fundamental result, proved in [155], is that these two notions of refinement are equivalent:

Δ ⊨ S1 ≤ S2  ⇔  Δ ⊢ S1 ≤ S2

Semantic Abstraction

If F and G are state transformations on state spaces with variable sets V and V′ respectively, where V ⊆ V′ and F and G have the same values on the variables in V, then F(s) is said to be more abstract than G(s) (or G(s) is more concrete than F(s)), and this is written G(s) ⊑ F(s), i.e., F is an abstraction of G (F abstracts G, or G is abstracted by F). If the interpretation of statement S1 under the structure M is abstracted by the interpretation of statement S2 under the same structure, then this is written S1 ⊑_M S2.

A model for a set of sentences (formulae with no free variables) is a structure for the language such that each of the sentences is interpreted as true.

Chapter 7. Program Transformations and Data Abstraction 162

Refinement, Equivalence and Abstraction Transformations

If S1 is refined by S2, there is a program transformation from S1 to S2. This is written:

S1 ≤ S2

and the program transformation is defined as a refinement transformation. If S1 refines S2 and also S2 refines S1, there is a program transformation from S1 to S2 and vice versa. This is written:

S1 ≈ S2

and the program transformation is defined as an equivalence transformation. If S1 is abstracted by S2, there is a program transformation from S1 to S2. This is written:

S1 ⊑ S2

and the program transformation is defined as an abstraction transformation. This means that any specification which S2 satisfies is guaranteed to be satisfied by S1.

7.4 Issues on Inventing and Proving Program Transformations

In this thesis, program transformations have been invented, based on the research results of data abstraction, to best serve the need of acquiring data designs from programs. It is crucial to ensure that all program transformations and abstractions preserve the correctness of program semantics. Since proving program refinements and transformations is not the subject of this thesis (please refer to [155] for more details), one example is given here to demonstrate how a transformation is proved. The example chosen addresses two programs:

PUSH (x, S); y := POP (S);

and

y := x;

Suppose that one program transformation called "Merge-Push-Pop" is valid to transform the first program into the second one. Two methods are presented to prove the validity.

Method 1: To use the weakest preconditions. This is to compare the weakest preconditions of the two programs. If their weakest preconditions are the same, the two programs are equivalent.

WP(PUSH (x, S); y := POP (S), R)
⇔ WP(S := ⟨x⟩ ++ S; y := HD(S); S := TL(S), R)
⇔ R[TL(S)/S][HD(S)/y][⟨x⟩ ++ S/S]
⇔ R[TL(⟨x⟩ ++ S)/S][HD(⟨x⟩ ++ S)/y]
⇔ R[S/S][x/y]
⇔ R[x/y]
⇔ WP(y := x, R)

Method 2: To use the existing WSL constructs and their properties.

PUSH (x, S); y := POP (S)
⇔ S[2 ...] := S[1 ...]; S[1] := x; y := S[1]; S[1 ...] := S[2 ...];
⇔ S[2 ...] := S[1 ...]; y := x; S[1 ...] := S[2 ...];
⇔ y := x; S[2 ...] := S[1 ...]; S[1 ...] := S[2 ...];
⇔ y := x; S[1 ...] := S[1 ...];
⇔ y := x;

In essence, the second method is the same as the first one, because the existing WSL constructs were derived at earlier stages via definitional transformations which were proven by using weakest preconditions.
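The effect of the Merge-Push-Pop transformation can also be illustrated operationally. The sketch below (illustrative Python, not part of the WSL tool; all names are hypothetical) models the stack as a list and checks that the two programs compute the same final state on sample initial states:

```python
def merged_push_pop(state):
    # Program 1: PUSH(x, S); y := POP(S)
    s = dict(state)
    s["S"] = [s["x"]] + s["S"]   # PUSH: S := <x> ++ S
    s["y"] = s["S"][0]           # POP:  y := HD(S)
    s["S"] = s["S"][1:]          #       S := TL(S)
    return s

def direct_assign(state):
    # Program 2: y := x
    s = dict(state)
    s["y"] = s["x"]
    return s

# Both programs agree on every initial state tried,
# which is what the equivalence transformation asserts.
for init in ({"x": 1, "y": 0, "S": []},
             {"x": 7, "y": 9, "S": [3, 4]}):
    assert merged_push_pop(init) == direct_assign(init)
```

A test like this cannot replace the weakest-precondition proof, but it is a cheap sanity check before a transformation is added to the library.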

7.5 Program Transformations of Data Abstraction

Transformations developed in this thesis for data abstraction are divided into seven categories. Each category will be discussed in one subsection in this section. Because it will not be appropriate to explain every transformation, one or a few examples are usually given in each category. The approach proposed in the research is shown in Figure 7.4. Lines with arrows in the figure represent where transformations are applied. Data designs represented in Entity-Relationship Attribute Diagrams are mainly derived by combining entities, whose source is data, and relations, whose source is code. It can be seen easily that program transformations fall into the following categories: (1) data not in the form of records are transformed into records; (2) records are abstracted to entities or even entities plus relationships; (3) in the code part, statements representing operations on data are abstracted to relations between entities (a typical example is that operations on files are transformed to operations on basic data types first, and then from operations on basic data types to relations between data objects); (4) control statements and statements representing operations on data together are abstracted to relations or to user-defined abstract data types; (5) user-defined abstract data types are abstracted to both entities and relations; and (6) entities and their relations are abstracted to form Entity-Relationship Attribute Diagrams.

The seventh category consists of a number of supporting transformations for data abstraction.

7.5.1 Transformations for Deriving Records

Transformations in this category are used to transform data which are not in the form of records, e.g., data in the form of files and variables, into records. For the sake of convenience in the research, the only form of data at the code level which will be abstracted to data at the conceptual level (i.e., entities) is the record. Section 7.5.3 will address transforming the operations on files to the operations on records.

[Figure 7.4 maps code-level items (file, variable, control statement, user-defined ADT, operations such as file operations and ADT operations) to conceptual-level items (record, entity, relation and the Entity-Relationship Attribute Diagram).]

Figure 7.4: Approach of Deriving Entity-Relationship Attribute Diagrams

As discussed earlier in the thesis, simple variables are unable to represent records, but fortunately records can represent simple variables properly, because a simple variable can be modelled by a record without losing any information. When a simple variable is transformed into a record, the type information of the variable is also recorded in the WSL record. For example, the following is a piece of code in WSL, where local variables a and b are declared and used.

var (a := 0, b := ""):
a := 100;
b := "ok";
end;

This program can be transformed into the program shown below. The local variable environment (denoted by var) is changed to a local record environment (denoted by segment). The Program Transformer checks the whole segment of the code, decides the type for each record (in this case, the type for a is int and for b char) and gives the length of each record depending on the longest constant assigned to that record (in this case, 3 for a and 2 for b). Note that the display forms for a variable and a record variable are the same though their internal formats are different.

segment
record a [3 of int];
record b [2 of char];
a := 100;
b := "ok";
end;
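The type-and-length decision made here can be sketched as a small function (illustrative Python, not the actual LISP implementation; `infer_record` is a hypothetical name): the type is int when every constant assigned is an integer, and the length is that of the longest constant assigned.

```python
def infer_record(assignments):
    """Given the constants assigned to one variable, pick a WSL
    record type (int or char) and a length: the length of the
    longest constant assigned (digits for ints, characters for
    strings)."""
    if all(isinstance(v, int) for v in assignments):
        return ("int", max(len(str(v)) for v in assignments))
    return ("char", max(len(str(v)) for v in assignments))

# a := 0 and a := 100 give [3 of int];
# b := "" and b := "ok" give [2 of char],
# matching the segment shown above.
assert infer_record([0, 100]) == ("int", 3)
assert infer_record(["", "ok"]) == ("char", 2)
```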

7.5.2 From Records to Data at the Conceptual Level

From Record to Entity

In forward engineering, a "01 level" COBOL record is usually used to implement an entity in an Entity-Relationship Attribute Diagram. Therefore, in reverse engineering, a record without any field can be abstracted to an entity without any attribute, and a record with fields can be abstracted to an entity with attributes. For example, a record and an entity abstracted from the record are illustrated below. When the record is transformed into the entity, information such as the length and type of each field is thrown away, because this information was not usually in the original data design and was added in by the implementor.

record E1 with
record A1 [n of char]
record A2 [n of char]
record A3 [n of char]
record A4 [n of char]
end;

⊑

entity E1 with
attr A1
attr A2
attr A3
attr A4
end;

Acquiring a Relationship from a Record with Subrecords

When a subrecord of a record can be abstracted into an entity and the record can be abstracted into an entity as well, there exists a relationship between the entity derived from the record and the entity derived from the subrecord. For example, in the given record author,

record author with
record name [40 of char]
record address [50 of char]
record book with
record title [50 of char]
record ISBN [20 of char]
end
end;

the subrecord book can be abstracted into an entity book while the record author can be abstracted into an entity author, according to the knowledge in forward engineering. At the same time, a relationship "write" is put in by the user from the logical connection between the record and the subrecord, i.e.:

paragraph
entity author with
attr name
attr address
end;
entity book with
attr title
attr ISBN
end;
relationship entity author has one write relation with many entity book;
end;

Figure 7.5: Deriving a Relationship from a Record with Subrecord

Note that information on the implementation details is also thrown away in the process of abstraction. An Entity-Relationship Attribute Diagram for this example is shown in Figure 7.5.
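The record-to-entity abstraction of this subsection can be sketched as a recursive function (illustrative Python, not the WSL transformation itself; the names are hypothetical). Nested dictionaries stand for subrecords; field lengths and types are discarded, and the relationship name, "write" here, is supplied by the user:

```python
def abstract_record(name, fields, relationship_name):
    """Turn a record into an entity; any subrecord (a nested dict)
    becomes its own entity plus a one-to-many relationship whose
    name is supplied by the user, as in the author/book example.
    Field lengths and types are deliberately discarded."""
    entities, relationships, attrs = {}, [], []
    for fname, fval in fields.items():
        if isinstance(fval, dict):  # a subrecord
            sub_ents, sub_rels = abstract_record(fname, fval, relationship_name)
            entities.update(sub_ents)
            relationships += sub_rels
            relationships.append((name, relationship_name, fname))
        else:                        # a plain field becomes an attribute
            attrs.append(fname)
    entities[name] = attrs
    return entities, relationships

record = {"name": "40 of char", "address": "50 of char",
          "book": {"title": "50 of char", "ISBN": "20 of char"}}
ents, rels = abstract_record("author", record, "write")
assert rels == [("author", "write", "book")]
```

Running this on the author record yields the two entities and the "write" relationship of the example above.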

7.5.3 From Code Level Data Operations to Data Relations

Transformations for Modelling Sequential Files

File input/output is a central problem in most data-intensive programs. File operations in a programming language usually involve access to external storage. In COBOL, a serial file is a sequence of records, i.e., a record is the unit with which a physical file can be accessed. Though COBOL file operations can be translated into WSL as external procedures and external functions (Section 6.5.2), more suitable forms of data representation are required to replace these external procedures and functions in order to examine file operations at a high abstraction level.

A queue data type is proposed to model COBOL sequential files and operations on these files, in order that files (external storage objects) can be transformed into queues (internal mathematical objects).

Assuming a file has n records, R1, R2, ..., Ri, Ri+1, Ri+2, ..., Rn-1, Rn, sequential file operations can be modelled by operations on two queues (Head-Queue and Tail-Queue) of records, and one record variable Vr:

Open-file = Head-Queue := {};
Tail-Queue := R1 ++ R2 ++ ... ++ Rn. (++ is concatenation)

Read-file = Vr := head(Tail-Queue); Tail-Queue := tail(Tail-Queue); Head-Queue := Head-Queue ++ Vr.

Write-file = Head-Queue := Head-Queue ++ Vr; Tail-Queue := tail(Tail-Queue).

Close-file = Tail-Queue := Head-Queue ++ Tail-Queue; Head-Queue := {}.

eof? = empty?(Tail-Queue).
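The two-queue model can be sketched directly (illustrative Python; the class and method names are hypothetical). Head-Queue holds the records already accessed and Tail-Queue the remainder, so eof? is simply emptiness of the tail; only the read path is shown:

```python
class SeqFile:
    """Two-queue model of a COBOL sequential file: `head` is the
    Head-Queue of records already accessed, `tail` the Tail-Queue
    of records still to come (a sketch of the definitions above)."""
    def __init__(self, records):
        self.head, self.tail = [], list(records)
    def open(self):
        # Tail-Queue := Head-Queue ++ Tail-Queue; Head-Queue := {}
        self.tail = self.head + self.tail
        self.head = []
    def read(self):
        vr = self.tail.pop(0)        # Vr := head(Tail-Queue)
        self.head.append(vr)         # Head-Queue := Head-Queue ++ Vr
        return vr
    def eof(self):
        return not self.tail         # eof? = empty?(Tail-Queue)
    def close(self):
        self.tail = self.head + self.tail
        self.head = []

f = SeqFile(["R1", "R2", "R3"])
f.open()
seen = []
while not f.eof():
    seen.append(f.read())
f.close()
```

After the loop, `seen` holds the records in file order and closing restores the full Tail-Queue, matching rows (A), (D) and (E) of Figure 7.6.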

Accordingly, five transformations with the above meaning can be invented:

• Model-OpenFile-by-QueueOP

definition:

Status of a file, with the contents of Head-Queue and Tail-Queue:

(A) Just opened: Head-Queue empty; Tail-Queue R1 R2 ... Rn-1 Rn
(B) After the ith record being accessed: Head-Queue R1 R2 ... Ri; Tail-Queue Ri+1 ... Rn-1 Rn
(C) After the (i+1)th record being accessed: Head-Queue R1 R2 ... Ri+1; Tail-Queue Ri+2 ... Rn-1 Rn
(D) After the last record being accessed: Head-Queue R1 R2 ... Rn-1 Rn; Tail-Queue empty
(E) Closed: Head-Queue empty; Tail-Queue R1 R2 ... Rn-1 Rn

Figure 7.6: Modelling a Sequential File by Two Queues

!p open_file DATAFILE; ⇔
init_q DATAFILE-head;
DATAFILE-tail := q_concat(DATAFILE-head, DATAFILE-tail);

• Model-ReadFile-by-QueueOP

definition:

!p read_file (DATARECORD, DATAFILE); ⇔
DATARECORD := q_rem_first(DATAFILE-tail);
DATAFILE-head := q_concat(DATAFILE-head, DATARECORD);

• Model-WriteFile-by-QueueOP

definition:

!p write_file (DATARECORD, DATAFILE); ⇔
DATAFILE-head := q_concat(DATAFILE-head, DATARECORD);
DATARECORD := q_rem_first(DATAFILE-tail);

• Model-Eof?-by-QueueOP

definition:

!p eof? (DATAFILE); ⇔
empty? (DATAFILE-tail);

• Model-CloseFile-by-QueueOP

definition:

!p close_file DATAFILE; ⇔
DATAFILE-tail := q_concat(DATAFILE-head, DATAFILE-tail);
init_q DATAFILE-head;

Transformations on Basic Data Types

Transformations in this class deal with simplifying data objects in basic data types according to the properties of the data type. The data types that the transformations in this class can deal with include stack, set, sequence and queue. For example, the transformation,

push(S, x); pop(y, S) ⇔ y := x;

is based on the properties of a stack. Another two examples are:

q_append p e ⊑ relate p to e;

and

x := q_rem_first(p) ⇔ x := head(p); p := tail(p); ⊑ relate x to p; relate p to p;

Handling Aliased Records

Suppose we have a COBOL program as below:

01 X PIC X(8).
01 Y REDEFINES X PIC X(8).
01 Z PIC X(8).

MOVE Z TO X.

It will be translated into WSL:

record x [8 of char];
record y [8 of char];
redefine x with y;
record z [8 of char];
x := z;
y := read-rec(x, y);

The second statement in the above program can be simplified by transformations, i.e.:

y := read-rec(x, y);

⇔ y := z;

7.5.4 Abstraction from Code

A number of examples are given in this subsection to illustrate how code is abstracted towards the data design.

Abstracting from Assignment Statement

An assignment statement is a simple but straightforward measure to implement a relation between two data objects, which, at the data design level, may be two entities. Therefore, an assignment can usually be abstracted to a relation, e.g.,

x := y; ⊑ relate x to y;

This simply means that x relates to y. How this relation will be used to obtain Entity-Relationship Attribute Diagrams will be discussed later in this section.

Abstracting from Branching Structure

A branching statement, such as if .. then .. else .. fi, is used as a control structure in implementing programs. But the structure itself did not appear in the original data design and neither did the condition part of the branching structure. Therefore, these parts will not contribute to the Entity-Relationship Attribute Diagram. Information appearing in both arms of an if .. then .. else .. fi statement may exist in the Entity-Relationship Attribute Diagram. Therefore, an if .. then .. else .. fi statement can be abstracted to a sequence of two groups of statements (each group comes from one arm of the if .. then .. else .. fi statement). For instance,

if x > 0 then x := y else x := z fi;

is abstracted to

relate x to y;
relate x to z;

Looping Statement

Looping statements, such as while and for, are also used as a control structure in implementing programs but do not appear in the original data design. A looping statement can be treated as enumerating the same operation on every instance of entities. The condition part of the loop also does not contribute to the Entity-Relationship Attribute Diagram. So a while loop can be removed, just leaving the body of the loop. For example,

i := 0;
while i < 10 do
x[i] := y[i];
i := i + 1;
od;

is equivalent to

for i := 0 to 9 step 1
begin
x[i] := y[i];
end;

and they both can be abstracted to:

relate x to y;

The example shows that, at the code level, the elements of the array x are assigned the elements of the array y, and this can be abstracted to the fact that, at the data design level, entity x has a relation with entity y.
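The abstractions described above, assignments becoming relations, both arms of a branch kept, loop and branch conditions dropped, can be sketched as a walker over a toy statement representation (illustrative Python; the tuple encoding and names are hypothetical, not Meta-WSL):

```python
def relate_facts(stmt):
    """Abstract a statement to 'relate' facts: an assignment becomes
    a relation, both arms of an if contribute, and a while keeps only
    its body; conditions are dropped, as described in the text.
    Statements are tuples, e.g. ("assign", "x", "y")."""
    kind = stmt[0]
    if kind == "assign":                  # ("assign", lhs, rhs)
        return [(stmt[1], stmt[2])]
    if kind == "if":                      # ("if", cond, then, else)
        return relate_facts(stmt[2]) + relate_facts(stmt[3])
    if kind == "while":                   # ("while", cond, [body...])
        return [f for s in stmt[2] for f in relate_facts(s)]
    return []

prog = ("if", "x > 0", ("assign", "x", "y"), ("assign", "x", "z"))
loop = ("while", "i < 10",
        [("assign", "x[i]", "y[i]"), ("assign", "i", "i + 1")])
assert relate_facts(prog) == [("x", "y"), ("x", "z")]
```

Applying `relate_facts` to `loop` yields relate facts for the loop body only; a further step (not sketched) would strip the array indices to give "relate x to y".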

Transformations for Forming Abstract Data Type

Transformations in this class address looking for a user-defined abstract data type. An example is given below:

var iset := 0, rset := 0, x := 0, y := 0, m := 0, n := 0, int1 := 0, int2 := 0, real1 := 0, real2 := 0 :
begin

iinsert (int1); idelete (int2); rinsert (real1); rdelete (real2);

where
proc iinsert (x) == iset := iset ∪ {x};
proc idelete (y) == iset := iset − {y};
proc rinsert (m) == rset := rset ∪ {m};
proc rdelete (n) == rset := rset − {n}.

Here we assume that the variables x and y are not used in the (...) parts (this can be easily confirmed in Meta-WSL). If we start with variable x, which is used by the procedure definition iinsert, it is found that variable iset is also used by the same procedure definition. There are no other procedure definitions using x, but the procedure definition idelete uses variables iset and y. Following the same steps, a closure can at last be recognised which contains three variables {x, y, iset} and two procedures (iinsert, idelete). Hence an abstract data type is formed (and named intset). The above program can be transformed into the following:

var rset := 0, m := 0, n := 0, int1 := 0, int2 := 0, real1 := 0, real2 := 0 :
begin

user-adt-proc-call intset.iinsert (int1); user-adt-proc-call intset.idelete (int2); rinsert (real1); rdelete (real2);

where
user-adt intset (iset := 0, x := 0, y := 0) (nil)
user-adt-proc iinsert (x) == iset := iset ∪ {x} :
user-adt-proc idelete (y) == iset := iset − {y} :
proc rinsert (m) == rset := rset ∪ {m} :
proc rdelete (n) == rset := rset − {n}.

Note that in the original program any procedure call which is involved in the newly formed abstract data type is transformed into a user-adt-proc-call. The search for a closure can be started with a procedure definition (or function definition) as well as with a variable, and the result is the same. For example, if we start with the procedure definition iinsert, we will first get the variables x and iset involved; then search for procedure definitions using these two variables; and finally get the same closure as when we start with the variable x.

Program transformations for searching and forming an abstract data type only involve presentational changes to programs, so their correctness can be easily proven. It should be pointed out that the above method of identifying a user-defined abstract data type can be implemented satisfactorily in the program transformation approach because the program transformer is a powerful analyser for searching for a closure. Other methods of identifying an abstract data type, such as using a retrieve function, though they have been considered, still need further study before they can be implemented in the tool.
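The closure search described above can be sketched as a fixed-point computation (illustrative Python; `find_adt` and the data layout are hypothetical, not the program transformer's implementation). Starting from one variable or procedure, every procedure using a collected variable, and every variable used by a collected procedure, is added until nothing changes:

```python
def find_adt(procs, start):
    """Compute the closure of variables and procedures: `procs`
    maps a procedure name to the set of variables it uses; `start`
    is either a variable or a procedure name."""
    if start in procs:
        members, vars_ = {start}, set(procs[start])
    else:
        members, vars_ = set(), {start}
    changed = True
    while changed:
        changed = False
        for p, used in procs.items():
            if p not in members and used & vars_:
                members.add(p)      # procedure touches a collected variable
                vars_ |= used       # so all its variables join the closure
                changed = True
    return vars_, members

# The intset example: starting from x (or from iinsert) yields the
# same closure {x, y, iset} with procedures {iinsert, idelete}.
procs = {"iinsert": {"x", "iset"}, "idelete": {"y", "iset"},
         "rinsert": {"m", "rset"}, "rdelete": {"n", "rset"}}
assert find_adt(procs, "x") == ({"x", "y", "iset"}, {"iinsert", "idelete"})
```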

7.5.5 From User-Defined Data Types to Data Design

Transformations in this category deal with transforming data objects, such as records and abstract data types, into entities. For example, an abstract data type usually involves a data object and a number of operations on this object. The operations are implemented in terms of procedures and functions, which take parameters. At the data design level, the data object can be viewed as an entity; each parameter can be viewed as a different entity; and the operations can be viewed as relations between the data object and the other data objects that are represented by the parameters. Therefore, a user-defined abstract data type can be abstracted to an entity, while all statements accessing this abstract data type have to be changed accordingly, and "ADT->Entity" is such a transformation. By applying this transformation, the following program will be abstracted from

var
begin

user-adt-proc-call intset.iinsert (int1); user-adt-proc-call intset.idelete (int2);

where
user-adt intset (iset := 0, x := 0, y := 0) (nil)
user-adt-proc iinsert (x) == iset := iset ∪ {x};
user-adt-proc idelete (y) == iset := iset − {y};

to

paragraph
entity int1
entity int2
entity intset

relate int1 to intset; comment: insert int1 to intset;
relate int2 to intset; comment: delete int2 from intset;

end

In this process, the abstract data type intset becomes an entity, and so do the two parameters involved. Two procedure call statements become two relations. In order to abstract the program further, useful information is recorded by comment statements.

7.5.6 Deriving Data Designs from Data and Code

Transformations in this category deal with deriving entity relationships.

Handling Relations

When a relate statement relates two entities there must by definition be a relationship between these two entities. So a relationship can be derived from the statement, and the name of the relationship is provided by the user according to the information in the program. When a relate statement relates one or no entities, i.e., when one or two of the components of the relate statement are neither entities nor records which may be transformed into entities, this relate statement does not represent a relationship of any entity and can be deleted. For example, the following statement can be abstracted to a skip because "3" is a constant and was usually not part of the data design but was just used for the purpose of control in the implementation.

relate x to 3; ⊑ skip;

Acquiring Relationship from Abstract Data Type

Let us carry on with the example in Section 7.5.5. A relate statement has two components, of which one is an entity derived from an abstract data type. If the other component is a variable or a record without any subrecord (these two are equivalent and there are transformations available in the prototype for transforming one into the other), this relate statement can be transformed into a relationship. The name of the relationship has to be provided by the user according to the information existing in the code. The program can be transformed into the following WSL form and its Entity-Relationship Attribute Diagram is shown in Figure 7.7.

paragraph
entity intset;
entity int1;
entity int2;
relationship entity int1 has one is-member-of relation with one entity intset;
relationship entity int2 has one is-not-member-of relation with one entity intset;
end

Figure 7.7: Deriving a Relationship from an Abstract Data Type

Acquiring a Relationship from a Foreign Key

One example is given here of acquiring a relationship from a foreign key. A relationship can exist between two entities that both have the same attribute (known as a foreign key). Suppose two entities have been derived already from source code (e.g., record definitions) and two relations between two pairs of entity attributes (e.g., assignment statements):

paragraph
entity employee
attr nhs-number
attr name
attr department
attr vehicle-num-plate

Figure 7.8: Deriving a Relationship from a Foreign Key

end;
entity car
attr reg-number
attr manufacturer
attr model
attr driver
end;
relate employee.vehicle-num-plate to car.reg-number; comment: one employee.vehicle-num-plate relates to many car.reg-number;
relate car.driver to employee.nhs-number; comment: one car.driver relates to many employee.nhs-number;
end;

Program transformations analyse code as well as data. For example, before these entities and relations were obtained, employee was a record variable and employee.vehicle-num-plate a field variable. The first relate statement records the fact that the variable employee.vehicle-num-plate ("one" variable) was assigned to by an expression typed car.reg-number more than once ("many" times); and this statement is used by a user in asserting a relationship later. By this stage, as shown above, it can be identified that the first entity has a foreign key, employee.vehicle-num-plate, and the second entity has a foreign key, car.driver. If we start with the entity employee, a one-to-many relationship is derived, i.e., that a person can drive more than one car. When we start with the entity car, a one-to-many relationship is also asserted by the user, i.e., that one car can be driven by more than one person. The WSL presentation and the Entity-Relationship Attribute Diagram are shown below and in Figure 7.8 respectively.

paragraph
entity employee
attr nhs-number
attr name
attr department
attr vehicle-num-plate
end;
entity car
attr reg-number
attr manufacturer
attr model
attr driver
end;
relationship entity employee has one drive relation with many entity car;
relationship entity car has one driven-by relation with many entity employee;
end;
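The foreign-key derivation can be sketched as follows (illustrative Python; the function and data names are hypothetical). Given entities with attribute lists and relate facts between qualified attributes, each fact whose two components are attributes of two entities yields a candidate one-to-many relationship; the relationship names would still be supplied by the user:

```python
def foreign_key_relationships(entities, relates):
    """Derive candidate relationships from 'relate' facts between
    attributes of two entities (the foreign-key pattern).  `relates`
    holds (entity.attr, entity.attr, multiplicity) facts; the facts
    whose components are not entity attributes are ignored."""
    rels = []
    for src, dst, mult in relates:
        e1, a1 = src.split(".")
        e2, a2 = dst.split(".")
        if a1 in entities.get(e1, []) and a2 in entities.get(e2, []):
            rels.append((e1, "one", mult, e2))
    return rels

entities = {"employee": ["nhs-number", "name", "department",
                         "vehicle-num-plate"],
            "car": ["reg-number", "manufacturer", "model", "driver"]}
relates = [("employee.vehicle-num-plate", "car.reg-number", "many"),
           ("car.driver", "employee.nhs-number", "many")]
assert foreign_key_relationships(entities, relates) == [
    ("employee", "one", "many", "car"),
    ("car", "one", "many", "employee")]
```

This reproduces the two one-to-many relationships of the employee/car example; naming them "drive" and "driven-by" remains a user decision.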

7.5.7 Transformations for Manipulating Program Items

Transformations in this category deal with manipulating program items in preparation for data abstraction. Examples of these transformations are "Join-Records" (when more than one record can be joined together in order to form an entity), "Split-Record-Into-Subrecords" (when a record needs to be split up in order to form more than one entity), "Swap-With-Next-Record" (when a record can be moved to a position closer to another record so that they may be joined together to form an entity), etc.

Chapter 8

Design and Implementation

This chapter describes the prototype system's design and implementation in terms of those major components of the Maintainer's Assistant built by the author: an extension to the WSL language (Chapter 6), an extension to the Transformation Library (Chapter 7), the Program Structure Database, the General Simplifier and the Metric Facility. Although the prototype is still not an industrial-strength tool, all components are aimed to be fully operational and to be able to demonstrate the feasibility of the method developed in this thesis. Case studies will be described in the next chapter. In this thesis, unless otherwise stated, the interface software runs on a SUN 3/50 workstation and the rest of the prototype runs in COMMON LISP on an IBM RS6000 workstation.

8.1 Design of the Prototype

The components of the prototype needing design and implementation/enhancement include the Representation of WSL, the Transformation Library, the Program Structure Database, the General Simplifier and the Metric Facility (Figure 8.1).

8.1.1 Transformations for Data Design Recovery

Transformations for acquiring data designs form the extension of the transformation library, and these transformations are divided into the following categories:


[Figure 8.1 shows the prototype's components: the Front End Browser (X-Windows or PC menu interface), the Program Transformer with its Internal Representation of WSL code and History/Future Database, and the supporting tools: the General Simplifier, the Program Structure Database, the Transformation Library and the Metric Facility.]

Figure 8.1: Design of the Prototype

1. deriving records,

2. from records to data at the design level,

3. from code-level data operations to data relations,

4. abstraction from code,

5. from user-defined data types to data design,

6. deriving data design from data and code, and

7. manipulating program items.

The program transformations to be implemented in the prototype are listed in Appendix D.

8.1.2 Program Structure Database

Examples of the database query definitions (in English) are listed in Figure 8.2, and the specification of all the database queries, originally defined in [154,165], is in Appendix C.

8.1.3 General Simplifier

Basically, the General Simplifier is able to simplify a mathematical or logical expression, or to prove the equivalence of two expressions. The mathematical and logical operations defined in the system are: +, -, *, /, **, Min, Max, Div, Mod, =, >, <, <>, Not, And, Or, etc.¹; and the three commands defined are:

• [Simplify] Expression

• [Equivalent?] (Expression, Expression)

• [Implies?] (Expression, Expression)

In the first command, the "Expression" can be any symbolic algebraic expression. When receiving this query from the Program Transformer, the General Simplifier returns the expression in its simplest form. See the following for examples:

• [Simplify] (2 + 3) === 5

• [Simplify] ((A + B) / (A + B)) === 1

• [Simplify] (A ** (B ** 0)) === A

• [Simplify] (B * B * B * B) === (B ** 4).

When receiving the second or the third query from the Program Transformer, the General Simplifier returns T or Nil accordingly. See the following for examples:

• [Equivalent?] ((And A B) Or (And A C), (A And (Or B C))) === T

¹These definitions are in the normal mathematical and logical sense and subject to normal limitations, e.g., expressions cannot be divided by zero.

Queries and their descriptions:

[Trans?]: Returns "T" or "Nil" depending on whether the named transformation is applicable.
[Variables] [Used] [Assigned] [Used-only] [Assigned-only]: Return lists of variables in the currently selected program item according to whether variables are assigned to, referenced, both, or neither.
[Depth]: Returns the depth, which is defined as the number of enclosing unbounded (do...od) loops.
[Primitive?]: Returns "T" or "Nil" depending on whether the statement given is primitive. A primitive statement is defined to be neither an unbounded loop nor a conditional statement.
[Terminal-Value]: Returns the terminal value of the given statement. The terminal value is the capacity of a statement for jumping out of loops.
[Terminal?]: Returns "T" or "Nil" depending on whether the statement given is terminal, i.e., whether the statement is in a terminal position or causes termination of a loop which is in a terminal position.
[Proper?]: Returns "T" or "Nil" depending on whether the statement is proper. The statement S is a proper sequence if every terminal statement of S has terminal value zero.
[Regular?]: Returns "T" or "Nil" depending on whether the program item is regular, e.g., an item is regular if every execution of that item leads to an action call.
[Reducible?]: Returns "T" or "Nil" depending on whether a statement is reducible. The statement S is reducible if replacing any terminal statement EXIT(k), which has terminal value one, by EXIT(k-1) gives a terminal statement of S.
[Improper?]: Returns "T" if all the terminal statements of S have a terminal value greater than zero.
[Dummy?]: Returns "T" if the statement S is both reducible and all terminal statements of S have a terminal value greater than zero.
[Calls]: Returns all the action call names and how many times they are called in the given program item.
[Statements]: Returns all the statement names in the given program item.

Figure 8.2: Program Structure Database Queries

• [Implies?] (X > Y, (Max X Y) = X) === T

• [Equivalent?] ((A + B) * (A - B), (A * A) - (B * B)) === T

• [Equivalent?] ((A + B) * (A + B), (A * A) - (B * B)) === Nil.
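The behaviour of the [Equivalent?] query can be approximated by random testing (illustrative Python; this is only a plausibility check over sampled values, not the symbolic simplification the General Simplifier performs):

```python
import random

def equivalent(f, g, names, trials=200):
    """Evaluate two expressions (given as Python functions over the
    named variables) on random integer values; agreement on every
    trial suggests, but does not prove, equivalence."""
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(trials):
        env = {n: rng.randint(-50, 50) for n in names}
        if f(**env) != g(**env):
            return False
    return True

# (A + B) * (A - B) agrees everywhere with A*A - B*B ...
assert equivalent(lambda A, B: (A + B) * (A - B),
                  lambda A, B: A * A - B * B, ["A", "B"])
# ... but (A + B) * (A + B) does not.
assert not equivalent(lambda A, B: (A + B) * (A + B),
                      lambda A, B: A * A - B * B, ["A", "B"])
```

This mirrors the last two [Equivalent?] examples above; the real simplifier answers by symbolic manipulation rather than sampling.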

8.1.4 Metric Facility

The objectives of using metrics in REFORM are to help the user to select transformations (to help develop heuristics), to measure the progress made in optimising the program code and to measure the resulting quality of the program being transformed.

The question with which a user is mainly concerned is how to make the program "better" (easier to understand). This is an optimising process which includes removing redundant code, removing unreachable nodes, detecting and removing bugs in the program, restructuring the program and reducing data complexity. Therefore, the measures needed initially in REFORM are "code" measures. They must reflect the complexity of the code and include control flow complexity (connections between nodes, branches and loops), structural complexity, data flow complexity and the size of the program.

Six metrics were defined for WSL programs in the prototype of the Maintainer's Assistant. A button named "Metrics" in the X-Windows interface of the Maintainer's Assistant is needed. By clicking this button, a user can calculate any one or all of the metrics, applied either to the current program item on which he or she is working or to the whole program. During the process of transforming a program, the metrics at each stage can be recorded and the results can be plotted as and when required. The six metrics defined (to be implemented) in the prototype are:

• McCabe Complexity (MCCABE) — the number of linearly independent circuits in a program flowgraph [116]. It is calculated as the number of predicates plus one.

• Structural (STRUCT) — the sum of the weights of every construct in the program. The weight of each WSL construct is defined subjectively according to experience gained by REFORM researchers and users. A loop, for example, is more difficult to understand than an assignment statement, so a loop statement is given a bigger weight than an assignment statement.

• Lines Of Code(I) (LOC) — the number of statements.

• Lines Of Code(II) (LOC2) — the number of nodes in the abstract syntax tree. This reflects the overall size of the program.

• Control-Flow and Data-Flow Complexity (CFDF) — the number of edges in the flowgraph plus the number of times that variables are used (defined and referred). It is a modification of the measure defined by Oviedo [175].

• Branch-Loop Complexity (BL) — the number of non-loop predicates plus the number of loops. It is a modification of the measure defined by Moawad and Hassan [175]. The measure is sensitive both to branches and to loops.
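As an illustration only (the actual facility is written in Meta-WSL and LISP, as described below), three of these counts can be sketched in Python over a toy syntax tree; the node-type names and the (type, children) tree encoding are assumptions made for this sketch:

```python
# Illustrative sketch, not the Maintainer's Assistant implementation:
# computing MCCABE, LOC2 and BL over a toy (type, children) syntax tree.
PREDICATES = {"if", "while"}   # constructs counted as predicates (assumed)
LOOPS = {"while", "do_od"}     # constructs counted as loops (assumed)

def walk(node):
    """Yield every node of a (type, children) tree, root first."""
    yield node
    for child in node[1]:
        yield from walk(child)

def metrics(tree):
    kinds = [kind for kind, _ in walk(tree)]
    predicates = sum(k in PREDICATES for k in kinds)
    loops = sum(k in LOOPS for k in kinds)
    non_loop_predicates = sum(k in PREDICATES and k not in LOOPS for k in kinds)
    return {
        "MCCABE": predicates + 1,            # number of predicates plus one
        "LOC2": len(kinds),                  # nodes in the syntax tree
        "BL": non_loop_predicates + loops,   # non-loop predicates plus loops
    }
```

For a tree containing one "if" and one "while", this yields MCCABE 3 and BL 2, matching the definitions above.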

The implementation details of program transformations and components of the prototype designed in this section will be described in the following sections.

8.2 Extension of WSL

The WSL is embedded in LISP, which means that executable WSL is represented in a LISP form so that it can be executed by a LISP interpreter simply by providing suitable macro and function definitions, and non-executable WSL is also represented in a LISP form so that a LISP interpreter can check the correctness of the syntax. The interpreter checks the syntax of WSL by looking up a syntax table in which every WSL component is defined in terms of its name, type, etc. When new WSL components are needed, their definitions are added to the table. The table has to be loaded before the system can interpret any WSL components. The implementation of expanding the WSL only involves defining entries of the WSL Syntax Table. The table in Figure 8.3 shows examples of the newly defined entries, for representing data abstraction and Entity-Relationship Attribute

Diagrams. Some of the new WSL components are defined in terms of existing WSL components (please see Appendix A and Appendix B for their syntax and semantics). In the table there are the following entries:

Number This is the type number that is passed to the pretty-printer as a more efficient alternative to passing the actual type of the object. This both reduces the amount of information which needs to be passed, and also speeds up the process of finding the form of the pretty-printed version. The newly defined entries start with 400.

Name This is the name of the item.

Generic Type This is the "parent" type of the given type. For example, "Relate" is a type of statement and "Depth" is a type of expression.

Leading Token This is "yes" if and only if the type of the item is the first part of the printed form. For example, a "Relate" statement begins with the word "Relate".

Minimum Size This is the least number of components that the type can have.

Examples are a "Relate" statement, which must have at least two (in fact exactly two) components, and a "Record", which must have at least five components, whereas a list of records can have any number.

Component Types This holds the types of the components of the given type (if there are any). For example, the components of an "Insert" are an assigned variable and an expression. If there is an unlimited number of components for a given item, any additional components must have the same type as the last component. For example, a "Segment" must have a section of files and a section of records, and it can have any number of statements in it.
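A hypothetical sketch of how a couple of such table entries might be held and consulted follows; the field names mirror the description above, but the dictionary representation and the size check are illustrative assumptions, not the actual WSL Syntax Table format:

```python
# Hypothetical representation of two WSL Syntax Table entries (values echo
# the figure below) and a check of the "Minimum Size" field.
SYNTAX_TABLE = {
    "Relate": {"num": 433, "generic": "Statement", "leading": True,
               "min_size": 2, "components": ["Expression", "Expression"]},
    "Insert": {"num": 425, "generic": "Statement", "leading": True,
               "min_size": 2, "components": ["Expression", "Assd_Var"]},
}

def well_sized(name, components):
    """True when an item has at least the minimum number of components."""
    return len(components) >= SYNTAX_TABLE[name]["min_size"]
```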

8.3 Transformations

Program transformations defined in the previous chapter using WSL are implemented in Meta-WSL (about 4000 lines of source code). A general process of

Num  Name              Generic Type  Leading Token  Min Size  Component Types
401  A_Def             Thing         Yes            4         Name Number Name Expression
402  E_Def             Thing         No             2         Name A==S
403  F_Def             Thing         No             0
404  R_Def             Thing         No             0
405  Segment           Statement     Yes            3         Files Records Statement ...
406  Records           A_List        No             0         Record ...
407  Record            R_Def         Yes            5         Name Expression Name Expression Records
408  Rec               Variable      Yes            1         Variable ...
409  Redefine          R_Def         Yes            2         Record Record
410  File              F_Def         Yes            2         Name Records ...
411  Files             A_List        No             0         File ...
412  Paragraph         Statement     Yes            2         E==S Statement ...
413  Ent               Variable      Yes            1         Variable ...
414  E==               E_Def         Yes            2         Name A==S
415  E==S              A_List        No             0         E_Def ...
416  A==               A_Def         Yes            4         Name Number Name Expression
417  A==S              A_List        No             0         A_Def ...
418  Relationship      Statement     Yes            5         Name Expression Name Expression Name
419  And_Relationship  Statement     Yes            7         Name Expression Name Expression Name Expression Name
420  Or_Relationship   Statement     Yes            7         Name Expression Name Expression Name Expression Name
421  Adt               Definition    Yes            4         Name Assignments Records Definition
422  Adt-Proc-Call     Statement     Yes            4         Name Name Expressions Variables
423  Adt-Funct-Call    Expression    Yes            3         Name Name Expressions
424  Create            Statement     Yes            1         Assd_Var
425  Insert            Statement     Yes            2         Expression Assd_Var
426  Del_Element       Statement     Yes            2         Expression Assd_Var
427  Dis_Union         Expression    Yes            1         Expression
428  Dis_Intersection  Expression    Yes            1         Expression
429  Init_Q            Statement     Yes            1         Assd_Var
430  Q_Append          Statement     Yes            2         Assd_Var Expression
431  Q_Concat          Expression    Yes            2         Expression Expression
432  Q_Rem_First       Expression    Yes            1         Assd_Var
433  Relate            Statement     Yes            2         Expression Expression
434  And_Relate        Statement     Yes            3         Expression Expression Expression
435  Or_Relate         Statement     Yes            3         Expression Expression Expression
436  Seq_Concat        Expression    Yes            2         Expression ...
437  Seq_Remove        Statement     Yes            2         Expression Assd_Var
438  Sub_Seq           Expression    Yes            3         Expression Number Number
439  Sub_Seq?          Condition     Yes            2         Expression Expression
440  Make_Seq          Statement     Yes            1         Assd_Var
441  Seq_Append        Statement     Yes            2         Assd_Var Expression
442  Init_Stack        Statement     Yes            1         Assd_Var
443  Top               Expression    Yes            1         Expression
444  Depth             Expression    Yes            1         Expression

Figure 8.3: WSL Syntax Table

writing a program transformation consists of the following steps:

1. Study the definition of a transformation and design the implementation in pseudo code.

2. Write the transformation according to the design. This includes:

• checking the applicability of the transformation to the selected program item — pattern matching the selected item with the defined pattern and collecting information from the given item; and

• editing the given item to what is defined by the transformation definition.

3. Test the implemented transformation. This is done using path analysis.

In illustrating these steps, a previously used example is presented again. If the following two statements appear adjacent in the program, where S is a stack:

Push(S, x); y := Pop(S).

they can be merged into one statement:

y := x.

This transformation is named "Merge-Push-Pop". It is designed so that the transformation can be applied by selecting either the "Push" statement or the "Assignment" statement with a "Pop" function as the expression to assign. Firstly, the design is written in pseudo code:

Transformation: Merge-Push-Pop

(a) Applicability
if (current-statement = "Push") ∧ (current-statement ≠ last-statement)
then current-statement := next-statement fi;
if (current-statement ≠ "Pop")
then flag := "Fail"
else buffer1 := stack-name-in-pop;
     current-statement := previous-statement;
     if (current-statement = "Push") ∧ (stack-name-in-push = buffer1)
     then flag := "Pass"
     else flag := "Fail"
     fi
fi.

(b) Transforming
if (current-statement = "Push")
then current-statement := next-statement fi;
buffer1 := variable-name-in-pop;
current-statement := previous-statement;
delete next-statement;
buffer2 := variable-name-in-push;
(insert an "Assignment" statement) ∧ (buffer1 := buffer2);
delete next-statement.
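The Merge-Push-Pop steps above can be sketched in Python over a simple list of statement tuples; the tuple encodings and the function name are assumptions for illustration, not the Meta-WSL implementation:

```python
# Sketch of the Merge-Push-Pop idea over a plain list of statements.
def merge_push_pop(stmts):
    """Replace adjacent ("push", S, x) / ("assign", y, ("pop", S)) pairs
    with ("assign", y, x) when both refer to the same stack S."""
    out, i = [], 0
    while i < len(stmts):
        cur = stmts[i]
        nxt = stmts[i + 1] if i + 1 < len(stmts) else None
        if (cur[0] == "push" and nxt is not None
                and nxt[0] == "assign" and isinstance(nxt[2], tuple)
                and nxt[2][0] == "pop" and nxt[2][1] == cur[1]):
            out.append(("assign", nxt[1], cur[2]))   # y := x
            i += 2                                   # consume both statements
        else:
            out.append(cur)
            i += 1
    return out
```

As in the pseudo code, the merge is only applicable when the pushed and popped stacks are the same.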

Secondly, the transformation itself is written in Meta-WSL:

(Add_trans 'Statement 'Any 'Merge_Push_Pop 'Global 'Always '(Rewrite)
 "Transformation to merge a PUSH statement and a statement using POP function."
 Nil
 '((Var ((Table Empty))
    (Cond ((And ([_S_Type?_] Push) ([_>>?_])) (@>>)))
    (Assign (Table ([_Match_] Statement (Assign ((~>?~ V) (Pop (~>?~ S)))) Table)))
    (Cond ((Empty? Table) (@Fail))
          ((Else)
           (Cond ((Not ([_<<?_])) (@Fail))
                 ((Else)
                  (@<<)
                  (Assign (Table ([_Match_] Statement (Push (~?~ E)) Table)))
                  (Cond ((Empty? Table) (@Fail))
                        ((Else) (@Pass)))))))))
 '((Var ((Table Empty))
    (Cond (([_S_Type?_] Push) (@>>)))
    (Assign (Table ([_Match_] Statement (Assign ((~>?~ V) (Pop (~>?~ S)))) Table)))
    (@del_back)
    (Assign (Table ([_Match_] Statement (Push (~?~ E)) Table)))
    (@Change_To ([_Fill_in_] Statement (Assign ((

Finally, this transformation is tested with the path analysis method. This example is still a simple transformation, included just to show how a transformation is implemented. Most transformations written for the research described in this thesis are far more complicated than this one (up to several hundred lines of Meta-WSL). Most of the 60 transformations implemented are "rewrite" transformations because these transformations deal with alterations to the structure of the selected program items. These transformations therefore appear in the "Rewrite" menu in the interface, and the other transformations, such as "Swap-with-next-record", appear in other menus, such as the "(Re)Move" menu.

8.4 Program Structure Database

The Program Structure Database is implemented in COMMON LISP and it consists of a total of around 3000 lines of source code. The features of the implementation are summarised in the following sections.

8.4.1 Use of Recursion Programming Techniques

In the Program Structure Database, a database query is usually implemented by one database query function (a LISP function) which may call subsidiary functions. The Database Manager is implemented as a group of LISP functions. To collect adequate information about a given program, a database query function usually needs to examine all the components of the program. For example, to answer the query "which variables are used in the program", the database query function "[Variables]" should check every place in the program where a variable can possibly occur. This can be very complicated and requires some operations to be carried out repeatedly. Recursive functions are therefore well suited to such operations. LISP provides a powerful recursion feature and this is fully used by the database query functions.

8.4.2 Dealing with All Kinds of WSL Construct

The results of some database queries depend entirely on the structure of the given program, e.g., whether or not the program consists of a conditional statement, a loop, etc., so the corresponding database functions must be able to calculate the answers according to different program structures. For instance, whether a given program item is "regular" depends on:

• when the item is a sequence of statements, whether every statement is "regular";

• when the item is an "if...then...else" statement, whether both clauses of the statement are "regular";

• when the item is an "action", whether every execution of the action leads to an action call;

• when the item is an "action system", whether every action in the system is "regular".

Generally, this type of database query function is organised with a branching section as follows:

if prog-item = "sequence" then sequence-subfunction
else_if prog-item = "if-then-else" then if-then-else-subfunction
else_if prog-item = "do-od" then do-od-subfunction
else_if prog-item = "action" then action-subfunction
else_if prog-item = "exit" then exit-subfunction
else other-subfunction fi.
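In Python, such a branching section maps naturally onto dictionary dispatch; the handler bodies below are placeholders standing in for the real subfunctions, and the (type, children) item encoding is an assumption:

```python
# Illustrative dispatch-by-structure skeleton; handlers are placeholders.
def sequence_sub(item):      return ("sequence", item)
def if_then_else_sub(item):  return ("if-then-else", item)
def do_od_sub(item):         return ("do-od", item)
def action_sub(item):        return ("action", item)
def exit_sub(item):          return ("exit", item)
def other_sub(item):         return ("other", item)

HANDLERS = {
    "sequence": sequence_sub,
    "if-then-else": if_then_else_sub,
    "do-od": do_od_sub,
    "action": action_sub,
    "exit": exit_sub,
}

def dispatch(item):
    """Route a (type, children) program item to its structure's subfunction."""
    return HANDLERS.get(item[0], other_sub)(item)
```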

8.4.3 Deriving Database Query Functions from Their Specifications

The specifications of database queries may directly provide clues for implementing the corresponding database query functions. Examples of the clues include:

• an "if-then-else" can be implemented as a "cond" statement in LISP;

• a "⋃ (for i = 1 to Size(item))" operation can be implemented as a "while" loop with a condition of "1 ≤ i ≤ Size(item)";

• a "function" together with a "map" operation (f * ) can usually be implemented as applying a recursive function to a sequence of components (an example can be seen in the next section);

• logical operations "∧", "∨" and "not" can be implemented by corresponding LISP functions;

• set operations "∈", "∪", "∩" and "\" can be directly implemented by corresponding LISP functions; etc.

8.4.4 An Example of Implementing a Database Query Function

The example chosen is the database query "[Statements]", whose definition is

funct [Statements](item) =
    Specific-type * {P ∈ Posns(item) | [Gen-type](Get-n(item, P)) = Statement}.

This definition means that the query function must return all the statement names in the given program item. Firstly, since the function needs to check every component (including leaf nodes, because a statement can be a leaf node in the syntax tree) and a "map" operation "*" appears in the definition, a recursive function may be required. Secondly, since the result is a set of statement names, a union operation may be needed. Thirdly, the function needs to check the data table to see whether the same query has been made before, and to return the result directly if it is in the data table, or otherwise to calculate (and save) the result. So a database query function is developed like this:

(Defun [Statements] (Item)
  (If (Leaf_Item? Item)
      (And (Eq (Gen_Type Item) 'Statement) (List (Specific_Type Item)))
      (Let ((S (Get_From_Table Item 'Statements)))
        (If S
            (Cdr S)
            (Add_To_Table! Item 'Statements
              (Let ((Args (Args Item))
                    (Result (And (Eq (Gen_Type Item) 'Statement)
                                 (List (Specific_Type Item)))))
                (Dolist (X Args Result)
                  (Setq Result (Union Result ([Statements] X))))))))))

In this program, "Leaf_Item?" is a function that identifies whether the current item is a leaf node; "Get_From_Table" is a function that retrieves information in the data table indexed by 'Statements; and "Add_To_Table!" is a function that saves the result of this query into the data table.
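A rough Python analogue of this memoised query may make the pattern clearer. The item encoding (a leaf is (generic_type, specific_type); an inner item additionally carries a child list) and the cache standing in for the data table are assumptions:

```python
# Rough Python analogue of the memoised [Statements] query.
CACHE = {}

def statements(item):
    gen, spec, *rest = item
    own = {spec} if gen == "Statement" else set()
    if not rest:                      # leaf node: no table entry is kept
        return own
    key = id(item)
    if key not in CACHE:              # compute once, then reuse
        result = set(own)
        for child in rest[0]:
            result |= statements(child)
        CACHE[key] = result
    return CACHE[key]
```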

8.5 General Simplifier (Symbolic Executor)

The prototype of the General Simplifier makes use of two public domain software packages, Maxima and the Boyer-Moore Theorem Prover. The "Simplify" command is mainly based on Maxima, and the "Prove" command on the Boyer-Moore Theorem Prover. In order to be used by the General Simplifier, the input and output programs of these two packages have been changed.

The Boyer-Moore Theorem Prover is a program developed by Boyer and Moore based on a "computational logic" (the logic) described in their book A Computational Logic [30], published in 1979. The logic is both oriented towards discussion of computation and mechanised, so that proofs can be checked by computation; and the logic is quantifier-free. Its axioms and rules of inference are obtained from the propositional calculus with equality and function symbols by adding (1) axioms characterising certain basic function symbols, (2) two "extension principles", with which one can add new axioms to the logic to introduce

[Figure: the six processes of the Boyer-Moore Theorem Prover (simplification, destructor elimination, cross-fertilisation, generalisation, elimination of irrelevance and induction), organised around input from the user.]

Figure 8.4: Organisation of Boyer-Moore Theorem Prover

"new" inductively defined "data types" and recursive functions, and (3) mathematical induction as a rule of inference. The Boyer-Moore Theorem Prover is based on the generalised principle of induction and the majority of its heuristics (proof techniques) are oriented towards induction proofs. Any proof using structural induction can be converted into a proof in the given formal system. One advantage of using derived rules of inference is that they permit the formal logic to be relatively primitive while allowing the production of sophisticated proofs. For example, a well-known and very useful rule of inference that the Boyer-Moore Theorem Prover uses is the tautology theorem: a formula of propositional calculus has a proof if and only if it is valid under a truth table. More complicated derived rules are those that justify the use of equality decision procedures and certain arithmetic decision procedures. The Boyer-Moore Theorem Prover uses a variety of such high-level derived rules of inference to discover and describe its proofs. In particular it uses the following six proof techniques:

1. Simplification This involves the use of axioms (including definitions and shell axioms) and previously proved lemmas to simplify the conjecture.

2. Destructor Elimination This involves the trading of "bad" terms for "good" by choosing an appropriate representation for the objects being manipulated. For example, operations such as / and - might be traded for operations such as * and +.

3. Cross-fertilisation When the conjecture being proved has an equality as one of its hypotheses, the equality is sometimes used to substitute one of its operands for the other in the remainder of the conjecture and then removed from the conjecture.

4. Generalisation This involves the adoption of a more general goal obtained by replacing terms in the given goal by new variables. The generalisation is designed to help prepare a conjecture for induction. Conjectures must frequently be generalised before they can be proved because without generalisation the induction hypothesis may not be sufficiently strong to prove the theorem.

5. Elimination of Irrelevance This involves the discarding of apparently unimportant hypotheses.

6. Induction Inductions are formulated from information collected when definitions are added to the system and from information available at the time of induction.

As implemented, each of these proof techniques is a program that takes a formula as input and yields a set of formulas as output; the input formula is provable if each of the output formulas is. Each of these six programs is called a "process". Not every process is applicable to every formula. For instance, it may not be possible to simplify a formula further. When a process recognises that its input is a theorem, it produces an empty set as the output set.

The Boyer-Moore Theorem Prover is organised around a "pool" of goal formulae [31]. Initially the user places an input conjecture into the pool. Formulas are drawn out of the pool one at a time for consideration. Each formula is passed in turn to the six processes in the order shown in Figure 8.4 until some process is applicable. When an applicable process produces a new set of subgoals, each is added to the pool. The consideration of goals in the pool continues until either the pool is empty and no new subgoals are produced — in which case the system has "won" — or one of several termination conditions is met — in which case the system has "lost". The system may "run forever" until the host system's resources are exhausted.
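The pool-and-waterfall control structure described above can be sketched as follows; each "process" takes a formula and returns a set of subgoals (an empty set means the input was recognised as a theorem) or None when not applicable. The step limit modelling the termination conditions and all names here are assumptions for illustration, not the prover's actual code:

```python
# Sketch of the Boyer-Moore pool-of-goals control loop.
def run_pool(conjecture, processes, max_steps=100):
    pool = [conjecture]
    for _ in range(max_steps):
        if not pool:
            return "won"               # pool empty: a proof has been found
        goal = pool.pop()
        for process in processes:      # try the processes in a fixed order
            subgoals = process(goal)
            if subgoals is not None:   # first applicable process wins
                pool.extend(subgoals)
                break
        else:
            return "lost"              # no process applies to this goal
    return "resources exhausted"
```

With a toy process that recognises the formula "T" and another that splits conjunctions, the loop behaves as described: an emptied pool means the system has won, and a goal no process can handle means it has lost.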

When the system wins — i.e., the pool is finally empty — the trace of the theorem prover is a proof of the input conjecture, provided each of the six processes is considered a derived rule of inference. When the system loses, the user should infer nothing about the validity of the original conjecture; the initial problem may or may not be a theorem. In certain applications the theorem prover resembles a sophisticated proof checker more than an automatic theorem prover. This is because the theorem prover's behaviour on a given problem is determined by a data base of rules. The rules are derived by the system from the axioms, definitions, and theorems submitted by the user. Three of the six processes mentioned earlier, namely simplification, destructor elimination and generalisation, can be heavily influenced by these rules. Each time a new theorem is proved it is converted into rule form and stored in the data base. When new theorems are submitted, the currently "enabled" rules determine how certain parts of the theorem prover behave. In this way the user can lead the machine to the proofs of exceedingly deep theorems by presenting it with an appropriately graduated sequence of lemmas. An experiment showed that the theorem prover cannot prove the equation a² − b² = (a + b)(a − b) until the theorem ab = ba has been presented to it. In other words, the more rules there are in the data base, the more capable the theorem prover is.

In order to make use of the Boyer-Moore Theorem Prover, the source code of the prover (more than 900,000 bytes) was analysed. Through the analysis, not only the programs implementing the six processes but also other details such as input and output functions, global flags, etc. were found. Therefore, the following

• Input — The Boyer-Moore Theorem Prover accepts user commands from the keyboard. In particular, the command for attempting to prove a conjecture is "Prove-Lemma". This command is rewritten together with a new query function, "[Prove]", so the system will accept a query made by another LISP function rather than a command typed in from the keyboard.

• Output — Similarly, the Boyer-Moore Theorem Prover prints its proving results on the screen, while our new system requires the output to be returned to the query-originating function. A LISP function is written to intercept the output originally destined for the screen and to return it to the calling function. Output messages include "Lemma-proved", "Lemma-not-proved" and "Faulty-lemma-format".

• Monitoring messages — While the Boyer-Moore Theorem Prover is attempting to prove a conjecture, monitoring messages are sent to the screen keeping the user informed of what is going on. These messages range from which process is running to which theorem the theorem prover is employing at the moment to prove the problem. There are several functions in the original Boyer-Moore Theorem Prover that produce these messages: "PPR", "Prind", "PPR1", "PPR2", "Print-States", "Print5*", "PRINC", etc. These functions are all rewritten to become dummy functions, and the information which used to be generated by those functions is collected to form the result of the query.

• Forever runs — Forever runs must be prevented from happening when the system serves a query, because the calling function is not able to stop a forever run (this differs from interactive use of the Boyer-Moore Theorem Prover, where the user can interrupt by pressing keys). This is done by monitoring check points inserted in the induction process. When the induction process has been called five times to prove the same lemma, the system

assumes that it is not worthwhile trying any more and sets the global variable "Do-not-use-induction-flag" to "True", and no more induction will be invoked.

• Data base of rules — Once a theorem which is found useful for future use has been proved, the theorem can be added as a new axiom to the data base of rules by the command "Add-Axiom". For example,

(Add-Axiom axiom100 (rewrite)
  (equal (times (plus a b) (difference a b))
         (difference (times a a) (times b b))))

The data base of rules used by the Symbolic Executor is built with the material in a file included with the standard distribution of the Boyer-Moore Theorem Prover. The file, "basic.events", with about 470 definitions and 970 theorems, contains most of Appendix A of A Computational Logic, which includes the major examples of that book, covering theorems of number theory and theorems of algebra, as well as Newton's binomial theorem and the Church-Rosser theorem, etc. In this file theorems are in the form of lemmas. These theorems are all treated as axioms by the command "Add-Axiom" when being loaded into the data base of rules.

The original plan was to use the simplification part of the Boyer-Moore Theorem Prover as a simplifier. It was soon found out by experiments that the simplifying capability of this part is very limited and could not meet the need of serving the program transformer. These experiments triggered an investigation of another public domain software package — Maxima. The system Maxima is a Common LISP implementation due to William F. Schelter, and is based on the original implementation of Macsyma [65,66] at MIT. It was found out through experiments that the simplifying capability of Maxima is much better than that of the Boyer-Moore Theorem Prover. The process of turning Maxima into a part of the Symbolic Executor to serve the program transformer is very similar to the process of turning the Boyer-

Moore Theorem Prover into a part of the Symbolic Executor, so only the main steps are briefly listed here:

• analysing Maxima and obtaining information for changes;

• writing a new input interface for the Maxima;

• writing a new output interface for the Maxima; and

• removing monitoring messages by altering the corresponding programs of the Maxima.

The Symbolic Executor is implemented in COMMON LISP and the source code totals about 7000 lines, including the data base of rules.

8.6 The Metric Facility

The Metric Facility [170] is implemented as a program module with a number of LISP functions and flags. It also uses the Program Structure Database to retain the resultant metrics of code.

8.6.1 Collecting Program Property Information

The key program in the Metric Facility is called "Collect-Prog-Info". This program analyses a given program item (in WSL internal format) by going down into the program item recursively and returns eleven properties of the program item. These properties are (refer to Section 8.1.4):

1. index number of McCabe;

2. index number of the Structural Complexity;

3. number of nodes;

4. number of statements;

5. number of edges; Chapter 8. Design and Implementation 205

6. number of predicates;

7. number of loops;

8. number of data items;

9. number of back-edges;

10. number of procedure or function calls; and

11. number of action calls.

A natural way of constructing this program is to go through the structure of the given WSL program item once and to collect all eleven properties. These eleven properties are comparatively independent (so defined) and can be directly calculated or collected. For example, the index number of MCCABE is calculated as "the number of predicates plus 1", or "the number of calls in an irregular action system plus 1" (it has been proved that an action call in an irregular action system is equivalent to a predicate in a non-action system), or "the number of predicates plus the number of calls to itself in a recursive definition"; the index number of the Structural Complexity is obtained by checking the Structural Complexity Table, in which every WSL construct is given an index number; and the number of statements is assigned "1" when the current item is a statement, or "0" when it is not. The final calculation result is the sum of the (index) number of the current item and the (index) numbers of all its subitems.

The property information collecting program (a LISP function) searches through the given program item recursively, i.e., the function calls itself when a subitem is entered. Each time the function is called, it makes a query to the Program Structure Database. If the result is already in the database, the function can immediately return the information stored in the database as the result. If the result is not in the database, the function does the calculation. While returning the result, the function saves the result in the database if the current item is not a leaf node.
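This single-pass recursive collection can be sketched in Python for two of the eleven properties; the (type, children) tree encoding, the kind sets, and the cache standing in for the Program Structure Database are assumptions:

```python
# Sketch of recursive property collection with database-style caching.
INFO_CACHE = {}
STATEMENT_KINDS = {"assign", "cond", "loop"}   # assumed statement kinds
PREDICATE_KINDS = {"cond", "loop"}             # assumed predicate kinds

def collect_prog_info(item):
    kind, children = item
    key = id(item)
    if key in INFO_CACHE:                  # answer already in the "database"
        return INFO_CACHE[key]
    info = {
        "statements": 1 if kind in STATEMENT_KINDS else 0,
        "predicates": 1 if kind in PREDICATE_KINDS else 0,
    }
    for child in children:                 # add the numbers of all subitems
        sub = collect_prog_info(child)
        for k in info:
            info[k] += sub[k]
    if children:                           # save unless it is a leaf node
        INFO_CACHE[key] = info
    return info
```

A metric such as MCCABE could then be read off as the collected predicate count plus one.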

8.6.2 Processing Metrics

Using the Metric Facility, a user may calculate the metrics, applied either to the current program item on which (s)he is working or to the whole program. The format of the user command is:

[_Metric_] whole-prog | current-item.

On receiving this command, the LISP function "[_Metric_]" parses the parameters; calls the function "Collect-Prog-Info"; calculates metrics according to the obtained information; and finally returns the metrics according to the parameters.

The Metric Facility allows metrics at each stage to be recorded automatically during the process of transforming a program. A user can start the automatic recording mechanism with the command "Start-Auto-Metric"; stop it with the command "Stop-Auto-Metric"; and reset the metric record with the command "Init-Metric-Record". When a transformation is applied to the program, if the auto metric flag is on, the name of the transformation just applied together with the six metrics of the program is recorded. The results can be either saved to a file for future analysis or plotted on the screen.

8.6.3 Plotting Metric Graphs

The Metric Facility provides a command, "[_Plot_Metrics_]", to plot a graph of the metric result, using the record generated by the automatic mechanism. Each of the six metrics is plotted in its own graph, including the name of the metric, the (index) number of the metric at each line, and the name of the transformation applied. Since, by the time a graph is plotted, the number of transformations applied may be greater than the maximum number of characters which can be displayed in one line on the screen, the graph is plotted vertically, i.e., growing downwards. Figure 9.5 and Figure 9.6 are examples of such graphs (refer to the next chapter).
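A minimal sketch of such vertical plotting, assuming a history record of (transformation name, metric value) pairs; this is illustrative, not the [_Plot_Metrics_] implementation:

```python
# Vertical plotting sketch: one line per applied transformation, so the
# history grows downwards instead of across the screen.
def plot_metric(name, history, scale=1):
    """Render one metric's history as rows of stars, newest at the bottom."""
    lines = ["Metric: " + name]
    for trans_name, value in history:
        bar = "*" * max(1, value // scale)
        lines.append(f"{trans_name:<20} {value:>4} {bar}")
    return "\n".join(lines)
```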

8.7 The Interface

The Interface has been built for the existing Maintainer's Assistant by other members of the research team. It needed expanding when new WSL components for representing data abstraction and Entity-Relationship Attribute Diagrams were introduced. Also, a change had to be made to accommodate the Metric Facility. Similar to expanding the WSL (described in Section 8.2), expanding the interface to display new WSL components involves defining new entries of the WSL Parsing Table, in which the display format of all WSL components is defined. The Interface uses this table to convert the internal format of WSL to the external format of WSL. For example, the internal (LISP) WSL format of a record is (entry 407 in the table of Section 8.2):

(Record Name Expression Name Records)

and the external WSL format is:

record Name [ Number of Name ] with Records.

The internal WSL format of an abstract function call is (entry 423):

(Adt-funct-call Name Name Expression)

and the external WSL format:

user-adt-funct-call Name.Name (Expression).
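This internal-to-external conversion can be sketched as template-driven rendering; the "{0}" template syntax and the tuple encoding below are assumptions for illustration, not the real WSL Parsing Table format:

```python
# Sketch: convert internal (Type, component, ...) forms to display text
# using a table of display templates.
TEMPLATES = {
    "Record": "record {0} [ {1} of {2} ] with {3}",
    "Assign": "{0} := {1}",
}

def externalise(item):
    """Turn a (Type, component, ...) internal form into display text."""
    if not isinstance(item, tuple):
        return str(item)               # atoms print as themselves
    head, *parts = item
    rendered = [externalise(p) for p in parts]
    return TEMPLATES[head].format(*rendered)
```

Nesting works by recursion: each component is rendered before being substituted into its parent's template.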

When the Interface displays WSL programs, WSL key words (such as record and adt_funct_call) and other symbols (such as "[", "]", "=", etc.) are added. The complete parsing table is full of tedious symbols and is therefore omitted here. The Metric Facility is implemented in the Interface as a button with which a pop-up menu can be invoked (Figure 8.5). The Metrics menu consists of the following options:

• "Auto Metrics" — This option switches "on" or "off" the automatic metrics "history" recording mechanism.

[Figure: the Metrics menu, offering Auto Metrics, Current Program, Current Item, New History and Graph; the Graph sub-menu offers MCCABE, STRUCTURAL, LOC, LOC2, CFDF and BL.]

Figure 8.5: The Metrics Menu

• "Current Program" — This option returns the metrics of the current program.

• "Current item" — This option returns the metrics of the current item.

• "New History" — This option resets the metric history record.

• "Graph" — this option invokes a sub-menu, "Graph" menu. When one of the options in the sub-menu is selected, the metric history record for the current program is plotted for that particular metric.

8.8 Integration of the Prototype

To emphasise how the implementation of the prototype designed in this thesis extends the Maintainer's Assistant, the steps for building a new version of the Maintainer's Assistant are shown below:

• to load LISP, and in LISP,

• to load the existing components of the Maintainer's Assistant which were not developed in this thesis, but including the WSL part developed in this thesis;

• to load the program containing the transformations developed in this thesis, "dataabstract.clisp";

• to load the Boyer-Moore Theorem Prover, "bm.clisp";

• to load Maxima, "maxima.clisp";

• to load the Symbolic Executor (which overwrites some parts of the Boyer-Moore Theorem Prover and Maxima, turning them into the Symbolic Executor), "symbolic.clisp";

• to load the database of rules used by the Symbolic Executor, "basicaxioms.clisp";

• to load the Program Structure Database, "database.clisp";

• to load the Metric Facility, "metric.clisp"; and then

• to save the LISP image as an executable file, which is a new version of the Maintainer's Assistant.

Chapter 9

The Use of the Prototype System and Results

This chapter describes how the method developed in this thesis and the prototype implemented are used to demonstrate the feasibility of acquiring data designs from data-intensive programs.

9.1 Introduction

Among the research problems set out in Section 4.4, the central problem is the method of extracting Entity-Relationship Attribute Diagrams from data-intensive programs. Although this problem and the other problems have been partly answered in previous chapters, it is still necessary to examine the central problem specifically by looking at the results of applying the method. The following questions are to be considered:

• does the method work?

• how well does it work?

• where does it work well?

• where does it not work well?

• is it better than other approaches, such as those used by Sneed and REDO?


Five case studies are presented in this chapter to demonstrate the use of the prototype and to examine the method proposed in this thesis. In the first case study, the process of transforming a COBOL program, which copies one file to another, into an Entity-Relationship Attribute Diagram is illustrated in detail. This case study emphasises how Entity-Relationship Attribute Diagrams are obtained by analysing COBOL statements in the "Procedure Division". In the second case, a program in which an alias was used for the second purpose of aliasing is dealt with. In the third case, a real program (of 3750 lines of COBOL) used in a national telephone company was thoroughly investigated. The fourth case study shows the results of experimenting with a COBOL program which was also used in the same telephone company; the example code used in this case is approximately 7000 lines of COBOL source. In the final case study, a COBOL program implementing a Public Library Administration System is studied with the prototype.

The intention in selecting these five case studies was to cover as much of the method developed (or, in terms of the prototype, as many of the transformations) as possible. Also, the first case shows the detailed process by which entities and their relationships are obtained, so that similar details may be omitted in later cases. The third and the fourth cases were selected to find out what really happens with a real program at the scale of a few thousand lines. The fifth case was selected mainly because it has complicated embedded calls to CICS/TOTAL in its code, which is typical in practice.

9.2 Case Study 1 — A File Copying Program

The source code of this COBOL program was presented in Section 6.5.2 and its WSL equivalent form was in Section 6.5. In this case study, this program is manipulated further.

9.2.1 Modelling File Operations by Queue Operations

Following the application of the transformations to represent file operations in terms of queue operations, the program in Section 6.5 can be transformed into:

segment

comment: "Program-ID: Copy-Customer-List";
file customer-list with
  record customer-record with
    record name [20 of char]
    record address [50 of char]
    record phonenum [20 of char]
  end
end;
file customer-list-backup with
  record backup-record with
    record b-name [20 of char]
    record b-address [50 of char]
    record b-phonenum [20 of char]
  end
end;
record eof [1 of char] end;
customer-list-tail := q_concat (customer-list-head, customer-list-tail);
customer-list-head := empty;
customer-list-backup-tail := q_concat (customer-list-backup-head, customer-list-backup-tail);
customer-list-backup-head := empty;
while (eof ≠ "T") do
  if empty? (customer-list-tail)
    then eof := "T"
    else eof := "F";
      customer-record := q_rem_first mailing-list-tail;
      q_append (mailing-list-head, customer-record);
      backup-record := customer-record;
      q_append (mailing-list-backup-head, backup-record);
      backup-record := q_rem_first custom-list-backup-tail;

fi od end;
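The file-as-queue model used above can also be sketched in executable form; this Python fragment is illustrative only (the function and variable names are not from the thesis), assuming a sequential file is split into a "head" of records already read and a "tail" of records still to be read:

```python
# Illustrative sketch: a sequential file modelled as two queues, mirroring
# the q_concat / q_rem_first / q_append operations in the WSL program.

from collections import deque

def copy_file(tail):
    """Copy every record from one file (modelled as a queue) to a backup."""
    head = deque()                     # records already read from the original
    backup_head = deque()              # records written to the backup file
    tail = deque(tail)                 # records still to be read
    while tail:                        # loop until end-of-file
        record = tail.popleft()        # q_rem_first: read the next record
        head.append(record)            # q_append: record has now been read
        backup_head.append(record)     # write the copy to the backup
    return list(head), list(backup_head)

original, backup = copy_file(["alice", "bob", "carol"])
```

When the loop terminates, the backup queue holds the same records in the same order as the original, which is exactly the property the later abstraction steps summarise as a "copy" relationship.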

9.2.2 Transforming Records into Entities

A record without any subrecords is transformed into an entity with no attributes. The entity keeps the same name as the record. In this case,

record eof [1 of char] end;

is transformed into

entity eof end;

A record whose subrecords have no subrecords of their own is transformed into an entity with attributes. The entity keeps the same name as the record, while the attributes keep the same names as the subrecords. In this case,

record customer-record with
  record name [20 of char]
  record address [50 of char]
  record phonenum [20 of char]
end;

is transformed into

entity customer-record with
  attr name
  attr address
  attr phonenum
end;
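A minimal executable sketch of this record-to-entity rule follows (illustrative Python, not the prototype's transformation code; the data representation is an assumption):

```python
# Illustrative sketch: a WSL record becomes an entity; its leaf subrecords
# become the entity's attributes, keeping the same names.

def record_to_entity(record):
    """record: (name, [subrecords]); a leaf subrecord has an empty child list."""
    name, subrecords = record
    if all(not children for _, children in subrecords):
        # Every subrecord is a leaf, so one entity with attributes suffices.
        return {"entity": name, "attrs": [sub for sub, _ in subrecords]}
    raise ValueError("deeper nesting needs more than one entity")

entity = record_to_entity(("customer-record",
                           [("name", []), ("address", []), ("phonenum", [])]))
```

A record with no subrecords at all, such as eof, yields an entity with an empty attribute list, matching "entity eof end" above.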

By now the program in the previous section is turned into:

paragraph
entity customer-record with
  attr name
  attr address
  attr phonenum
end;
entity backup-record with
  attr b-name
  attr b-address
  attr b-phonenum
end;
entity eof end;
customer-list-tail := q_concat (customer-list-head, customer-list-tail);
customer-list-head := empty;
customer-list-backup-tail := q_concat (customer-list-backup-head, customer-list-backup-tail);
customer-list-backup-head := empty;
while (eof ≠ "T") do
  if empty? (customer-list-tail)
    then eof := "T"
    else eof := "F";
      customer-record := q_rem_first mailing-list-tail;
      q_append (mailing-list-head, customer-record);
      backup-record := customer-record;
      q_append (mailing-list-backup-head, backup-record);
      backup-record := q_rem_first custom-list-backup-tail;
fi od end;

9.2.3 Turning Assignments into Relate Statements

When a record is transformed into an entity, any program statement using this record must be changed accordingly. Because transforming a record into an entity is an abstraction, the statements using the obtained entity must be expressed at a higher abstraction level. The relate statement in WSL was defined for just this purpose. The statement takes two parameters, each of which can be an entity, a record, a variable or a file name, i.e.:

relate parameter1 to parameter2;

For instance, when the record customer-record is transformed into the entity customer-record, all the statements using this record are to be changed, i.e., from

customer-record := q_rem_first mailing-list-tail;
q_append (mailing-list-head, customer-record);
backup-record := customer-record;

to

relate customer-record to mailing-list-tail;
relate mailing-list-head to customer-record;
relate backup-record to customer-record; comment: "copy a record";

A newer version of the program in the previous section is obtained:

paragraph
entity customer-record with
  attr name
  attr address
  attr phonenum
end;
entity backup-record with
  attr b-name
  attr b-address
  attr b-phonenum
end;
entity eof end;
relate customer-list-tail to customer-list-head;
relate customer-list-head to empty;
relate customer-list-backup-tail to customer-list-backup-head;
relate customer-list-backup-head to empty;
while (eof ≠ "T") do
  if empty? (customer-list-tail)
    then eof := "T"
    else eof := "F";
      relate customer-record to mailing-list-tail;
      relate mailing-list-head to customer-record;
      relate backup-record to customer-record; comment: "copy a record";
      relate mailing-list-backup-head to backup-record;
      relate backup-record to custom-list-backup-tail
fi od end;

9.2.4 Ignoring Useless Relate Statements

When a relate statement relates two entities, there must by definition be a relationship between these two entities. So a relationship can be derived from the statement, and the name of the relationship is provided by the user according to the information in the program. When a relate statement relates one or no entities, i.e., when one or both of the components of the relate statement are neither entities nor records which may be transformed into entities, the relate statement does not represent a relationship between entities and can be deleted. In the following program:

relate customer-record to mailing-list-tail;
relate mailing-list-head to customer-record;
relate backup-record to customer-record; comment: "copy a record";

mailing-list-tail and mailing-list-head are not entities or records, so the first two relate statements do not represent relationships. The above can therefore be simplified into:

relate backup-record to customer-record; comment: "copy a record";
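This filtering rule can be sketched as follows (illustrative Python; the helper name and data shapes are assumptions, not the prototype's code):

```python
# Illustrative sketch: a relate statement only yields a relationship when
# BOTH of its operands are entities (or records that become entities);
# every other relate statement is discarded.

def keep_relationships(relates, entities):
    """relates: list of (left, right) name pairs; entities: set of entity names."""
    return [(a, b) for a, b in relates if a in entities and b in entities]

entities = {"customer-record", "backup-record", "eof"}
relates = [("customer-record", "mailing-list-tail"),
           ("mailing-list-head", "customer-record"),
           ("backup-record", "customer-record")]
kept = keep_relationships(relates, entities)
```

Only the backup-record/customer-record pair survives, matching the simplification shown above.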

So far the program in the previous section has become:

paragraph
entity customer-record with
  attr name
  attr address
  attr phonenum
end;
entity backup-record with
  attr b-name
  attr b-address
  attr b-phonenum
end;
entity eof end;
while (eof ≠ "T") do
  if empty? (customer-list-tail)
    then eof := "T"
    else eof := "F";
      relate backup-record to customer-record; comment: "copy a record";
fi od end;

9.2.5 Abstracting Branching Structures and Loop Structures

As discussed in Section 7.5.4, the if ... fi statement in the program,

if empty? (customer-list-tail)
  then eof := "T"
  else eof := "F"
fi;

is abstracted to

relate eof to "T";
relate eof to "F";

These two relate statements do not contribute to a relationship statement and hence are eventually ignored. A while ... do ... od loop is also discussed in Section 7.5.4: a while loop can be removed, leaving just the body of the loop. In this example, the while statement,

while (eof ≠ "T") do
  backup-record := customer-record; comment: "copy a record";
od;

is supposed to implement copying all the records from the original file to the backup file. At the higher abstraction level, this says that there is a "copy" relationship between two entities; each record is one instance of an entity. The above program is transformed into

relationship entity backup-record has one copy relation with one entity customer-record;

9.2.6 The Resultant "Program" in WSL and its Entity-Relationship Attribute Diagram

[Figure 9.1 shows the X-Windows front end of the prototype displaying the resultant WSL program for the file copying program.]

Figure 9.1: The Result from File Copying Program on X-Windows Front End

[Figure 9.2 shows the diagram: entity Customer-Record connected to entity Backup-Record by a one-to-one "copy" relationship.]

Figure 9.2: The Entity-Relationship Attribute Diagram for "File-Copy" Program

After applying the transformations discussed in this section, the final result for the "file-copy" program can be shown as an Entity-Relationship Attribute Diagram. The result displayed on the X-Windows interface is in Figure 9.1; the following is the WSL representation, and the diagram itself is in Figure 9.2.

paragraph
entity customer-record with
  attr name
  attr address
  attr phonenum
end;
entity backup-record with
  attr b-name
  attr b-address
  attr b-phonenum
end;
relationship entity backup-record has one copy relation with one entity customer-record;
end;

9.3 Case Study 2 — A Program Using Alias

The aliasing problem was discussed in detail in Chapter 6. An example COBOL program is given here as an additional illustration to show how an alias is used for the second purpose of aliasing. The original program is as follows:

WORKING-STORAGE SECTION.
01 ALPHANUMERIC-ITEMS.
   05 MONTH-DISP PIC X(09).
01 NUM-MONTH-REC.
   05 NUM-MONTH-IN PIC 9(02).
01 MONTH-TABLE.
   05 FILLER PIC X(09) VALUE 'JANUARY'.
   05 FILLER PIC X(09) VALUE 'FEBRUARY'.
   05 FILLER PIC X(09) VALUE 'MARCH'.
   05 FILLER PIC X(09) VALUE 'APRIL'.
   05 FILLER PIC X(09) VALUE 'MAY'.
   05 FILLER PIC X(09) VALUE 'JUNE'.
   05 FILLER PIC X(09) VALUE 'JULY'.
   05 FILLER PIC X(09) VALUE 'AUGUST'.
   05 FILLER PIC X(09) VALUE 'SEPTEMBER'.
   05 FILLER PIC X(09) VALUE 'OCTOBER'.
   05 FILLER PIC X(09) VALUE 'NOVEMBER'.
   05 FILLER PIC X(09) VALUE 'DECEMBER'.
01 MONTH-TABLE-RED REDEFINES MONTH-TABLE.
   05 MONTH-DESCRN PIC X(09) OCCURS 12 TIMES.

PROCEDURE DIVISION.
MAIN SECTION.
    ACCEPT NUM-MONTH-REC.
    IF NUM-MONTH-IN < 01 OR > 12
        DISPLAY 'INVALID MONTH'
    ELSE
        MOVE MONTH-DESCRN (NUM-MONTH-IN) TO MONTH-DISP
        DISPLAY NUM-MONTH-IN, ' = ', MONTH-DISP.
    STOP RUN.

From the experience obtained in the first case study, the only relevant statement in the procedure division is the MOVE statement. Since the translation process from COBOL to WSL has been demonstrated in previous chapters, the program shown below is the translated program. Also, initial simplifications have already been made to the program by applying program transformations. In particular, record "month-table" was simplified to having one field, "month-name", and record "month-table-red" to having one field, "month-descrn".

record alphanumeric-items with record month-disp [9 of char] end end;
record num-month-rec with record num-month-in [2 of char] end end;
record month-table with record month-name [9 of char] end end;
record month-table-red with record month-descrn [9 of char] end end;
redefine month-table with month-table-red;

month-disp := month-descrn;

Since record "month-table-red" redefines record "month-table", "month-disp" is also related to "month-name". When the records in the above program are translated into entities, the following program is obtained.

entity alphanumeric-items with attr month-disp end;
entity month-table with attr month-name end;
entity month-table-red with attr month-descrn end;

relate month-disp to month-descrn; relate month-disp to month-name;
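The way the redefine spreads a relation across aliased records can be sketched as follows (illustrative Python; the helper name and data shapes are assumptions, not the prototype's code):

```python
# Illustrative sketch: when record B redefines record A, a relate statement
# touching a field of B also implies a relation to the matching field of A.

def expand_aliases(relates, redefines):
    """redefines: dict mapping a redefining field to the field it aliases."""
    expanded = list(relates)
    for a, b in relates:
        if b in redefines:
            # The alias implies a second, derived relate statement.
            expanded.append((a, redefines[b]))
    return expanded

result = expand_aliases([("month-disp", "month-descrn")],
                        {"month-descrn": "month-name"})
```

Starting from the single relate statement derived from the MOVE, the redefine yields the second relate statement shown above.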

Further transformations will bring the above program into:

entity alphanumeric-items with attr month-disp end; entity month-table with attr month-name end;

relationship entity alphanumeric-items has one represent relation with one entity month-table;

9.4 Case Study 3 — A Vetting and Pricing Program Used in a Telephone Company

The program selected for this case study is a real program which could still be in operation in a national telephone company. The emphasis of the case study is on how to tackle a real and heavily maintained program with the method developed in this thesis. Owing to the confidentiality agreement between the telephone company and the REFORM research group, the details of the program will not be presented in this section.

9.4.1 Introduction to the Program

The input of this program is from specification tape copies, and the output consists of two files: a priced valid calls file and a data vet reject file. The program consists of 3750 lines of code written in COBOL. The experiments with the program carried out using the prototype tool are described in detail in the next section. Experimenting with this program went through several stages.

9.4.2 Translating the Program into WSL and Initial "Tidy-up"

According to the experience obtained in the first case study, some of the COBOL control statements will not contribute to the eventual Entity-Relationship Attribute Diagrams, and these statements can be omitted at this stage. These constructs include DISPLAY, REMOVE, INITIALIZE, PERFORM, IF, WHEN, UNTIL, GOTO, SEARCH, ACCEPT, etc. Some of the "data" statements which cannot contribute to the eventual Entity-Relationship Attribute Diagrams were also omitted: for example, COBOL statements such as MOVE 0 TO A-VARIABLE, MOVE SPACES TO A-VARIABLE, MOVE "NOTHING" TO A-VARIABLE, ADD 0 TO A-VARIABLE, SUBTRACT 0 FROM A-VARIABLE, SET A-VARIABLE TO 0, INSPECT statements, COMPUTE statements, etc. After this initial tidy-up, the translated COBOL program in WSL was ready for further transformation. It is worth mentioning that information that could be useful in the later abstraction was recorded in WSL comment statements. When this stage finished, there were 1879 lines of WSL code left; among these WSL statements, 208 were record definitions and 216 were assignment statements.

9.4.3 Obtaining relate Statements

Almost all the assignment statements that were originally translated from the MOVE statements in COBOL were abstracted to relate statements, as were those assignment statements originally from ADD and SUBTRACT statements. A comment statement was usually added along with each abstraction in order to record information which would later be used to decide the degree of the relationship between the two entities obtained from the two records linked by the relate statement.

9.4.4 Aliased Records

The REDEFINES statements in the original COBOL program were translated into redefine statements in WSL. According to observation, all the original records to be redefined had the same data types as their redefining records. The conclusion from applying the functions discussed in Chapter 6 for dealing with aliased records was that aliased records would not affect each other in the abstraction process. Therefore, a record and its redefining record could be treated as independent records.

27 records were redefined in the original program.

9.4.5 Obtaining Entities

Entities were abstracted from records; this is the starting point of moving from the code level to the conceptual level (refer to Figure 7.4). This was done once all the restructuring work at the code level had been finished.

9.4.6 Obtaining Relationships

Relationships were mainly derived from the relate statements, and the information recorded in the comment statements was used to decide the degrees of the relationships.

Relationships can also be obtained from other sources. For example, relationships can be obtained from the declaration of a record in COBOL. When a subrecord occurs more than once, the subrecord and the record can each be abstracted to an entity, and a relationship between these two entities can also be obtained, i.e.:

record X04-TARIFF-SSU-TABLE with
  record X03-TARIFF-SSU-DETAILS with
    record X03-SSU-CHARGE-BAND end
    record X03-SSU-NEW end
  end
end;
comment: "X03-TARIFF-SSU-DETAILS OCCURS 25";

became

entity X04-TARIFF-SSU-TABLE end;
entity X03-TARIFF-SSU-DETAILS with
  attr X03-SSU-CHARGE-BAND
  attr X03-SSU-NEW
end;
relationship entity X04-TARIFF-SSU-TABLE has one contain relation with many entity X03-TARIFF-SSU-DETAILS;

The comment statement was translated from the OCCURS construct in the COBOL and was used to decide the degree of the relationship. There were 14 OCCURS constructs in the original COBOL program; therefore, 28 entities and 14 relationships were obtained in this way.
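The derivation of such a relationship from an OCCURS clause can be sketched as follows (illustrative Python; the helper is hypothetical and not part of the prototype):

```python
# Illustrative sketch: a subrecord carrying an OCCURS clause yields a
# one-to-many "contain" relationship between the enclosing record's entity
# and the subrecord's entity, as in the X04/X03 example above.

def occurs_relationship(record_name, subrecord_name, occurs):
    degree = "many" if occurs > 1 else "one"
    return (f"relationship entity {record_name} has one contain "
            f"relation with {degree} entity {subrecord_name};")

rel = occurs_relationship("X04-TARIFF-SSU-TABLE", "X03-TARIFF-SSU-DETAILS", 25)
```

The OCCURS count (here 25, recorded by the comment statement) supplies the degree of the relationship.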

Another example is that a record with multiple levels of subrecords was able to be abstracted to more than one entity, especially when one of the subrecords was assigned to by another record. At the same time, a relationship between the entities can also be obtained, i.e.:

record B43-FACILITY-RATES-RECORD end;
record X03-TARIFF-FAC-TABLE with
  record X03-TARIFF-FAC-DETAILS with
    record X03-FACILITY-TYPE
    record X03-FAC-LOCAL-NEW
    record X03-FAC-LOCAL-OLD
    record X03-FAC-TRUNK-NEW
    record X03-FAC-TRUNK-OLD
  end
end;
relate B43-FACILITY-RATES-RECORD to X03-TARIFF-FAC-TABLE;

was abstracted to:

entity B43-FACILITY-RATES-RECORD end;
entity X03-TARIFF-FAC-TABLE end;
entity X03-TARIFF-FAC-DETAILS with
  attr X03-FACILITY-TYPE
  attr X03-FAC-LOCAL-NEW
  attr X03-FAC-LOCAL-OLD
  attr X03-FAC-TRUNK-NEW
  attr X03-FAC-TRUNK-OLD
end;
relationship entity X03-TARIFF-FAC-TABLE has one copy relation with one entity X03-TARIFF-FAC-DETAILS;
relationship entity B43-FACILITY-RATES-RECORD has one copy relation with one entity X03-TARIFF-FAC-DETAILS;

There were 10 cases like this example in the program, and these were dealt with in a similar way.

The "back tracking" technique introduced in Chapter 7 was applied when the above example was experimented with. At first, the record with multiple levels of subrecords was dealt with independently to obtain entities and relationships. Other routes were also tried before the solution introduced here was reached.

By the time this stage finished, there were 1486 lines of WSL left.

9.4.7 Final Tidy-up and Result

Finally, duplicate entity relationships, which might be obtained from different places in the program, were checked and removed. All the comment statements were removed. The resultant WSL program representing the obtained Entity-Relationship Attribute Diagrams for the original COBOL program was 999 lines in length, containing 267 entities (excluding 482 lines representing attributes of some of the entities) and 250 entity-relationships.

The case study turned out exactly as shown in Figure 7.4, in which records, entities and relationships were obtained. This case study demonstrates the hypothesis made in Figure 7.4. Nevertheless, no foreign keys were found in use in the example.

9.5 Case Study 4 — A Customer Account Ledger Program Used in a Telephone Company

An experiment was also carried out with another real program.

9.5.1 About the Program

The program presented in this section is named "Customer Account Ledger Extract" and is composed of 7070 lines of COBOL source code. The program is physically written as "one program", i.e., having one data division, one procedure division, etc., and is stored in one file.

The only information source is the source code itself. According to the comments in the code, there were 23 amendments during the period from August 1985, when the code was first written, to November 1991, when the most recent compilation of the program took place before it was sent to the REFORM research group to be studied.

9.5.2 Experiments on the Program

Experiments carried out can be summarised as follows:

Modularising the program by inspection

The first step in analysing the program was to inspect the code and divide it into smaller, manageable blocks.

The identification division and the environment division of the COBOL program consist of about 200 lines, mainly comments serving as an "information area". The data division consists of about 2400 lines, declaring six files and 230 records. The procedure division consists of the remaining 4500 lines of source code, in which there are four modules, namely the main control section, the initialisation section, the main processing section and the final section. The main control section calls the other three sections sequentially.

File and record declarations

Translating file and record declarations into WSL is straightforward, and once they become WSL records they are considered as candidate entities. Not all WSL records are loaded into the prototype at the same time: a record is only loaded in conjunction with a block of code in which it is used.

The initialisation section

This section consists of 400 lines of code, including 100 lines of comments. About 100 lines of code, which open the six files declared earlier and initialise data parameters, were transformed into WSL. The rest of the code in this section is used to initialise input and output devices and has therefore not been dealt with. A number of entities and relationships were obtained from this section using the prototype.

The main processing section

The main processing section is composed of 32 functionally independent subprograms. One of these subprograms acts as the "main" subprogram, which coordinates the remaining subprograms to process all customer account records. Each of the remaining 31 subprograms performs one function, such as "Read-customer-account-balance", "Check-unbilled-amount", "Output-ledger-records", etc. Some of these subprograms were translated into WSL, and each time one subprogram was loaded into the prototype together with the records (translated earlier) used by that subprogram. A number of Entity-Relationship Attribute Diagrams were obtained by manipulating these subprograms in the prototype tool.

The final section

The final section consists of about 200 lines of COBOL source code excluding comment lines. This section mainly writes a final report of the customer account ledger. The section of code was translated into WSL. The obtained Entity-Relationship Attribute Diagram (Figure 9.3) clearly shows the data structure of a final customer account ledger report.

9.5.3 Comments

So far, the file and record declaration section and a number of code segments in other sections of the program have been studied in depth, and the following results have been obtained:

• One line of COBOL code is typically translated into one line of WSL.

• A number of entities and relationships were successfully derived from the code (approximately 200 entities were extracted from the original 01-level records, and other entities were derived from file operations and from abstract data types; about 50 relationships were derived from those records which had subrecords, abstract data types and foreign keys).

[Figure 9.3 shows the Final-report entity with attributes: Start-of-report, No-of-cust-ac-read, No-of-open-cust-ac-read, No-of-history-cust-ac-read, No-of-writtenoff-cust-ac-read, No-of-tmp-cust-ac-read, No-of-unbilled-revenue-record-read, No-of-prod-group-unbilled-records-written, No-of-ledg-records-written, Accounting-and-date, Period-yyyy-mm-dd and End-of-report.]

Figure 9.3: Final Report on Customer Account Records

• Data designs obtained from programs made these programs much more comprehensible.

• Deriving data designs by using the tool is a faster and more reliable process than doing it manually (even when that is possible).

• Metric graphs monitoring the process of transforming programs show that the programs have been considerably simplified, i.e., by becoming more abstract.

• The Entity-Relationship Attribute model is useful when a static, but not dynamic, view of a software system is sought, because the model only describes the relationships between data objects (or entities); it does not describe how data flows in a system.

• Attention should be paid, and more analysis should be done, when using the information obtained during the reverse engineering process. For example, relationships such as "file-backup" in Case Study 1 emerged from reverse engineering but were not really a part of the application domain.

In a word, with the help of the method and the prototype tool that was built, reading a real program, such as one of 7000 lines, is no longer a frightening task.

9.6 Case Study 5 — A Public Library Administration System

In addition to the experiments presented in previous sections, an experiment was carried out with a public library administration system which was implemented in COBOL.

9.6.1 Background of the Program

This program [42] was built to manage a public library, in order to obtain better control over all the operations related to the books in stock throughout their life cycle.

With source code of over 4000 lines, the program was developed for execution in a typical mainframe environment: DOS/VSE, the most common mainframe operating system; CICS, the most common teleprocessing monitor; and TOTAL, a very old but very common DBMS. CICS was used for communication between this program and the end user; TOTAL was used to store details of books.

9.6.2 Preparing the Program for the Prototype

Before it was loaded into the prototype, the library management program was pre-processed as follows.

The source code of this program was first manually divided into eight COBOL modules (temporarily named Lib1, Lib2, ..., Lib8) according to the layout of the source code. It was soon found that one of the modules (Lib8) was a batch process program used for updating a back-up file and printing out an inventory of all the books in stock. Because this program (Lib8) was implemented mainly with CICS calls and TOTAL calls^, an analysis of this module seemed not directly relevant to the research and was therefore not carried out. It was also found that another module, Lib7, was an "include" file which defined common data structures for the system. Lib1 to Lib6 are the main components of the library management system, and Lib7 was included by these six components.

Lib1 to Lib7 were then manually translated into WSL separately. CICS calls and TOTAL calls were translated into external procedure calls and external function calls. Because the frequent occurrences of CICS and TOTAL calls among the COBOL source lines literally made a detailed analysis of the control part of the system more difficult, experiments had to be carried out with the following guideline in mind: to obtain as much of the data design as possible from the data part of the system. Every effort was made in deriving Entity-Relationship Attribute Diagrams from the data divisions in Lib1 to Lib6.

^It was observed that an analysis of the parameter list of a CICS or TOTAL call would show the data declaration of that call, and this information could therefore be used by the program transformer.

9.6.3 Dealing with CICS Calls

Fortunately, most CICS/TOTAL calls in this COBOL source code are used for handling simple "interfacing" operations, so they can be translated into external procedure (function) calls in WSL. Take the following segment of code as an example:

IF OPTION011 = 'C'
    EXEC CICS START TRANSID ('CREATE')
        TERMID ('EIBTRMID') END-EXEC
    EXEC CICS RETURN END-EXEC
ELSE
    IF OPTION011 = 'D'
        EXEC CICS START TRANSID ('DROP')
            TERMID ('EIBTRMID') END-EXEC
        EXEC CICS RETURN END-EXEC

This segment was taken from the part which examines user options. When 'C' is input, CICS is called to invoke the execution of the 'CREATE' program (to create a book in the library). Similarly, user option 'D' invokes the 'DROP' program (to drop a book from the library). In both 'IF' cases, after the 'CREATE' or the 'DROP' program is executed, CICS is called afterwards to return control to the caller. This CICS call will not literally appear in the WSL program, because when the first CICS call was translated into a WSL external procedure call, the return of control was semantically covered by using a procedure. Therefore, the above program may be translated into:

if option011 = 1
    then !p cics (start var red1, eibtrmid)
else
    if option011 = 2
        then !p cics (start var red2, eibtrmid)
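The pattern above suggests a simple mechanical rewriting. The following sketch is ours and purely illustrative (the thesis translation was done by hand; the regular expression and function name are our own assumptions): it rewrites one EXEC CICS START block into a WSL external procedure call of the form used above.

```python
import re

# Hypothetical sketch, not part of the thesis prototype: rewriting the
# simple EXEC CICS START ... END-EXEC pattern into a WSL external
# procedure call. The regex and function name are our own.
CICS_START = re.compile(
    r"EXEC\s+CICS\s+START\s+TRANSID\s+\('(\w+)'\)\s+"
    r"TERMID\s+\('(\w+)'\)\s+END-EXEC",
    re.IGNORECASE)

def translate_cics_start(cobol_fragment: str) -> str:
    """Rewrite one EXEC CICS START block as a WSL external procedure call.

    The EXEC CICS RETURN that follows is simply dropped: the return of
    control is covered by the procedure-call semantics in WSL.
    """
    m = CICS_START.search(cobol_fragment)
    if m is None:
        raise ValueError("no EXEC CICS START block found")
    transid, termid = m.group(1).lower(), m.group(2).lower()
    return f"!p cics (start var {transid}, {termid})"

fragment = """EXEC CICS START TRANSID ('CREATE')
              TERMID ('EIBTRMID') END-EXEC
EXEC CICS RETURN END-EXEC"""
print(translate_cics_start(fragment))
```

A real translator would of course need to handle the many other CICS command forms; this sketch only covers the one pattern discussed above.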

9.6.4 Resultant Entity-Relationship Attribute Diagrams

After being loaded into the prototype, the program modules Lib1 to Lib7 were each manipulated. Typically several entities and their relationships were extracted from each module. For example, among the entities and relationships obtained from Lib7, five entities and four relationships can be used to draw the Entity-Relationship Attribute Diagram shown in Figure 9.4.

9.6.5 Understanding the Program Through A Data Design

An analysis of the Entity-Relationship Attribute Diagram in Figure 9.4 has provided a better understanding of the original program.

A library has many books in it. One book has one identifier and may have more than one copy. A book identifier is composed of a title, a publishing house, one or many authors, ISBN and publishing year. There may be more than one copy for the same book and each copy has a reference number, a piece of information on its physical location and a status flag on whether or not the copy is in the library.

To identify a book is to confirm the identifier of the book because a book has only one identifier (see the Entity-Relationship Attribute Diagram in Figure 9.4).

Information may have to be provided by a user for up to all components of the identifier (in the Entity-Relationship Attribute Diagram, the entity "Identifier" has four attributes and another entity "Author" attached), i.e., title, publisher, ISBN, publishing year and author(s). It can be deduced that, in the actual program, the process of consulting about the existence in the library of a certain book should

[Figure 9.4: Books in a Library. Diagram not reproduced; it shows entities including Library, Identifier and Author, connected by one/many relationships, with attributes Title, Publisher, ISBN, PubYear, RefNo, Loc and Status.]

be a process of identifying if there is such a book in the library according to the information supplied by a user.

Similarly, when a new book is created in the library, while all the information constructing the identifier of the book would be recorded in the library system, a new copy (of the book) should also be created by allocating a reference number and a location in the library, and setting up the status of the book as "in library".
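The data design read off Figure 9.4 can be restated as a small executable schema. The sketch below is ours and purely illustrative (the class and field names are not produced by the prototype); it simply mirrors the description above: one identifier per book, one or many authors per identifier, and one or many copies per book.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative model of the data design in Figure 9.4; the class and
# field names are ours, not output of the prototype.

@dataclass
class Identifier:
    title: str
    publisher: str
    isbn: str
    pub_year: int
    authors: List[str]            # one or many authors

@dataclass
class Copy:
    ref_no: str                   # reference number
    location: str                 # physical location in the library
    in_library: bool              # status flag

@dataclass
class Book:
    identifier: Identifier        # exactly one identifier per book
    copies: List[Copy] = field(default_factory=list)

    def create_copy(self, ref_no: str, location: str) -> Copy:
        # Creating a new copy allocates a reference number and a
        # location, and sets the status to "in library".
        c = Copy(ref_no, location, in_library=True)
        self.copies.append(c)
        return c

# Hypothetical data, for illustration only.
book = Book(Identifier("A Title", "A Publisher", "0-00-000000-0",
                       1994, ["An Author"]))
book.create_copy("R001", "Shelf 12")
print(len(book.copies), book.copies[0].in_library)
```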

9.7 An Example of Using the Metric Facility

An experiment has been conducted in obtaining measuring results on the following program (by monitoring the metrics as the transformations progressed):

var (m := 0, p := 0, last := " ") :
actions PROG :
PROG ==
    (line := " ", m := 0, i := 1);
    call INHERE.
L ==
    i := (i + 1);
    if (i = (n + 1)) then call ALLDONE fi;
    m := 1;
    if (item[i] ≠ last)
        then write(line var std_out);
             line := " "; m := 0;
             call INHERE fi;
    call MORE.
INHERE ==
    p := number[i];
    line := item[i];
    line := ((line ++ " ") ++ p);
    call MORE.
MORE ==
    if (m = 1) then p := number[i]; line := ((line ++ ", ") ++ p) fi;
    last := item[i];
    call L.
ALLDONE ==
    write(line var std_out);
    call Z.
endactions end

After transformations have been applied to this program, it becomes:

(line := " ", i := 1);
while (i ≠ (1 + n)) do
    line := ((item[i] ++ " ") ++ number[i]);
    i := (1 + i);
    while ((i ≠ (1 + n)) ∧ (item[i] = item[(i - 1)])) do
        line := ((line ++ ", ") ++ number[i]);
        i := (1 + i) od;
    write(line var std_out) od
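To make the behaviour of the transformed program clearer, it can be mirrored in the following sketch (ours, for illustration only): for each run of consecutive equal entries in item, one line is built containing the item followed by its corresponding numbers.

```python
# Illustrative re-expression (ours) of the transformed WSL program: for
# each run of consecutive equal entries in `item`, emit one line with
# the item followed by its corresponding `number` entries.

def print_lines(item, number):
    out, i, n = [], 0, len(item)
    while i != n:
        line = item[i] + " " + str(number[i])
        i += 1
        while i != n and item[i] == item[i - 1]:
            line += ", " + str(number[i])
            i += 1
        out.append(line)
    return out

# Hypothetical data: two entries for "Smith", one for "Jones".
for line in print_lines(["Smith", "Smith", "Jones"], [3011, 3012, 4022]):
    print(line)
```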

The measuring results for LOC and CFDF can be seen in Figure 9.5 and Figure 9.6 respectively. Results for other metrics are similar to the results shown in those two figures and are therefore not included. Figure 9.5 and Figure 9.6 are actual screen dumps. Their equivalent forms are shown in Figure 9.7.

[Figure 9.5: The Measure Result of Lines of Code. Screen dump, not reproduced: a chart of lines of code (starting at 32) against the transformations applied, in order: EXPAND_ACTION_CALL, EXPAND_ACTION_CALL, COLLAPSE_ACTION_SYSTEM, ←MERGE, APPLY→, ←MERGE, APPLY→, APPLY→, REMOVE_REDUNDANT_VARS, PUT_IN_NEW_OUTER_LOOP, ADD_ASSERT, INSERT_ASSERT, MERGE→, TAKE_OUTSIDE_LOOP, DELETE_ALL_ASSERTIONS, ←MERGE, JOIN_ALL_CASES, LOOP->WHILE, LOOP->WHILE, DELETE. The x-axis is labelled "Transformations (Name of The Transformation Used)".]

[Figure 9.6: The Measure Result of Control-flow and Data-flow Complexity. Screen dump, not reproduced: a chart of control-flow and data-flow complexity (starting at 84) against the same sequence of transformations as in Figure 9.5, with the x-axis labelled "Transformations (Name of The Transformation Used)".]

[Figure 9.7: Measure Results of Using Metric Facility. Redrawn equivalents of Figures 9.5 and 9.6: lines of code (axis marks at 16 and 32) and the control-flow and data-flow complexity index number (axis marks at 40 and 80) plotted against the number of transformations applied (axis marks at 10, 15 and 20).]

Chapter 10

Conclusions

The conclusions of this thesis can be divided into three parts. The first part, summary of the thesis, summarises what has been done in this thesis; the second part, assessment, demonstrates how the work done in the thesis has achieved the predefined goals and points out what else ought to be carried out in making full use of the results achieved in this thesis; and the third part, future directions, suggests some research topics which have been stimulated by this research.

10.1 Summary of Thesis

This thesis has undertaken an original investigation and feasibility study of the problems concerning obtaining program designs from data intensive programs using a program transformation approach. A systematic method has been developed and a prototype has been implemented based on the method developed [169]. The investigation was started by addressing the overall process of software engineering. Software maintenance is nowadays the most expensive stage in the software life cycle and enormous maintenance problems desperately await solutions. Furthermore, reverse engineering is a crucial part of software maintenance. The work related to reverse engineering was examined in detail. Formal specification and program transformation techniques, which were mainly used in forward engineering before, are proposed to have new applications in reverse engineering. Existing program verification tools are reviewed to show not only that one of these tools is borrowed to build a component of the prototype (the Symbolic Executor) but also that the transformational programming approach has more strengths in constructing a reverse engineering tool than the program proving approach. The main existing reverse engineering approaches have been analysed and criticised so that an original method of reverse engineering could be proposed.

The method proposed in this thesis is to derive a program data design from a data intensive program through program transformations. Program design is represented in Entity-Relationship Attribute Diagrams and the source data intensive programs in the study are in COBOL (mainly), C, etc. In order to apply program transformations, a suitable language which can represent both source programs and Entity-Relationship Attribute Diagrams is needed. We used a wide spectrum language (WSL) for this purpose. The source program has to be translated into its semantically equivalent form in WSL. Since a program in this form is basically at the "physical" abstraction level, abstraction levels have to be crossed to get a program design which is at the "conceptual" level. Problems of crossing levels of data abstraction are discussed in this thesis. Another goal of the research is to build a tool based on the method developed. Because the method itself is a substantial extension of the REFORM approach, the prototype is designed as an extension of the existing tool, the "Maintainer's Assistant". The extension includes extending the existing WSL to support representing COBOL programs and Entity-Relationship Attribute Diagrams, implementing program transformations for crossing levels of data abstraction and deriving Entity-Relationship Attribute Diagrams, and implementing supporting components of the tool. The designed prototype has been fully implemented. Experiments have been carried out and results of those experiments are presented.

10.2 Evaluation

The project is evaluated in this section by examining it against the main criteria for success predefined in Chapter 4 (Section 4.4). Crossing levels of abstraction is accomplished by applying those program transformations which transform a program into another at a higher abstraction level. The transformations were invented based on the project's research results on crossing levels of abstraction. A prototype has been implemented by extending the existing Maintainer's Assistant. While using the Maintainer's Assistant as a platform, the prototype has made a substantial extension to it, including the extension to the WSL, the extension to the Transformation Library, and extensions to supporting tools. One way to measure the success of acquiring data designs from the code is to use metrics. Six metric measures (code measures) have been defined to reflect the features of reverse engineering and a metric facility has been implemented as a supporting component of the prototype. It has been demonstrated viable to extract a data design from code using the method developed in this thesis. The method works and it works well:

• Entity-Relationship Attribute Diagrams have been produced from a large number of COBOL programs.

• The abstracted Entity-Relationship Attribute Diagrams are able to represent the designs of the original programs. The correctness of the obtained ERA diagrams is at present checked manually based on human knowledge and expertise. During the project, experts on COBOL programming were consulted to identify the features of data-intensive programs and to evaluate the resultant Entity-Relationship Attribute Diagrams obtained from the example programs using the prototype tool. It was agreed by the experts that the obtained Entity-Relationship Attribute Diagrams represented viable designs for these example programs and can be used as possible data designs for subsequent forward engineering of the programs.

In some cases, such as in Case Study 1, correctness can also be checked in an inverse way, i.e., by checking whether the original program is an implementation of the obtained Entity-Relationship Attribute Diagram. Of course, the eventual aim is to assure correctness by proving that the transformations are semantics-preserving. This has been done for some transformations.

• The method combines analysis of data and code, and therefore it can address two major problems common to data-intensive programs, i.e., aliases and relations (like foreign keys) that affect both data and code. This distinguishes our approach from others, e.g., Bachman's work.

• The method is sufficiently general to apply to data intensive programs written in other source languages, though COBOL programs have mainly been used in experiments to date. For example, a program written in the C programming language can be reverse engineered using the approach as long as the C program is translated into WSL. In fact, a few examples in the experiments were derived from C programs; for instance, the example (in WSL) used in Section 7.5.4 was translated from a C program.

• This method only requires source code as its input and it can be applied to heavily modified code typical in systems which have been maintained over many years.

• There are few restrictions on the approach developed in this research. Perhaps the only restriction is that the user needs to supply the source code to be reverse engineered. The approach is, however, fundamentally interactive, and the tool supports the expert maintainer in restructuring code and then extracting designs. It is not automatic. The user-driven mode is common to almost all other reverse engineering approaches. The user's role was well taken into consideration when the approach was studied, and a user-friendly interface was designed and implemented when the prototype tool was built.

• The development of the method and the implementation of the prototype show that this approach has covered a scope from theory to practice [170]. It has not been seen so far in the literature that any other system can derive data designs represented in Entity-Relationship Attribute Diagrams. Therefore, it is believed that the approach described in this thesis caters for what others do not; no other projects have gone as far as this project has. This can be confirmed by re-examining the other reverse engineering approaches introduced in Section 3.5. For example, in the REDO project [100,101], it was only shown that there was a method by which COBOL code can be abstracted to produce explicit mathematical descriptions of functionality. However, more work still has to be devoted to making the REDO approach more practical.

It appears at first that the method developed could be applied manually (i.e., manipulating COBOL code by hand and extracting entities and their relationships from the code directly) and that it might therefore not be necessary to build a tool to implement the method. This argument might be true when a program is fairly small (like the program in the first case), but as the size of a program to be maintained increases, building a tool for doing this shows many advantages. The three kinds of transformations defined earlier have been used alternately during the process of manipulating the example cases. For example, in the first case, equivalence transformations were used first to handle file operations, and then abstraction transformations were applied to derive entities and entity relationships. Refinement transformations were used when "back tracking" was needed. The role of human knowledge was to guide the whole application process of the method. When COBOL programs are translated into WSL, because a COBOL-to-WSL translator has not been built (or even when building the translator), human knowledge has to be used to ensure that the WSL programs are semantically equivalent to the original COBOL programs. It can be seen that human knowledge plays a decisive role in choosing transformations, in particular abstraction transformations. Finally, it is also human knowledge that determines whether the obtained Entity-Relationship Attribute Diagrams represent a reasonable data design of the original COBOL program.

Where the method developed in this thesis needs improving was observed as well. The method can deal with most example cases studied in the project. In particular, COBOL records and files are able to be represented in WSL and this is crucial to the implementation of the prototype as well as to the successful application of the method. Therefore, it is comparatively easier to extract Entity-Relationship Attribute Diagrams from a relatively independent (self-contained) segment of COBOL code with record (file) definitions, but it is more difficult for a COBOL segment with many calls to other segments (i.e., with many PERFORM statements) because the structural complexity increases. This problem can be alleviated by building more powerful "restructuring" transformations, which is not a main thrust of the thesis. Another reason that this method can cater for most example cases is that the translation from COBOL to WSL is currently done manually. Building a translator needs to consider all possible syntax combinations of the COBOL language, while hand translation can tailor any program as long as the program is semantically translated into an equivalent form in WSL. Once a program is represented in WSL, according to the author's experience, there are always some transformations available to simplify it and eventually to abstract Entity-Relationship Attribute Diagrams from it, and some entities and entity relationships were almost always derived (representing part of the data design, not the full data design).

10.3 Assessment

The research described in the thesis brings us valuable detailed understanding in this field. The method developed in the thesis is a successful outcome of the project. This method combines transformational programming and data abstraction and thus gives us confidence when we manipulate the programs. Another feature of this approach is to employ Entity-Relationship Attribute Diagrams to represent the final products (data designs). The existing WSL has been extended to represent the original programs and Entity-Relationship Attribute Diagrams. Therefore, the original objects and the target objects can both be represented and manipulated in the same language.

In order to capture any useful information for data abstraction, all materials associated with data are made full use of, i.e., records, assignment statements, etc. Common data structures used in programming, data-intensive programming in particular, have been investigated, for example, foreign keys, the aliasing problem, file operations, ADTs, etc. One of the useful features of the prototype is that the system can be easily extended. When more program transformations are available, they can be integrated into the system with ease. Since the project made a novel attempt to solve the problem of acquiring data designs from data intensive programs, it may have some weaknesses as well as strengths. The research described in this thesis includes several subjects within the scope of software engineering. Although the author has managed to tackle the main issues of the project, further research into some side issues may make the project even more promising. During the course of this project, a number of drawbacks have been exposed both in the method and in its implementation. The method relies on a certain amount of user expertise. In particular, a user must understand the principle of program transformation and be capable of choosing program transformations in order to reverse engineer the code more efficiently. Also, since a user is heavily involved in naming relationships in Entity-Relationship Attribute Diagrams, the user's misunderstanding of the original program might lead to an inconsistent or even a wrong Entity-Relationship Attribute Diagram. More supporting components should be built. In this thesis, the translation from a program in COBOL or in other data-intensive programming languages to its equivalent program in WSL was done by hand. This is because the techniques for translating programs between programming languages are comparatively mature and therefore no more effort needed to be made on this in the research. Another supporting component is a tool dealing with Entity-Relationship Attribute Diagrams, i.e., printing Entity-Relationship Attribute Diagrams on the screen or to a printer. At present, the end results of an Entity-Relationship Attribute Diagram are represented in WSL. When the resources for enhancing the prototype into a fully tool-supported system are available, the COBOL-to-WSL translator can be implemented by existing compiler-writing techniques and the Entity-Relationship Attribute Diagram processor can be built by referring to existing tools, such as ERDRAW [112,113,150].

More experiments should also be carried out with data intensive programs in other programming languages. This includes not only building translators for programs in other languages but also studying the features of programs in those languages. At present, the prototype is mainly COBOL-specific and it can only deal with those data structures that exist in COBOL. However, the method developed in this thesis is general enough to cope with data-intensive programs in other languages. For example, programming languages such as C provide data structures such as pointers that are not supported by COBOL, and therefore transformations for dealing with pointers are needed. When the source code scales up, the method developed in the thesis would still work well provided program segments of a manageable size could still be found. The scale of the source code would affect the method in a situation where the smaller program segments are not comparatively self-contained, i.e., self-contained segments are no longer of a manageable size. One possible solution could be to build an Information Database. When the Program Transformer is working on a program segment which has many calls to other segments, it is the Information Database that collects all the necessary information from the called segments for the Transformer, so that the Transformer does not need to load in those segments. One of the leftovers of the project is that some of the transformations used need proving. As argued in earlier chapters, proving program transformations is not a main thrust of this thesis and it is only worthwhile proving all the transformations when they are identified as suitable for future use. Also, existing transformations may still need optimising and more transformations should be added, particularly when source data intensive programs not written in COBOL are to be experimented with in the prototype tool.

However, the original goal of this thesis was to demonstrate the feasibility of acquiring data designs from data-intensive programs and it is a huge task. The author of the thesis hopes to be able to argue fairly that this thesis has achieved this original goal extremely well despite the existence of the above mentioned problems.

10.4 Future Directions

The future research directions can be suggested directly from the assessment of this project. An area for future research is to find out whether the approach developed in this thesis can be used to acquire specifications (e.g., specifications written in Z) from programs, which is the original aim of the REFORM project. Metric measures need to be defined for measuring both code and specifications, to meet the needs of reverse engineering. It is perhaps important that a metric measure can reflect the process of crossing levels of abstraction. This research has so far indicated that the approach of program transformation can be used to acquire data designs from data intensive programs. However, the real application of this approach will not be seen until an industrial-strength tool has been built. Therefore, more research should be conducted to improve the prototype developed in this thesis into a practical tool. Finally, the work described in this thesis provides a summary of the state of the art in the field of reverse engineering using formal methods [166]. Using formal methods can be a good solution to the software evolution problem, because software maintenance [168], software reuse [171] and software testing [172] are main parts of software evolution. Therefore, the outcomes of this thesis may be a starting point for research into software evolution using formal program transformations.

Appendix A

Syntax of WSL Extension

A.1 Introduction

This appendix specifies the abstract syntax of the WSL extension for data abstraction and the basic WSL definitions which are used for the extension. The abstract syntax of the existing WSL has not been included. The semantics specification of the WSL extension can be seen in Appendix B.

A.2 Programs

Program ::= Command

A.3 Commands

Command ::= - empty
  | Command Command
  | segment Files Records Command end;
  | paragraph Entities Specification-Statements end;
  | create-set Set-Variable;
  | set-insert Expression Set-Variable;
  | del-element Expression Set-Variable;


  | init-q Q-Variable;
  | q-append Q-Variable Expression;
  | make-seq Seq-Variable;
  | seq-remove Seq-Variable;
  | seq-append Seq-Variable Expression;
  | init-stack Stack-Variable;
  | push Expression Stack-Variable;
  | redefine Rec-Identifier1 with Rec-Identifier2;
  | user-adt-proc-call(Actual-Parameter-Sequence);

Files ::= File-Def | File-Def Files
Records ::= Record-Def | Record-Def Records
Entities ::= Entity-Def | Entities Entity-Def
Specification-Statements ::= Specification-Statement
  | Specification-Statements Specification-Statement

A.4 Names

V-name ::= Identifier | V-name.Identifier

A.5 Expressions

Expression ::= Integer-Literal
  | Character-Literal
  | V-name
  | Operator Expression
  | Expression Operator Expression
  | Record-Aggregate
  | Ent-Identifier
  | dis-union(Expression)

  | dis-intersection(Expression)
  | q-concat(Q-Variable Q-Variable)
  | q-rem-first(Q-Variable)
  | q-length(Q-Variable)
  | seq-concat(Expression Expression)
  | sub-seq(Expression Index1 Index2)
  | sub-seq?(Expression Expression)
  | pop(Expression)
  | top(Expression)
  | depth(Expression)
  | user-adt-funct-call(Actual-Parameter-Sequence)

Rec-Aggregate ::= Rec-Identifier | Rec-Identifier.Rec-Identifier
Set-Variable ::= V-Name
Q-Variable ::= V-Name
Seq-Variable ::= V-Name
Stack-Variable ::= V-Name

A.6 Specification Statements

Specification-Statement ::= - empty
  | Specification-Statement Specification-Statement
  | Entity-Def
  | Relate-Def
  | Relationship-Def

Entity-Def ::= entity Ent-Identifier end;
  | entity Ent-Identifier with Attributes end;

Attributes ::= attr Attribute-Identifier
  | Attributes attr Attribute-Identifier

Relate-Def ::= relate Rec-Identifier1/Ent-Identifier1 to Rec-Identifier2/Ent-Identifier2;
  | relate Rec-Identifier1/Ent-Identifier1 to { Rec-Identifier2/Ent-Identifier2 and Rec-Identifier3/Ent-Identifier3 };
  | relate Rec-Identifier1/Ent-Identifier1 to { Rec-Identifier2/Ent-Identifier2 or Rec-Identifier3/Ent-Identifier3 };

Relationship-Def ::= relationship entity Ent-Identifier1 has Relation-Degree1 Relationship-Name relation with Relation-Degree2 entity Ent-Identifier2;
  | relationship entity Ent-Identifier1 has Relation-Degree1 Relationship-Name relation with { Relation-Degree2 entity Ent-Identifier2 } and { Relation-Degree3 entity Ent-Identifier3 };
  | relationship entity Ent-Identifier1 has Relation-Degree1 Relationship-Name relation with { Relation-Degree2 entity Ent-Identifier2 } or { Relation-Degree3 entity Ent-Identifier3 };

Ent-Name-Def ::= Ent-Identifier
Ent-Identifier ::= Identifier
Entity-Identifier ::= Identifier
Relation-Degree ::= one | many
Relationship-Name ::= Identifier

A.7 Declarations

Declaration ::= Declaration Declaration
  | File-Def
  | Record-Def
  | User-Adt-Def
  | User-Adt-Funct-Def
  | User-Adt-Proc-Def

File-Def ::= file File-Identifier with Records-Def end;

Records-Def ::= Record-Def | Record-Def Records-Def

Record-Def ::= record Rec-Identifier [ Integer-Literal of Type-Char-Literal ] end;
  | record Rec-Identifier with Records-Def end;

Rec-Identifier ::= Identifier
Type-Char-Literal ::= char | int

User-Adt-Def ::= user-adt Identifier (Formal-Parameter-Sequence) User-Adt-Defs end;
User-Adt-Defs ::= User-Adt-Def User-Adt-Defs
User-Adt-Def ::= User-Adt-Funct-Def | User-Adt-Proc-Def
User-Adt-Funct-Def ::= user-adt-funct Identifier (Formal-Parameter-Sequence) Expression end;
User-Adt-Proc-Def ::= user-adt-proc Identifier (Formal-Parameter-Sequence) Command end;

A.8 Parameters

Formal-Parameter-Sequence ::= - empty
  | Formal-Parameter
  | Formal-Parameter, Formal-Parameter-Sequence
Formal-Parameter ::= Identifier
Actual-Parameter-Sequence ::= - empty
  | Actual-Parameter
  | Actual-Parameter Actual-Parameter-Sequence
Actual-Parameter ::= Expression

A.9 Lexicon

Program ::= (Token | Comment | Blank)*

Token ::= Integer-Literal | Character-Literal | Identifier

  | Operator | . | : | ; | , | := | == | ( | ) | [ | ] | { | }
  | attr | create-set | depth | del-element
  | dis-intersection | dis-union | ent | entity | file | in | init-q
  | init-stack | set-insert | make-seq | paragraph
  | q-append | q-concat | q-rem-first | rec
  | relate | relationship | segment | seq-append
  | seq-concat | seq-remove | sub-seq | sub-seq? | top
  | user-adt | user-adt-funct | user-adt-funct-call | user-adt-proc
  | user-adt-proc-call

Integer-Literal ::= Digit Digit*

Character-Literal ::= 'Graphic'

Identifier ::= Letter (Letter | Digit)*

Operator ::= Op-character Op-character*

Comment ::= comment Graphic* end-of-line

Blank ::= space | tab | end-of-line

Graphic ::= Letter | Digit | Op-character | space | tab
  | . | : | ; | , | == | ( | ) | [ | ] | { | } | _ | ! | ' | " | # | $

Letter ::= a | b | c | d | e | f | g | h | i | j | k | l | m
  | n | o | p | q | r | s | t | u | v | w | x | y | z
  | A | B | C | D | E | F | G | H | I | J | K | L | M
  | N | O | P | Q | R | S | T | U | V | W | X | Y | Z

Digit ::=0|1|2|3|4|5|6|7|8|9

Op-character ::= + | - | * | / | = | < | > | \ | & | @ | % | ^ | ?

Note: EBNF is used in this section.

Appendix B

Semantics of WSL Extension

B.l Introduction

This appendix specifies the semantics of the WSL extension for data abstraction. It is based on the syntax of the WSL extension in Appendix A. It is assumed that the semantics of the existing WSL has already been specified.

B.2 Commands

segment file f = p; record r = q; S end
    =DF (file f, record r)/⟨⟩.(file f = p; record r = q; S); ⟨⟩/(file f, record r).true

paragraph entity e; S end
    =DF entity e/⟨⟩.(entity e; S); ⟨⟩/entity e.true

create-set p       =DF p := {}
set-insert e p     =DF p := p ∪ {e}
delete-element e p =DF p := p − {e}
init-q p           =DF p := ⟨⟩
q-append p e       =DF p := ⟨p[1], p[2], ..., p[n], e⟩
make-seq p         =DF p := ⟨⟩
seq-remove p       =DF p := ⟨p[2], ..., p[n]⟩
seq-append p e     =DF p := ⟨p[1], p[2], ..., p[n], e⟩
init-stack t       =DF t := ⟨⟩
push e t           =DF t := ⟨e, t[1], ..., t[n]⟩
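The set, queue and stack primitives above can be modelled directly. The following Python sketch is illustrative only (it is not part of WSL, and it models each state-changing command as a function returning the new value of its operand):

```python
# Illustrative model of the B.2 primitives: sets are Python sets,
# queues and stacks are Python lists.

def create_set():            # create-set p  =DF  p := {}
    return set()

def set_insert(e, p):        # set-insert e p  =DF  p := p ∪ {e}
    return p | {e}

def delete_element(e, p):    # delete-element e p  =DF  p := p − {e}
    return p - {e}

def init_q():                # init-q p  =DF  p := ⟨⟩
    return []

def q_append(p, e):          # q-append p e  =DF  p := ⟨p[1], ..., p[n], e⟩
    return p + [e]

def push(e, t):              # push e t  =DF  t := ⟨e, t[1], ..., t[n]⟩
    return [e] + t
```

For instance, `push(1, push(2, []))` yields the stack `[1, 2]`, with 1 on top.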


user-adt-proc-call X =DF (μX.S), where S is the body of the called user-adt-proc X

B.3 Expressions

x := dis-union(p)         =DF x := ∪/p
x := dis-intersection(p)  =DF x := ∩/p

x := q-concat(s1, s2)     =DF x := ⟨s1[1], s1[2], ..., s1[m], s2[1], s2[2], ..., s2[n]⟩

x := q-rem-first(p)       =DF x := p[1] ∧ p := ⟨p[2], p[3], ..., p[n]⟩
y := q-length(p)          =DF y := ℓ(p)

x := seq-concat(s1, s2)   =DF x := ⟨s1[1], s1[2], ..., s1[m], s2[1], s2[2], ..., s2[n]⟩

x := sub-seq(p, i, j)     =DF x := ⟨p[i], ..., p[j]⟩
y := sub-seq?(p1, p2)     =DF if ⟨p1[1], ..., p1[n]⟩ = ⟨p2[i], ..., p2[j]⟩ ∧ (n − 1 = j − i)
                              then y := true else y := false fi
x := pop(t)               =DF x := t[1] ∧ t := ⟨t[2], ..., t[n]⟩
x := top(t)               =DF x := t[1]
y := depth(t)             =DF y := ℓ(t)
y := user-adt-funct-call X =DF y := (μX.F), where F is the body of the called user-adt-funct X

B.4 Specification Statements

entity x = e end =DF entity e/⟨⟩.(entity x = e); ⟨⟩/entity e.true

relate x to y =DF (x, y) ∈ R        * R is a relation.

relationship entity x has d1 r relation with d2 entity y =DF (x, d1, r, d2, y) ∈ V        * V is a set of five-tuples.

B.5 Declarations

user-adt a

Appendix C

Syntax and Semantics of Meta-WSL Extension

C.l Notations

Basic definitions and functions are presented here as a prerequisite for reading the specification of the Meta-WSL extension. These notations are in WSL [165].

1. Programs

Definitions:

Program = Items*   (the set of finite sequences of elements of Items, i.e. ∪n Items^n,
                    where Items^1 = Items and Items^(n+1) = Items × Items^n)

Items = { (table, comment, (t1, t2, c)) ++ components |
          (table ∈ Tables) ∧ (comment ∈ Comments) ∧ (t1 ∈ Specific-types) ∧
          (t2 ∈ Gen-types) ∧ (components ∈ Items*) }

Items is a set of quadruples; the components part of a quadruple is a sequence of items.

Names = name of item

Numbers = Z

262 Appendix C. Syntax and Semantics of Meta-WSL Extension 263

Strings = set of ASCII strings
Comments = any combination of ASCII strings
Content = Names ∪ Numbers ∪ Strings ∪ {true, false}
Gen-types = {Thing, Statement, Expression, Variable, ...}
Specific-types = {Name, Var, Skip, Call, ...}

Functions:

Gen-type : Items → Gen-types

Specific-type : Items → Specific-types

Size : Items → ℕ

Get-n : Items × ℕ → Items

Contents : Items → Content

Val-N(item, n) =DF Contents(Get-n(item, n))
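To make the quadruple representation concrete, here is a small Python sketch; the dictionary layout and field names are assumptions for illustration, not the thesis' implementation:

```python
# Hypothetical rendering of Items: an item carries a specific type,
# a general type, contents, and a list of component items.

def make_item(spec, gen, contents=None, components=()):
    return {"spec": spec, "gen": gen,
            "contents": contents, "components": list(components)}

def size(item):                 # Size : Items -> N
    return len(item["components"])

def get_n(item, n):             # Get-n : Items x N -> Items (1-indexed)
    return item["components"][n - 1]

def contents(item):             # Contents : Items -> Content
    return item["contents"]

def val_n(item, n):             # Val-N(item, n) = Contents(Get-n(item, n))
    return contents(get_n(item, n))
```

For example, an assignment `x := 1` could be built as an "Assign" item whose first component is the assigned variable and whose second is the number; `val_n` of that item at position 1 then returns the name "x".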

2. Database Table

Definition:

Tables = ℙ((Names ∪ Strings) × Contents)

An element of Tables is a set of pairs — a function.

Functions:

Get-table : Items → Tables

Set-table : Items × Tables → Items

Add-to-table : Items × Key × Data → Items
    Add-to-table(item, key, data) =DF Set-table(item, (Get-table(item) − key) ⊕ (key, data))

Get-from-table(item, key) =DF Get-table(item)(key)
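Since a table is a finite set of key/value pairs, i.e. a function, it can be modelled with a dictionary. The sketch below is an assumption for illustration, not the thesis' implementation:

```python
# A table is a finite function from keys to contents; a dict models it.

def get_table(item):
    return item.setdefault("table", {})

def set_table(item, table):
    item["table"] = table
    return item

def add_to_table(item, key, data):
    # Overrides any previous binding for key, as the update (⊕) suggests.
    table = dict(get_table(item))
    table[key] = data
    return set_table(item, table)

def get_from_table(item, key):
    return get_table(item)[key]
```

Adding the same key twice simply replaces the earlier binding, which is the behaviour a function-valued table requires.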

The Meta-WSL extension consists of three groups of constructs, implementing the Program Structure Database, the General Simplifier and the Metric Facility. Please note that all Meta-WSL constructs defined in this appendix are written in square brackets, e.g., [Variables]; the other entries are auxiliary definitions.

C.2 Specification of Program Structure Database Queries

The following database queries are specified in this section: [Variables], [Used], [Assigned], [Used-Only], [Assd-Only], [Assn-to-self], [Regular?], [Regular-system?], [Depth], [Primitive?], [Terminal-Value], [Terminal?], [Proper?], [Reducible?], [Improper?], [Dummy?], [Calls], [Calls-terminal?], [Calls-term-sys?], [Statements] and [Size]. More information can be found in "Catalogue of Program Transformations" [154].

C.2.1 Query: [Variables]

Definition:

funct [Variables](item) =
    if Size(item) = 0
    then if Gen-type(item) ∈ {Variable, Assd-var}
         then {Contents(item)} else ∅ fi
    else ∪ { [Variables](Get-n(item, i)) | 1 ≤ i ≤ Size(item) } fi

Comment: The query function returns all variables occurring in the given item ([Variables] : Items → ℙ(Names)).

C.2.2 Query: [Assigned]

Definition:

funct [Assigned](item) =
    if Size(item) = 0
    then if (Gen-type(item) = Assd-var) ∧ (Specific-type(item) = Variable)
         then {Contents(item)} else ∅ fi
    else ∪ { [Assigned](Get-n(item, i)) | 1 ≤ i ≤ Size(item) } fi

Comment: The query function returns those variables assigned in the assignment statements.

C.2.3 Query: [Used]

Definition:

funct [Used](item) =
    if Size(item) = 0
    then if Specific-type(item) = Variable
         then {Contents(item)} else ∅ fi
    else ∪ { [Used](Get-n(item, i)) | 1 ≤ i ≤ Size(item) } fi

Comment: The query function returns those variables used in assignment statements and those used in non-assignment statements.

C.2.4 Query: [Used-Only]

Definition:

funct [Used-Only](item) = [Used](item) \ [Assigned](item)

Comment: The query function returns those variables used (refer to [Used]) but not assigned (refer to [Assigned]).

C.2.5 Query: [Assd-Only]

Definition:

funct [Assd-Only](item) = [Assigned](item) \ [Used](item)

Comment: The query function returns those variables assigned but not used.

C.2.6 Query: [Assn-to-self]

Definition:

funct Purely-used(item) =
    if (Size(item) = 0) ∨ (Gen-type(item) = Assignment)
    then if Gen-type(item) = Assignment
         then [Used-Only](item) else ∅ fi
    else ∪ { Purely-used(Get-n(item, i)) | 1 ≤ i ≤ Size(item) } fi

Comment: Purely-used returns those variables used but not assigned to themselves.

Definition:

funct [Assn-to-self](item) = [Assigned](item) \ Purely-used(item)

Comment: The query function returns those variables that are only used in assignments to themselves, or are only assigned to.

C.2.7 Query: [Depth]

Definition:

funct If-floop(item) = if Specific-type(item) = Floop then 1 else 0 fi

Comment: It returns 1 if the item is a Floop, otherwise 0.

Definition:

funct [Depth](item, posn-offset) = +/ If-floop * Sub-items(item, posn-offset)

Comment: The query function "Depth" returns the depth of the given program compared to the position represented by posn-offset. Depth is defined as the number of enclosing unbounded (do...od) loops (Floop).

C.2.8 Query: [Primitive?]

Definition:

funct [Primitive?](item) =
    if Specific-type(item) ∉ {Floop, Cond, D-if} then true else false fi

Comment: The query function "Primitive?" returns "true" or "false" depending on whether or not the statement given is primitive. A primitive statement is either an exit statement, an assignment or an assertion.

C.2.9 Query: [Terminal-value]

Definition:

funct [Terminal-value](item, posn-offset) =
    floop-layers := +/ If-floop * Sub-items(item, posn-offset);
    exit-value := if Specific-type(Get-from-posn(item, posn-offset)) = Exit
                  then Val-N(Get-from-posn(item, posn-offset), 1)
                  elsif Specific-type(Get-from-posn(item, posn-offset)) = Call
                  then Val-N(Get-from-posn(item, posn-offset), 2)
                  else 0 fi;
    terminal-value := exit-value − floop-layers;
    terminal-value

Comment: The query function "Terminal-Value" returns the terminal value of the given program at the position represented by posn-offset. The terminal value is the difference between the capability for jumping out of loops (the exit value, e.g., k in "Exit k") and the number of enclosing loops.

C.2.10 Query: [Regular?]

Definition:    a ⊎ b = (a − {0}) ∪ b    if 0 ∈ a
               a ⊎ b = a                otherwise

Comment: It is a conditional union operation.

Definition: funct Tv-list {item) if Specific-type{item) — Exit then { Val-N{item, 1)} elsif Specific-type{item) = Call then {Val-N{item, 2)} elsif Specific-type{item) G {Cond,D-if} then Size(item) (J Tv-list(Gei-n{tiem,i)) i=l Appendix C. Syntax and Semantics of Meta-WSL Extension 269

elsif (Specific-tvpe(item) ^ {Statements, Floop, Guarded}) V (Size(item) = 0) then 0 else

Size(item) (+) Tv-lisi(Get-n(item,i)) i=l fl

Comment: It returns a set of terminal values of the given item.

Definition:

funct Tv-list-for-calls(item) =
    if Specific-type(item) = Exit then {Val-N(item, 1)}
    elsif Specific-type(item) = Call then {*big-num*}
    elsif Specific-type(item) ∈ {Cond, D-if}
    then ∪ { Tv-list-for-calls(Get-n(item, i)) | 1 ≤ i ≤ Size(item) }
    elsif (Specific-type(item) ∉ {Statements, Floop, Guarded}) ∨ (Size(item) = 0)
    then {0}
    else ⊎ { Tv-list-for-calls(Get-n(item, i)) | 1 ≤ i ≤ Size(item) } fi

Comment: The function is the same as Tv-list, except that the terminal value caused by a Call statement is marked *big-num*.

Definition: funct [Regular?](item) = Tv-list-for-calls(item) = {*big-num*}

Comment: This query function returns "true" or "false" depending on whether or not the program item (an "action") is regular. An action is regular if every execution of the action leads to an action call.

C.2.11 Query: [Regular-system?]

Definition:

funct [Regular-system?](item) =
    if Specific-type(item) = Action
    then ∀i, 2 ≤ i ≤ Size(item) · [Regular?](Get-n(item, i))
    else false fi

Comment: This query function returns "true" or "false" depending on whether or not the given action system is regular. An action system is regular if every action in the system is regular.

C.2.12 Query: [Terminal?]

Definition:

Get-from-posn(item, posn) =DF
    item                                              if posn = ⟨⟩
    Get-from-posn(Get-n(item, posn[1]), posn[2..])    otherwise

Comment: It returns the sub-item at the position offset of the given item.

Definition:

funct Sub-items(item, posn-offset) =
    { Get-from-posn(item, posn-offset[1..i]) | 0 ≤ i ≤ ℓ(posn-offset) }

Comment: The function returns all sub-items from the item itself down to the sub-item at the position offset, e.g., Sub-items(item, ⟨1,2,3,4⟩) = { item, Get-from-posn(item, ⟨1⟩), Get-from-posn(item, ⟨1,2⟩), Get-from-posn(item, ⟨1,2,3⟩), Get-from-posn(item, ⟨1,2,3,4⟩) }.

Definition: funct Terminal-component?(item, n) =
    if (Specific-type(Get-n(item, n)) ∈ {Cond, D-if})
       ∨ ((Specific-type(Get-n(item, n)) ≠ Floop) ∧ (Size(item) = n))
       ∨ (n = 0)
    then true else false fi

Comment: It returns "true" if the nth element of the item is a terminal component, otherwise "false".

Definition: funct Terminal-posn?(item, posn-offset) =
    ∧/ (Terminal-component? * zip(Sub-items(item, posn-offset), ⟨0⟩ ++ posn-offset))

Comment: It returns "true" if the sub-item at position offset of the item is in a terminal position, otherwise "false".

Definition: funct Posn-prefix(item, key, ⟨p1, p2, ..., pn⟩) =
    ⟨p1, p2, ..., pi⟩, where i is the smallest index such that
    Specific-type(Get-from-posn(item, ⟨p1, ..., pi⟩)) = key

Comment: It returns a sub-position-offset (a prefix) of the position offset; this i is the smallest such index.

Definition: funct Sub-posn(item, j, ⟨p1, p2, ..., pn⟩) = ⟨p1, ..., pj⟩

Comment: It returns a sub-position-offset of the position offset. Appendix C. Syntax and Semantics of Meta-WSL Extension 272

Definition: funct Outmost-floop-terminal?(item, posn-offset) =
    if Floop ∉ Specific-type * Sub-items(item, posn-offset)
    then false
    else Terminal-posn?(item, Posn-prefix(item, Floop, posn-offset))
    fi

Comment: It returns "true" if the outermost floop is terminal, otherwise "false".

Definition: funct [Terminal?](item, posn-offset) =
    if ∀i ∈ {1, ..., ℓ(posn-offset)} · ∀j ∈ {1, ..., pi − 1} ·
         ({0} ∈ Tv-list * (Get-from-posn(item, Sub-posn(posn-offset, i − 1) ++ ⟨j⟩)))
    then if Terminal-posn?(item, posn-offset)
         then true
         elsif ((Specific-type(Get-from-posn(item, posn-offset)) = Exit)
                ∧ (Val-N(Get-from-posn(item, posn-offset), 1) > [Depth](item, posn-offset)))
             ∨ ((Specific-type(Get-from-posn(item, posn-offset)) = Call)
                ∧ (Val-N(Get-from-posn(item, posn-offset), 2) > [Depth](item, posn-offset)))
         then true
         elsif (((Specific-type(Get-from-posn(item, posn-offset)) = Exit)
                 ∧ (Val-N(Get-from-posn(item, posn-offset), 1) = [Depth](item, posn-offset)))
              ∨ ((Specific-type(Get-from-posn(item, posn-offset)) = Call)
                 ∧ (Val-N(Get-from-posn(item, posn-offset), 2) = [Depth](item, posn-offset))))
              ∧ (Outmost-floop-terminal?(item, posn-offset))
         then true
         else false fi
    fi

Comment: The query function "Terminal?" returns "true" or "false" depending on whether or not the statement given is terminal. That is, for any statements T, S (where T is primitive) and integers n, d, the predicate ts(n, T, S, d) is interpreted as "the nth occurrence of T in S is a terminal statement of S which leaves d enclosing loops".

C.2.13 Query: [Proper?]

Definition:

Posns_0 = { ⟨⟩ }
Posns_(n+1) = { P ++ ⟨N⟩ | P ∈ Posns_n ∧ 1 ≤ N ≤ Size(Get-from-posn(item, P)) }

funct Posns(item) = ∪_n Posns_n

Comment: It returns all position offsets in the item.

Definition:

Posns-of-primitive-statements(item) = { P ∈ Posns(item) | [Primitive?](Get-from-posn(item, P)) }

Comment: It returns all positions of the primitive statements in the item.

Definition: funct Terminal-Posns(item) = { P ∈ Posns(item) | Terminal-posn?(item, P) }

Comment: It returns all terminal positions in the item.

Definition: funct Terminal-statements(item) = { P ∈ Posns(item) | [Terminal?](item, P) }

Comment: It returns all positions of the terminal statements in the item.

Definition: funct [Proper?](item) =
    ∧/ ((= 0) * ([Terminal-value](item, ·) * Terminal-statements(item)))

Comment: The query function "Proper?" returns "true" or "false" depending on whether or not the statement at the current point in item is proper. The statement S is a proper sequence if every terminal statement of S has terminal value zero.

C.2.14 Query: [Reducible?]

Definition: funct If-reducible?(item, posn-offset) =
    if (Terminal-posn?(item, posn-offset))
       ∨ (([Depth](item, posn-offset) > 0) ∧ (Outmost-floop-terminal?(item, posn-offset)))
    then true else false
    fi

Comment: It returns "true" if the sub-item at the position offset of the item is reducible, otherwise "false".

Definition: funct [Reducible?](item) =
    ∧/ If-reducible?(item, ·) *
       { P ∈ Posns(item) | ([Terminal?](item, P)) ∧ ([Terminal-value](item, P) = 1) }

Comment: The query function "Reducible?" returns "true" or "false" depending on whether or not a statement is reducible. That is, the statement S is reducible if replacing any terminal statement EXIT(k), which has terminal value one, by EXIT(k−1) gives a terminal statement of S.

C.2.15 Query: [Improper?]

Definition: funct [Improper?](item) =
    ∧/ ((> 0) * ([Terminal-value](item, ·) * Terminal-statements(item)))

Comment: The query function "Improper" returns "true" if all terminal statements of S have terminal value greater than zero, otherwise "false".

C.2.16 Query: [Dummy?]

Definition: funct [Dummy?](item) = [Reducible?](item) ∧ [Improper?](item)

Comment: The query function "Dummy?" returns "true" if both a statement S is reducible and all terminal statements of S have terminal value greater than zero, otherwise "false".

C.2.17 Query: [Calls]

Definition: funct Make-pair(thing) = (thing, 1)

Comment: It pairs its argument with the count 1.

Definition:

S = { (x1, l1), (x2, l2), ..., (xk, lk) }

funct Summarize(S) = { (x, n) | ∃m · (x, m) ∈ S ∧ n = +/{ m | (x, m) ∈ S } }

Comment: For example, Summarize{ (a, 3), (a, 4), (b, 3) } = { (a, 7), (b, 3) }.
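A minimal Python rendering of Summarize (illustrative only): group the pairs by their first component and sum the second.

```python
from collections import defaultdict

def summarize(pairs):
    # { (x, n) | n is the sum of m over all (x, m) in pairs }
    totals = defaultdict(int)
    for x, m in pairs:
        totals[x] += m
    return set(totals.items())
```

Running it on the worked example gives summarize({("a", 3), ("a", 4), ("b", 3)}) = {("a", 7), ("b", 3)}.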

Definition: funct [Calls](item) =
    C1 := Get-from-posn(item, ·) * { P ∈ Posns(item) | Specific-type(Get-from-posn(item, P)) = Call };
    C2 := Val-N(·, 1) * C1;
    C3 := Make-pair * C2;
    C4 := Summarize(C3);
    C4

Comment: The query function returns all the action call names and the number of times each is called in the given program item.

C.2.18 Query: [Statements]

Definition: funct [Statements](item) =
    Specific-type * { Get-from-posn(item, P) | P ∈ Posns(item) ∧ Gen-type(Get-from-posn(item, P)) = Statement }

Comment: The query function returns all the statement names in the given program item.

C.2.19 Query: [Calls-terminal?]

Definition: funct [Calls-terminal?](item) =
    if { P ∈ Posns(item) | Specific-type(Get-from-posn(item, P)) = Call } = Terminal-statements(item)
    then true else false fi

Comment: It returns "true" if all Call statements in the item are terminal, otherwise

"false". Appendix C. Syntax and Semantics of Meta-WSL Extension 277

C.2.20 Query: [Calls-term-sys?]

Definition:

funct [Calls-term-sys?](item) =
    if Specific-type(item) = Actions
    then ∧/ [Calls-terminal?] * { Get-n(item, i) | i ∈ {2, ..., Size(item)} }
    else false fi

Comment: It returns "true" if all Call statements in an action system in the given item are terminal, otherwise "false".

C.2.21 Query: [Size]

Definition:

funct [Size](item) = ℓ({ P ∈ Posns(item) | Size(Get-from-posn(item, P)) = 0 })

Comment: It returns the number of leaf nodes in the given item.

C.3 Specification of General Simplifier

The General Simplifier simplifies a mathematical or logical expression, or "proves" the equivalence or implication of two expressions. The mathematical and logical operations defined in the system are +, -, *, /, **, Min, Max, Div, Mod, =, >, <, <>, Not, And, Or, etc., and these definitions have their normal mathematical and logical sense.

Definition: funct [Simplify](expression) = simplified-expression

Comment: The "expression" can be any symbolic algebraic expression. The function returns the expression in its simplest form.

funct [Equivalent?](expression1, expression2) = T ∨ Nil

Comment: The function returns T or Nil according to whether the two expressions are equivalent.

funct [Implies?](expression1, expression2) = T ∨ Nil

Comment: The function returns T or Nil according to whether the first expression implies the second one.
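The behaviour of [Simplify] on arithmetic can be imitated by a toy bottom-up constant folder; the sketch below is an assumption for illustration only, not the simplifier the thesis uses (which follows Macsyma's [65]):

```python
# Expressions are nested binary tuples ("+"/"-"/"*", left, right),
# numbers, or strings (variable names). The folder simplifies
# sub-expressions first, then applies a few algebraic identities.

def simplify(e):
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if isinstance(a, int) and isinstance(b, int):      # constant folding
        return {"+": a + b, "-": a - b, "*": a * b}[op]
    if op == "+" and b == 0:                           # x + 0 = x
        return a
    if op == "*" and b == 1:                           # x * 1 = x
        return a
    if op == "*" and b == 0:                           # x * 0 = 0
        return 0
    return (op, a, b)
```

For example, simplify(("+", ("*", "x", 1), 0)) reduces to "x", and a fully numeric expression folds to a single number.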

C.4 Specification of Metric Facility

Six metrics are defined.

C.4.1 Preliminary Definitions for Metrics

Functions:

funct If-predicate(item) =
    if (Specific-type(item) = Cond) ∨ (Specific-type(item) = D-if)
       ∨ (Specific-type(item) = D-do) ∨ (Specific-type(item) = While)
       ∨ (Specific-type(item) = For) ∨ (Specific-type(item) = If)
    then 1 else 0 fi

Comment: It returns 1 if the item is a predicate, otherwise 0.

funct Struct-index(item) =
    case Specific-type(item) of
        Abort : 10
        Assert : -5
        Assign : 1
        Where : 1
        While : 15
    end

Comment: It returns the structural complexity index of the item. Appendix C. Syntax and Semantics of Meta-WSL Extension 279

funct If-node(item) =
    if Gen-type(item) ∈ {Expression, Statement, ..., Thing, Variable}
    then 1 else 0 fi

Comment: It returns 1 if the item is a node in the abstract syntax tree of the program, otherwise 0.

funct If-statement(item) = if Gen-type(item) = Statement then 1 else 0 fi

Comment: It returns 1 if the item is a statement, otherwise 0.

funct If-edge(item) = if Gen-type(item) = Actions then -1 else If-statement(item) fi

Comment: It returns −1 if the item is an action system; otherwise it returns 1 if the item is a statement, and 0 if not.

funct If-loop(item) =
    if Specific-type(item) ∈ {D-do, Floop, ..., For, While}
    then 1 else 0 fi

Comment: It returns 1 if the item is the key word of a loop, otherwise 0.

funct If-data(item) =
    if (Gen-type(item) = Assd-var) ∨ (Gen-type(item) = Variable)
    then 1 else 0 fi

Comment: It returns 1 if the item is a variable, otherwise 0.

C.4.2 Metric: [MCCABE]

Definition:

funct [MCCABE](item) = 1 + (+/ If-predicate * { Get-from-posn(item, P) | P ∈ Posns(item) })

Comment: McCabe Complexity (MCCABE) — the number of linearly independent circuits in a program flowgraph [116]. It is calculated as the number of predicates plus one.
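As a concrete reading of the definition, the sketch below (with an assumed toy item encoding, not the thesis' representation) walks the syntax tree, counts predicate nodes, and adds one:

```python
# Specific types counted as predicates, mirroring If-predicate above.
PREDICATES = {"Cond", "D-if", "D-do", "While", "For", "If"}

def mccabe(item):
    # item = (specific_type, [component, ...]); McCabe = predicates + 1.
    def count(it):
        spec, comps = it
        return (1 if spec in PREDICATES else 0) + sum(count(c) for c in comps)
    return 1 + count(item)
```

For instance, a program containing one While loop whose body holds one Cond has two predicates, so its McCabe complexity is 3.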

C.4.3 Metric: [Structural]

Definition:

funct [STRUCT](item) = +/ Struct-index * { Get-from-posn(item, P) | P ∈ Posns(item) }

Comment: Structural (STRUCT) — the sum of the weights of every construct in the program. The weight of each WSL construct is defined subjectively according to experience gained by REFORM researchers and users. A loop, for example, is more difficult to understand than an assignment statement, so a loop statement is given a bigger weight than an assignment statement.

C.4.4 Metric: [LOC]

Definition:

funct [LOC](item) = +/ If-statement * { Get-from-posn(item, P) | P ∈ Posns(item) }

Comment: Lines Of Code(I) (LOC) — the number of statements.

C.4.5 Metric: [LOC2]

Definition: funct [LOC2](item) = +/ If-node * { Get-from-posn(item, P) | P ∈ Posns(item) }

Comment: Lines Of Code(II) (LOC2) — the number of nodes in the abstract syntax tree. This reflects the overall size of the program.

C.4.6 Metric: [CFDF]

Definition: funct [CFDF](item) =
    (+/ If-edge * { Get-from-posn(item, P) | P ∈ Posns(item) }) +
    (+/ If-data * { Get-from-posn(item, P) | P ∈ Posns(item) })

Comment: Control-Flow and Data-Flow Complexity (CFDF) — the number of edges in the flowgraph plus the number of times that variables are used (defined and referred to). It is a modification of the measure defined by Oviedo [175].

C.4.7 Metric: [BL]

Definition: funct [BL](item) =
    (+/ If-predicate * { Get-from-posn(item, P) | P ∈ Posns(item) }) +
    (+/ If-floop * { Get-from-posn(item, P) | P ∈ Posns(item) })

Comment: Branch-Loop Complexity (BL) — the number of non-loop predicates plus the number of loops. It is a modification of the measure defined by Moawad and Hassan [175]. The measure is sensitive both to branches and to loops.

Appendix D

Program Transformations for Data Abstraction

D.l Deriving Records

• Split-Record-Into-Subrecords

• Join-Records

• Absorb-Single-Record-In-Record

• Variable->Record

• One-level-Record->Variable

D.2 From Records to Data in Design Level

• Record-Of-File->Entity

• Record-Of-Top-Level->Entity

• Record-Without-Subrecord->Entity

• ArrayOfRecord->Entities-And-Relationship


D.3 From Code Level Data Operation to Data Relations

• Model-OpenFile-By-QueueOP

• Model-CloseFile-By-QueueOP

• Model-EOF?-By-QueueOP

• Model-ReadFile-By-QueueOP

• Model-WriteFile-By-QueueOP

• Simplify-Queue-Initialise

• Simplify-Queue-Append

• Simplify-Queue-Remove-First

• Simplify-Set-Initialise

• Simplify-Set-Insert

• Simplify-Set-Delete

• Assert-EmptySet-From-Setlnitialise

• Merge-Push-Pop

• Assert-Hd-From-Push

• Assert-EmptyStack-From-Push

• Assert-EmptyStack-From-Stacklnitialise

• Assert-StackDepth-From-Stacklnitialise

• Simplify-Stacklnitialise

• Simplify-Top

• Simplify-Depth

• Simplify-Push

• Simplify-Pop

D.4 Abstraction from Code

• Find-Data-Object

• Recognise-ADT-Starting-With-Proc

• Recognise-ADT-Starting-With-Funct

• Recognise-ADT-Starting-With-Variable

• Instance->Entity

• Assign->Relate

• Remove-Useless-Relate

• Abstract-If-Then-Else-Statement

• Abstract-If-Then-Statement

• Eliminate-Irrelevant-Statement

D.5 From User-Defined Data Type to Data Design

• ADT->Entity

D.6 Deriving Data Design from Data and Code

• Derive-Relationship-From-Foreign-Key

• Derive-Entity-and-Relationship-From-Code1

• Derive-Entity-and-Relationship-From-Code2

• Derive-Entity-and-Relationship-From-Code3

D.7 Manipulating Program Items

• Swap-With-Next-Record

• Create-Segment-From-Var

• Create-Paragraph-From-Var

• Create-Segment-And-Paragraph-From-Var

• Create-Var-From-Segment

• Create-Paragraph-From-Segment

• Remove-Dummy-VarStruct

• Remove-Dummy-Segment

• Remove-Dummy-Paragraph

• Swap-With-Next-Entity

• Swap-With-Next-Relationship

• Rename-Relationship

• Adjust-Relationship-Degree

• Normalise-Entity-Relationship-Diagram

References

[1] ANSI, "Report on Study Group on Database Management Systems", Interim Report, FDT, 1975.

[2] ANSI, Standard 729, IEEE Standard Glossary of Software Engineering Terminology, 1983.

[3] Abel, P., COBOL Programming Language — A Structured Approach, Prentice-Hall International, Inc., London, 1989.

[4] Abelson, H. and Sussman, G. J., Structure and Interpretation of Computer Programs, The MIT Press, McGraw-Hill Book Company, 1985.

[5] Abrial, J. R., Gardiner, P. H., Morgan, C. C. and Spivey, J. M., "A Formal Approach to Large Software Construction", Technical Report, Programming Research Group, Oxford, 1988.

[6] Agrawal, H. and Horgan, J. R., "Dynamic Program Slicing", ACM SIGPLAN'90 Conference on Programming Language Design and Implementation, New York , June 1990.

[7] Antonini, P., Benedusi, P., Cantone, G. and Cimitile, A., "Maintenance and Reverse Engineering: Low-level Design Documents Production and Improvement", IEEE Conference on Software Maintenance-1987, Austin, Texas, 1987.

[8] Appleby, D., Programming Languages: Paradigm and Practice, McGraw-Hill Book Company, New York, 1991.

[9] Arango, G., Baxter, I., Freeman, P. and Pidgeon, C., "TMM: Software Maintenance by Transformation", IEEE Software, May, 1986.

[10] Arsac, J., Foundations of Programming, Academic Press, Inc., London, 1985.

[11] Ashworth, C. and Goodland, M., SSADM: A Practical Approach, McGraw-Hill Book Company, London, 1990.


[12] Bachman, R., A CASE for Reverse Engineering, Cahners Publishing Company, July, 1988, reprinted from DATAMATION.

[13] Back, R. J. R., "Correctness Preserving Program Refinements", Mathematical Centre Tracts No. 131, Mathematisch Centrum, 1980.

[14] Balzer, R., "Transformational Implementation: An Example", IEEE Transactions on Software Engineering, Vol. SE-7, No. 1, pp. 3-14 (January 1981).

[15] Balzer, R., "A 15 Year Prospective on Automatic Programming", IEEE Transactions on Software Engineering, Vol. SE-11, No. 11, pp. 1257-1267 (November 1985).

[16] Balzer, R., Goldman, N. and Wile, D., "On the Transformational Implementation Approach to Programming", The 2nd International Conference on Software Engineering, San Francisco, California, 1976.

[17] Balzer, R. and Sartout, W., "On the Inevitable Intertwining of Specification and Implementation", in Software Specification Techniques, Addison-Wesley Publishing Company, 1986.

[18] Bauer, F. L., Berghammer, R., et al., "The Munich Project CIP - The Wide Spectrum Language CIP-L", in Lecture Notes in Computer Science 292, Springer-Verlag, New York, 1985.

[19] Bauer, F. L., Moller, B. B., Partsch, H. and Pepper, P., "Formal Program Construction by Transformation — Computer-Aided, Intuition-Guided Programming", IEEE Transactions on Software Engineering, Vol. 15, No. 2, pp. 165-180 (February 1989).

[20] Bennett, K. H., "The Software Maintenance of Large Software Systems: Management Method and Tools", Technical Report, Durham University, 1989.

[21] Bennett, K. H., "An Overview of Maintenance and Reverse Engineering", in The REDO Compendium, John Wiley & Sons, Inc., Chichester, 1993.

[22] Bennett, K. H., Cornelius, B. J., Munro, M. and Robson, D. J., "Software Maintenance", in Software Engineer's Reference Book, Butterworth Heinemann, 1991, pp. 20/1-20/18.

[23] Bennett, K. H., Denier, J. and Estublier, J., "Environments for Software Maintenance", Technical Report, Durham University, 1989.

[24] Berg, H. K., Boebert, W. E., Franta, W. R. and Moher, T. G., Formal Methods of Program Verification and Specification, Prentice-Hall International, Inc., Englewood Cliffs, New Jersey, 1982.

[25] Biggerstaff, T. J., "Design Recovery for Maintenance and Reuse", IEEE Computer, Vol. 22, No. 7, pp. 36-49 (July 1989).

[26] Bird, R., "A Calculus of Functions for Program Derivation", Technical Monograph PRG-64, Oxford University, 1987.

[27] Bird, R., "Lectures on Constructive Functional Programming", Technical Monograph PRG-69, Oxford University , September, 1988.

[28] Bishop, J., Data Abstraction in Programming Languages, Addison-Wesley Publishing Company, Wokingham, 1986.

[29] Boehm, B. W., Software Engineering Economics, Prentice-Hall International, Inc., Englewood Cliffs, New Jersey, 1981.

[30] Boyer, R. S. and Moore, J. S., A Computational Logic, Academic Press, Inc., New York, 1979.

[31] Boyer, R. S. and Moore, J. S., A Computational Logic Handbook, Academic Press, Inc., New York, 1988.

[32] Boyle, J. M., "LISP To FORTRAN - Program Transformation Applied", in NATO ASI Series F: Computer and Systems Science, Vol. 8 (P. Pepper, Ed.), Springer-Verlag, 1984.

[33] Breuer, P., "Tackling Reverse Engineering", Technical Report (ESPRIT Project: 2487-TN-PRG-1037), Programming Research Group, Oxford University, 1990.

[34] Breuer, P., "Inverse Engineering: The First Step Backwards", Technical Report (ESPRIT Project: 2487-TN-PRG-1031), Programming Research Group, Oxford University, 1990.

[35] Breuer, P., Lano, K. and Bowen, J., "Understanding Programs through Formal Methods", Technical Report, Programming Research Group, Oxford University, 1991.

[36] Brown, A., "Specifications and Reverse Engineering", Software Maintenance: Research and Practice, Vol. 5, No. 3 (1993).

[37] Bull, T., "An Introduction to the WSL Program Transformer", IEEE Conference on Software Maintenance-1990, San Diego, California, 1990.

[38] Bull, T., "Software Maintenance by Program Transformations in A Wide Spectrum Language", Ph.D. Thesis, Durham University, 1994.

[39] Burstall, R. M. and Darlington, J. A., "A Transformation System for Developing Recursive Programs", JournaJ of the ACM, Vol. 24, pp. 44-67 (1977).

[40] Burstall, R. M. and Goguen, J. A., "An Informal Introduction to Specifications Using Clear", in Software Specification Techniques (N. Gehani and A. D. McGettrick, Eds.), Addison-Wesley Publishing Company, 1986.

[41] Bush, E., "Reverse Engineering: What and Why", Software Maintenance Workshop, Durham, 1990.

[42] CENTRISA, "Case Study No. 1: Implementation", Technical Report (ESPRIT Project: 2487-MLCE-1020) , CENTRISA, October, 1989.

[43] Calliss, F. W., "Inter-Module Code Analysis Techniques for Software Maintenance", Ph.D. Thesis, Durham University, 1989.

[44] Calliss, F. W., Khalil, M., Munro, M. and Ward, M., "A Knowledge-Based System for Software Maintenance", IEEE Conference on Software Maintenance-1988, Phoenix, Arizona, 1988.

[45] Canfora, G., Cimitile, A. and Munro, M., "A Reverse Engineering Method for Identifying Reusable Abstract Data Type", Durham University Technical Report, Durham, 1992.

[46] Carey, J. M., "Prototyping: Alternative Systems Development Methodology", Information and Software Technology, Vol. 32, No. 2 (1990).

[47] Chen, P. P., "The Entity-Relationship Model — Toward a Unified View of Data", ACM Transactions on Database Systems, Vol. 1, No. 1 (March 1976).

[48] Chikofsky, E. J. and Cross, J. H., "Reverse Engineering and Design Recovery: A Taxonomy", IEEE Software, Vol. 7, No. 1 (1990).

[49] Claybrook, B. G., "A Specification Method for Specifying Data and Procedural Abstraction", IEEE Transactions on Software Engineering, Vol. SE-8, No. 5 (September 1982).

[50] Cleaveland, J. C., An Introduction to Data Types, Addison-Wesley Publishing Company, 1991.

[51] Colbrook, A. and Smythe, C., "The Retrospective Introduction of Abstraction into Software", IEEE Conference on Software Maintenance-1989, Miami, Florida, 1989.

[52] Cross, J. H., Chikofsky, E. J. and Jr., C. H. M., "Reverse Engineering", Advances in Computers, Vol. 35 (1992).

[53] Cutts, G., Structured Systems Analysis and Design Methodology, Paradigm Publishing Company, London, 1987.

[54] DTI, The Proceedings on the Fifth Refinement Workshop, London, January 1992, sponsored by Lloyd's Register, DTI and Program Validation Limited.

[55] Date, C. J., An Jhtroduction to Database Systems, Vol. I, Addison-Wesley Publishing Company, Manchester, 1986.

[56] Dershowitz, N., "Program Abstraction and Instantiation", ACM Transactions on Programming Languages and Systems, Vol. 7, No. 3 (July 1985).

[57] Desclaux, C., "Capturing Design and Maintenance Decisions with MACS", Software Maintenance Workshop, Durham, 1990.

[58] Desclaux, C. and Ribault, M., "MACS: Maintenance Assistance Capability for Software — A K.A.D.M.E.", IEEE Conference on Software Maintenance-1991, Sorrento, Italy, 1991.

[59] Dewar, R. B. K., Schonberg, E. and Schwartz, J. T., "Higher Level Programming: Introduction to the Use of the Set-theoretic Programming Language SETL", Courant Institute of Mathematical Science, New York University, 1981.

[60] Dijkstra, E. W., A Discipline of Programming, Prentice-Hall International, Inc., 1976.

[61] Downs, E., Clare, P. and Coe, I., Structured Systems Analysis and Design Method — Application and Context, Prentice-Hall International, Inc., London, 1988.

[62] Ejiogu, L. O., Software Engineering with Formal Metrics, McGraw-Hill Book Company, London, 1991.

[63] Engberts, A., Kozaczynski, W. and Ning, J., "Concept Recognition-Based Program Transformation", IEEE Conference on Software Maintenance-1991, Sorrento, Italy, 1991.

[64] Engeler, E., Formal Language: Automata and Structures, Markham, Chicago, 1968.

[65] Fateman, R. J., "Macsyma's General Simplifier: Philosophy and Operation", 1979 MACSYMA User's Conference, Washington D.C., June 1979.

[66] Fateman, R. J., "A Review of Macsyma", IEEE Transactions on Knowledge and Data Engineering, Vol. 1, No. 1, pp. 133-145 (March 1989).

[67] Feather, M. S., "A Survey and Classification of Some Program Transformation Techniques", in Program Specification and Transformation, Elsevier Science Publishers, The Netherlands, 1987, pp. 165-195.

[68] Feather, M. S., "A System for Assisting Program Transformation", ACM Transactions on Programming Languages and Systems, January, 1982.

[69] Federal Information Processing Standards, "Guidelines on Software Maintenance", U.S. Department of Commerce/National Bureau of Standards, Standard FIPS PUB 106, June, 1984.

[70] Fickas, S. F., "Automating the Transformational Development of Software", IEEE Transactions on Software Engineering, Vol. SE-11, No. 11 (November 1985).

[71] Fisher, A. S., CASE — Using Software Development Tools, John Wiley & Sons, Inc., 1988.

[72] Fraser, M. D., Kumar, K. and Vaishnavi, V. K., "Informal and Formal Requirements Specification Languages: Bridging the Gap", IEEE Transactions on Software Engineering, Vol. SE-17, No. 5 (May 1991).

[73] Georges, M., "MACS: Maintenance Assistance Capability for Software", Software Maintenance Workshop, Durham University, 1990.

[74] Ghezzi, C., "Modern Non-Conventional Programming Language Concepts", in Software Engineer's Reference Book, Butterworth Heinemann, 1991, pp. 44/1-44/16.

[75] Ghezzi, C. and Jazayeri, M., Programming Language Concepts (2nd Edn.), John Wiley & Sons, Inc., 1987.

[76] Gilb, T., "A Comment on the Definition of Reliability", ACM Software Engineering Notes, Vol. 4, No. 3 (July 1979).

[77] Gilmore, D. J., "Models of Debugging", Fifth European Conference on Cognitive Ergonomics, Urbino, Italy, September, 1990.

[78] Goguen, J. A. and Tardo, J. J., "An Introduction to OBJ: A Language for Writing and Testing Formal Algebraic Program Specifications", in Software Specification Techniques, Addison-Wesley Publishing Company, 1986.

[79] Good, D. I., London, R. L. and Bledsoe, W. W., "An Interactive Program Verification System", IEEE Transactions on Software Engineering, Vol. SE-1 (March 1975).

[80] Goos, G. and Hartmanis, J., Program Construction — Lecture Notes in Computer Science, Vol. 69, Springer-Verlag, 1979.

[81] Gray, P. M. D., Logic, Algebra and Databases, Ellis Horwood Limited, Chichester, 1984.

[82] Guttag, J. V., "The Specification and Application to Programming of Abstract Data Type", Ph.D. Thesis, Department of Computer Science, University of Toronto, 1975.

[83] Guttag, J. V. and Horning, J. J., "Preliminary Report on the Larch Shared Language", Technical Report (CSL-83-6), Xerox PARC, 1983.

[84] Hall, A., "Seven Myths of Formal Methods", IEEE Software, September, 1990.

[85] Hawksley, C., "Coercion in Class-Based Software Environments", Ph.D. Thesis, Keele University, 1987.

[86] Hayes, I., Specification Case Studies, Prentice-Hall International, Inc., 1987.

[87] Hayes, I. and Jones, C. B., "Specifications Are Not (Necessarily) Executable", Technical Report (UMCS-89-12-1), University of Manchester, 1989.

[88] Hoare, C. A. R., "Proof of A Structured Program: The Sieve of Eratosthenes", Computer, Vol. 14, No. 4 (1972).

[89] Hoare, C. A. R., "Notes on Data Structuring", in Structured Programming, Academic Press, Inc., London, 1972.

[90] Horebeek, I. V. and Lewi, J., Algebraic Specifications in Software Engineering, Springer-Verlag, Berlin, 1989.

[91] Hutty, R., COBOL 85 Programming, Macmillan Education Ltd., London, 1990.

[92] Inglis, J., COBOL 85 for Programmers, John Wiley & Sons, Inc., Chichester, 1989.

[93] Jones, C. B., Systematic Software Development Using VDM, Prentice-Hall International, Inc., London, 1986.

[94] Kant, E., "Efficient Synthesis of Efficient Programs", in Artificial Intelligence and Software Engineering, 1986, pp. 157-188.

[95] Kemmerer, R. A., "Integrating Formal Methods into the Development Process", IEEE Software, September, 1990.

[96] Kljaich, J., Smith, B. T. and Wojcik, A. S., "Formal Verification of Fault Tolerance Using Theorem Proving Techniques", IEEE Transactions on Computers, Vol. 38, No. 3 (March 1989).

[97] Knuth, D. E., The TEXbook, Addison-Wesley Publishing Company, Reading, Massachusetts, June, 1986.

[98] Kopetz, H., Software Reliability, Springer-Verlag, 1979.

[99] Lamport, L., LaTeX: A Document Preparation System, Addison-Wesley Publishing Company, Reading, Massachusetts, 1986.

[100] Lano, K. and Breuer, P., "Reverse-Engineering and Validating COBOL", Technical Report (ESPRIT Project: 2487-TN-PRG-1049), Programming Research Group, Oxford University, 1991.

[101] Lano, K. and Breuer, P. T., "From Programs to Z Specifications", Technical Report, Oxford University, 1990.

[102] Lano, K., Breuer, P. T., Haughton, H. and Estdale, J., "Reverse-Engineering COBOL Via Formal Methods", in The REDO Handbook, August, 1991.

[103] Lano, K. and Haughton, H., "Applying Formal Methods to Maintenance", Technical Report (ESPRIT Project: 2487-TN-PRG-1042), Programming Research Group, Oxford University, 1990.

[104] Layzell, P. J., "The Identification and Management of Latent Software Assets", International Journal of Information Management, Vol. 14, No. 6, pp. 427-442 (1994).

[105] Lehman, M. M., "Programs, Life Cycles, and Laws of Software Evolution", Proc. IEEE, Vol. 68, No. 9 (1980).

[106] Lientz, B. P. and Swanson, E. B., Software Maintenance Management, Addison-Wesley Publishing Company, 1980.

[107] Lindsay, P. A., Moore, R. C. and Ritchie, B., "Review of Existing Theorem Provers", Technical Report (UMCS-87-8-2), Department of Computer Science, University of Manchester, 1989.

[108] Liskov, B. and Berzins, V., "An Appraisal of Program Specifications", in Software Specification Techniques (N. Gehani and A. McGettrick, Eds.), Addison-Wesley Publishing Company, 1979.

[109] Liskov, B. and Guttag, J., Abstraction and Specification in Program Development, The MIT Press, McGraw-Hill Book Company, 1986.

[110] Liskov, B. and Zilles, S. N., "Specification Techniques for Data Abstractions", IEEE Transactions on Software Engineering, March, 1975.

[111] Manna, Z. and Waldinger, R., "A Deductive Approach to Program Synthesis", ACM Transactions on Programming Languages and Systems, February, 1980.

[112] Markowitz, V. and Makowsky, J. A., "Identifying Extended Entity-Relationship Object Structures in Relational Schemas", IEEE Transactions on Software Engineering, Vol. 16, No. 8, pp. 777-790 (August 1990).

[113] Markowitz, V. and Shoshani, A., "Representing Extended Entity-Relationship Structures in Relational Databases: A Modular Approach", ACM Transactions on Database Systems, Vol. 17, No. 3, pp. 423-464 (September 1992).

[114] Martin, J. and McClure, C., Structured Techniques for Computing, Prentice-Hall International, Inc., Englewood Cliffs, New Jersey, 1985.

[115] Masso, S., "The Power of Algebraic Proofs", M.Sc. Thesis, Computing Laboratory, Oxford University, 1988.

[116] McCabe, T. J., "A Complexity Measure", IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, pp. 308-320 (December 1976).

[117] McCall, J., Richards, P. and Walters, G., "Factors in Software Quality", NTIS, November, 1977.

[118] McGettrick, A. D., Program Verification Using Ada, Cambridge University Press, London, 1982.

[119] McMorran, M. A. and Nicholls, J. E., "Z User Manual", Technical Report, IBM (UK) Laboratories, Winchester, England, July, 1989.

[120] Meek, B. L., "Software Maintenance", in Software Engineer's Reference Book, Butterworth Heinemann, 1991, pp. 43/1-43/17.

[121] Morgan, C., Programming from Specification, Prentice-Hall International, Inc., 1990.

[122] Morgan, C., "The Specification Statement", ACM Transactions on Programming Languages and Systems, Vol. 10, No. 3, pp. 403-419 (July 1988).

[123] Moriconi, M. S., "A Designer/Verifier's Assistant", IEEE Transactions on Software Engineering, Vol. SE-5, No. 4, pp. 387-401 (July 1979).

[124] Morris, J., "Types Are Not Sets", Proceedings of First ACM Symposium on Principles of Programming Languages, New York, 1973.

[125] Naur, P. and Randell, B., Software Engineering: A Report on A Conference Sponsored by the NATO Science Committee, NATO, 1969.

[126] Nielsen, M., Havelund, K. and Wagner, K., "The RAISE Language, Method and Tools", Formal Aspects of Computing, 1989.

[127] Noonan, R. E., "Structured Programming and Formal Specification", IEEE Transactions on Software Engineering, Vol. SE-1, pp. 421-425 (December 1975).

[128] Oman, P., "Maintenance Tools", IEEE Software, May, 1990.

[129] Partsch, H., Specification and Transformation of Programs, Springer-Verlag, London, 1990.

[130] Partsch, H. and Steinbruggen, R., "Program Transformation Systems", Computing Surveys, Vol. 15, No. 3, pp. 198-236 (September 1983).

[131] Pratt, T. W., Programming Languages: Design and Implementation (2nd Edn.), Prentice-Hall International, Inc., London, 1984.

[132] Pressman, R. S., Software Engineering — A Practitioner's Approach, McGraw-Hill Book Company, New York, 1987.

[133] Ramamoorthy, C. V., Prakash, A., Tsai, W. and Usuda, Y., "Software Engineering: Problems and Perspectives", IEEE Computer, October, 1984.

[134] Ratcliff, B., Software Engineering Principles and Methods, Blackwell Scientific Publications, Oxford, 1987.

[135] Reddy, U. S., "Transformational Derivation of Programs Using the FOCUS System", ACM Symposium on Software Development Environments, December, 1988.

[136] Robson, D. J., Bennett, K. H., Cornelius, B. J. and Munro, M., "Approaches to Program Comprehension", Journal of Systems Software, 1991.

[137] Sannella, D., "A Survey of Formal Software Development Methods", Expository Report, Laboratory for Foundations of Computer Science, University of Edinburgh, 1988.

[138] Sannella, D., "Toward Formal Development of ML Programs: Foundations and Methodology", Expository Report, Laboratory for Foundations of Computer Science, University of Edinburgh, 1989.

[139] Sannella, D. and Tarlecki, A., "Extended ML: An Institution-independent Framework for Formal Program Development", Expository Report, Laboratory for Foundations of Computer Science, University of Edinburgh, 1986.

[140] Sannella, D. and Tarlecki, A., "Toward Formal Development of Programs from Algebraic Specification: Implementation Revisited", ACTA Informatica, 1988.

[141] Shaw, M., "Abstraction Techniques in Modern Programming Languages", IEEE Software, Vol. 1, No. 4 (October 1984).

[142] Smith, D. R. and Pressburger, T. T., Knowledge-Based Software Development Tools, Kestrel Institute, California, September, 1986.

[143] Sneed, H. M., Software Engineering Management, Ellis Horwood Limited, Chichester, 1989.

[144] Sneed, H. M. and Jandrasics, G., "Inverse Transformation of Software from Code to Specification", IEEE Conference on Software Maintenance-1988, Phoenix, Arizona, 1988.

[145] Soloway, E. and Ehrlich, K., "Empirical Studies of Programming Knowledge", IEEE Transactions on Software Engineering, Vol. SE-10, No. 5, pp. 595-609 (September 1984).

[146] Sommerville, I., Software Engineering (3rd Edn.), Addison-Wesley Publishing Company, Wokingham, 1989.

[147] Spivey, J. M., Understanding Z, Cambridge University Press, 1988.

[148] Spivey, J. M., The Z Notation, Prentice-Hall International, Inc., London, 1989.

[149] Swanson, E. B., "The Dimensions of Maintenance", Second International Conference on Software Engineering, Los Alamitos, California, 1976.

[150] Szeto, E. and Markowitz, V., ERDRAW: A Graphical Schema Specification Tool Reference Manual, Lawrence Berkeley Laboratory, Berkeley, California, May, 1991.

[151] Wand, I. C., "Features of Modern Imperative Programming Languages", in Software Engineer's Reference Book, Butterworth Heinemann, 1991.

[152] Ward, M., "Transforming A Program into A Specification", Technical Report, Durham University, 1988.

[153] Ward, M., "Constructive Specifications and Program Transformations", Technical Report, Durham University, 1988.

[154] Ward, M., "A Catalogue of Program Transformations", Technical Report, Durham University, 1988.

[155] Ward, M., "Proving Program Refinements and Transformations", Ph.D. Thesis, Oxford University, 1989.

[156] Ward, M., Munro, M. and Calliss, F. W., "The Maintainer's Assistant", IEEE Conference on Software Maintenance-1989, Miami, Florida, 1989.

[157] Wasserman, A. I., "Software Engineering Environments", Advances in Computers, Vol. 22 (1983).

[158] Waters, R. C., "The Programmer's Apprentice: Knowledge Based Program Editing", IEEE Transactions on Software Engineering, Vol. SE-8, No. 1 (January 1982).

[159] Watt, D. A., Programming Language Syntax and Semantics, Prentice-Hall International, Inc., 1991.

[160] Weiser, M., "Program Slicing", IEEE Transactions on Software Engineering, Vol. SE-10, No. 4, pp. 352-357 (July 1984).

[161] Whysall, P., "Refinement", in Software Engineer's Reference Book, Butterworth Heinemann, 1991.

[162] Wilde, N. and Thebaut, S. M., "The Maintenance Assistant: Work in Progress", The Journal of Systems and Software, Vol. 9, No. 1 (January 1989).

[163] Wirth, N., "Program Development by Stepwise Refinement", CACM, Vol. 14, No. 4 (1971).

[164] Yang, H., "How Does the Maintainer's Assistant Start?", Technical Report, Durham University, 1989.

[165] Yang, H., "The Specification of Program Structure Database Queries", REFORM Research Group, Durham University, 1991.

[166] Yang, H., "Software Maintenance in Europe and the Maintainer's Assistant", Invited Paper, Conference on Software Engineering Technology, Hong Kong, June, 1994.

[167] Yang, H., "The Supporting Environment for A Reverse Engineering System — The Maintainer's Assistant", IEEE Conference on Software Maintenance-1991, Sorrento, Italy, October, 1991.

[168] Yang, H., "Formal Methods and Software Maintenance — Some Experience With the REFORM Project", Position Paper, Workshop on Formal Methods, Monterey, USA, September, 1994.

[169] Yang, H. and Bennett, K. H., "Extension of A Transformation System for Maintenance — Dealing With Data-Intensive Programs", IEEE International Conference on Software Maintenance (ICSM '94), Victoria, Canada, September, 1994.

[170] Yang, H., Bull, T. and Bennett, K. H., "A Transformation System for Maintenance — Turning Theory into Practice", IEEE Conference on Software Maintenance-1992, Orlando, Florida, November, 1992.

[171] Yang, H. and Chu, W. C., "Component Reuse Through Reverse Engineering and Semantic Interface Analysis", Accepted by The 19th IEEE Annual Computer Software Application Conference (CompSac '95), Dallas, Texas, August, 1995.

[172] Yang, H., Luqi and Zhang, X., "Constructing An Automated Testing Oracle: An Effort to Produce Reliable Software", The 18th IEEE Annual Computer Software Application Conference (CompSac '94), Taipei, Taiwan, November, 1994.

[173] Yau, S. S. and Collofello, J. S., "Some Stability Measures For Software Maintenance", IEEE Transactions on Software Engineering, Vol. SE-6, No. 6 (November 1980).

[174] Zimmer, J. A., "Restructuring for Style", Software — Practice and Experience, Vol. 20, No. 4, pp. 365-389 (April 1990).

[175] Zuse, H., Software Complexity — Measures and Methods, Walter de Gruyter, New York, 1991.