
IDACT TRANSFORMATION MANAGER

by

Brian Hay

A dissertation submitted in partial fulfillment of the requirements for the degree

of

Doctor of Philosophy

in

Computer Science

MONTANA STATE UNIVERSITY

Bozeman, Montana

November 2005

© COPYRIGHT

by

Brian Hay

2005

All Rights Reserved


APPROVAL

of a dissertation submitted by

Brian Hay

This dissertation has been read by each member of the dissertation committee and has been found to be satisfactory regarding content, English usage, format, citations, bibliographic style, and consistency, and is ready for submission to the College of Graduate Studies.

Michael Oudshoorn

Approved for the Department of Computer Science

Michael Oudshoorn

Approved for the College of Graduate Studies

Joseph J. Fedock


STATEMENT OF PERMISSION TO USE

In presenting this dissertation in partial fulfillment of the requirements for a doctoral degree at Montana State University, I agree that the Library shall make it available to borrowers under rules of the Library. I further agree that copying of this dissertation is allowable only for scholarly purposes, consistent with “fair use” as prescribed in the U.S. Copyright Law. Requests for extensive copying or reproduction of this dissertation should be referred to ProQuest Information and Learning, 300 North Zeeb Road, Ann Arbor, Michigan 48106, to whom I have granted “the exclusive right to reproduce and distribute my dissertation in and from microform along with the non-exclusive right to reproduce and distribute my abstract in any format in whole or in part.”

Brian Hay

November 2005


TABLE OF CONTENTS

1. PROBLEM STATEMENT...... 1

Overview ...... 1
Detailed Problem Description ...... 2
Unorganized Data ...... 3
Data Format Issues ...... 4
Incomplete or Missing Metadata ...... 7
Problem Statement ...... 8

2. FOUNDATIONAL WORK AND CURRENT APPROACHES...... 10

General Foundational Work ...... 10
Middleware and Data Portals ...... 10
Integration of Heterogeneous Schemas ...... 10
Transformation Tools ...... 12
Transformation Components ...... 12
Foundational Work ...... 12
SynCon Project Data System ...... 13
SIMON ...... 18
Other Systems ...... 20
Atmospheric Radiation Measurement Program ...... 20
Storage Resource Broker ...... 21

3. SOLUTION SCOPE...... 22

Intelligent Data Acquisition, Collection, and Transformation (IDACT) System ...... 22
Overview ...... 22
Data Source Registry ...... 23
Query Manager ...... 24
Transformation Manager ...... 24
Schedule Manager ...... 25
Data Warehouse ...... 25
Solution Scope ...... 26

4. SOLUTION DESCRIPTION...... 27

Overview ...... 27
Transformation Mechanisms ...... 28
Introduction ...... 28
Identity Transformations ...... 29
Predefined Transformations ...... 29


Transformation Sequences ...... 29
Custom Transformations ...... 34
Custom Transformation Creation Process ...... 35
Transformation Components ...... 35
Common Fields ...... 36
Simple Associations ...... 37
Split Associations ...... 38
Combine Associations ...... 39
Association ...... 40
Adding New Data Sources ...... 41
Conflict Resolution in Building the Candidate Association List ...... 44
Partial String Matching for Name-Based Field Mapping ...... 45
Context-Based Conflict Resolution ...... 46
Creation of New Custom Transformations ...... 48
Conclusions ...... 55

5. PROTOTYPE IMPLEMENTATION...... 56

Architecture ...... 56
DSR Database ...... 56
Transformation Manager Database ...... 57
Transformation Manager Engine ...... 58
Implementation Tools and Languages ...... 60
Identity Transformations ...... 61
Transformation Sequences ...... 61
Creating Custom Transformations ...... 64
Actions ...... 75
TM User Interface ...... 76
Command Line Interface ...... 76
Graphical User Interface ...... 76
Security Issues ...... 77
Source Code ...... 77

6. TESTING...... 78

Overview ...... 78
Test Environments ...... 78
Testing Tools ...... 78
Transformation Sequence Testing ...... 79
Action Testing ...... 81
Transformation Testing ...... 84
Performance Testing ...... 86


Custom Transformation Generation Testing ...... 89
Evaluation Methods ...... 89

7. SUMMARY...... 92

Conclusions ...... 92
Future Work ...... 93

REFERENCES CITED...... 95

APPENDIX A: Source Code...... 104


LIST OF TABLES

Table Page

5.1. Transformation Manager database schema...... 58

5.2. Transformation request format...... 58

5.3. Example records from the TRANSFORMATION table in the TM database ...... 62

5.4. Defined TM outcomes with which actions can be associated ...... 75

6.1. The TRANSFORMATION table records corresponding to the transformations depicted in Figure 6.1 ...... 80

6.2. The 25 possible transformation requests for the five defined test formats, and the associated expected transformation sequence ...... 81

6.3. The test transformation requests and the corresponding expected action events for the first phase of action testing ...... 82

6.4. The test transformation requests and the corresponding expected action events for the second phase of action testing ...... 83

6.5. The test transformation requests and the corresponding expected action events for the third phase of action testing ...... 84


LIST OF FIGURES

Figure Page

3.1. IDACT Overview ...... 23

4.1. A graph representation of a set of six transformations between five formats ...... 31

4.2. A graph representation of a situation in which there are two possible transformation sequences between source format A and target format D ...... 33

4.3. A simple association between the source field vehicle speed and the common field speed ...... 37

4.4. A simple association and the relevant transformation component between the source field vehicle speed and the common field speed ...... 37

4.5. A split association between the source field geolocation and the common fields latitude and longitude ...... 38

4.6. A split association and the relevant transformation components between the source field geolocation and the common fields latitude and longitude ...... 39

4.7. A combine association between the source fields time-of-day and date, and the common field time ...... 39

4.8. A combine association and the relevant transformation component between the source fields time-of-day and date, and the common field time ...... 40

4.9. An example subset of the DSR database, in which there are three owner domains, and four common fields ...... 41

4.10. An example subset of the DSR database, after SA accepts the proposed association between the Location field from format C and the City common field ...... 43

4.11. An example subset of the DSR database, after SA rejects the proposed association between the Location field from format D and the City common field, and instead chooses to associate the Location field with the Country common field ...... 44

4.12. An example subset of the SA OD in the DSR database ...... 47

4.13. A hierarchical view of part of dataset E, in which the Minutes field is placed in context ...... 47

4.14. XML Schema description of example source format ...... 48

4.15. XML Schema description of example target format ...... 49

4.16. Graphical representation of example source and target formats ...... 49

4.17. Graphical representation of associations between fields in the source and target formats ...... 50

4.18. Transformation components applied to selected associations ...... 51

4.19. Selection and sorting conditions applied to a new transformation ...... 52

4.20. Example XSLT transformation created by TM ...... 53

4.21. Example source dataset ...... 54

4.22. Example target dataset ...... 55

5.1. The multi-tiered architecture of the Transformation Manager ...... 56

5.2. Flowchart describing the high-level operation of the TM Engine ...... 59

5.3. Depiction of a StructNode object ...... 64

5.4. Logical representation of the target format shown in Figure 4.14 ...... 65

5.5. Example of a StructLink object ...... 69

5.6. StructLink objects used to represent the associations defined in Figure 4.16 ...... 70

5.7. Example of a completed XSLT document generated by the TM ...... 74

6.1. A graph representation of the six test transformations between five test formats ...... 79

6.2. XML Schema description of example source format ...... 87

6.3. The XSLT transformation used to process the XML datasets ...... 88


ABSTRACT

As scientific models and analysis tools become increasingly complex, they allow researchers to manipulate larger, richer, and more finely-grained datasets, often gathered from diverse sources. These complex models provide scientists with the opportunity to investigate phenomena in much greater depth, but this additional power is not without cost. Often this cost is expressed in the time required on the part of the researcher to find, gather, and transform the data necessary to satisfy the appetites of their data-hungry computation tools, and this is time that could clearly be better spent analyzing results. In many cases, even determining if a data source holds any relevant data requires a time-consuming manual search, followed by a conversion process in order to view the data. It is also commonly the case that as researchers spend time searching for and reformatting data, the expensive data processing hardware remains underutilized. Clearly this is not an efficient use of either research time or hardware. This research effort addresses this problem by providing a Transformation Manager which leverages the knowledge of a group of users, ensuring that once a method for a transformation has been defined, either through automated processes or by any single member of the group, it can be automatically applied to similar future datasets by all members of the group. In addition to the use of existing transformation tools for well-known data formats, methods by which new transformations can be automatically generated for previously unencountered data formats are developed and discussed. A major deliverable for this work is a prototype Transformation Manager, which implements these methods to allow data consumers to focus on their primary research mission, as opposed to the mechanics of dataset identification, acquisition, and transformation.


PROBLEM STATEMENT

Overview

“The heterogeneous, distributive and voluminous nature of many government and corporate data sources impose severe constraints on meeting the diverse requirements of users who analyze the data.” [HaNa00]

The ever-increasing rate at which data is being collected and stored greatly overwhelms the current ability to make these valuable data resources easily available to a wide range of data consumers. At one extreme, these consumers include scientific researchers, whose increasingly complex models manipulate larger, richer, and more finely-grained datasets gathered from multiple heterogeneous sources, providing the opportunity to investigate phenomena in much greater depth than has ever been previously possible. At the other end of the spectrum are decision-makers, policy-makers, and members of the general public, who may be concerned with data more closely related to their personal interests, such as the level of contaminants in animal species in their region, or studies describing air quality. Despite different levels of technical knowledge, this vast continuum of data consumers faces similar problems, namely how to locate, assemble, and integrate heterogeneous domain-specific data into a format that meets their specific needs. For the technically savvy data consumer, this task is possible today, but often only with significant and time-consuming manual data manipulation efforts, using time that could be better spent in the analysis process. However, the typical data consumer may possess only some, if any, of the knowledge necessary to locate relevant data, let alone convert and integrate it into a useful product. The ability to view products based on multiple heterogeneous datasets in a new and novel manner is often the key to enhancing the global knowledge base, and increasing scientific understanding. However, this opportunity is frequently unavailable to data consumers, either due to time constraints, or insufficient access to data resources.

The current situation demonstrates an ineffective use of our vast and ever-expanding data resources, and of the time and efforts of data consumers. As a result, while data that could be used to answer important questions often exists, it remains beyond the reach of most data consumers, and the scientific issues that the data could help address remain partially or fully unresolved. Furthermore, data collected at great cost for one project or purpose often becomes less useful over time, not due to issues with the data itself or a lack of interest in the topic, but simply because other data consumers often do not know it exists, and could not access it in a useful manner if they did. A solution must be found to automate much of this process, allowing for increased productivity on the part of scientists, greater data access for all levels of data consumers, and increased throughput and utilization of models and analysis tools.

Detailed Problem Description

While science has traditionally been driven in large part by data analysis, the increasingly widespread use of electronic resources such as personal computers, database management systems, and advanced modeling tools allows analysis at a more detailed level than has ever been possible in the past. Similar technological advances are also being applied to the decision-making process in many other fields. For example, businesses often gather and analyze data to determine how to more effectively market products or provide better service to their customers, and political entities are increasingly dependent on the analysis of polling data.

These approaches can only be effective, however, if the data which the analysis tools ingest has been identified, collected, organized, and made available. Data collection is typically the least of these problems, with data being collected in massive quantities on almost every aspect of human life and the environment in which we live. From consumer preference in grocery stores and television viewing habits, to satellite remote sensing and in-situ sensors, there is a staggering amount of data being collected on a daily basis. This wealth of data will remain largely untapped if it cannot be analyzed and transformed into knowledge, and it is often a failure in these areas that results in data not being used as effectively as it could be.

The disconnect between data archivists, who are responsible for storing datasets, and data consumers, such as scientists, researchers, or policy-makers, is often the result of several factors, including unorganized data, data format issues, incomplete or missing metadata, or data archives whose existence is not widely known. The first three of these issues are directly relevant to this work, while the fourth issue is addressed in part by IDACT, of which this work is a component.

Unorganized Data

The first issue which affects data consumers is that the data is often not organized, or is stored in a manner which is not conducive to searching. Data is frequently collected for a specific purpose, and once it has been used in that limited fashion the raw data is archived without thought to whether it may be useful beyond this initial application.

These “data piles” often contain valuable data that is typically underutilized outside the original project which motivated its collection. Often the expense or effort required on the part of a data consumer to determine if the archive contains data germane to their research, and to then extract it, is prohibitive. As a result, data collection efforts may be duplicated, or historical datasets may not be considered in comparison with more recent or more detailed measurements due to the associated costs or time requirements.

Data Format Issues

Even in cases where data is organized, it is frequently the case that the data is stored in a format which is not useful to the data consumer. There have been many attempts to address this issue through the use of standard formats [Cov01, Cov05, NWS05, ThSh04, W3C05a, W3C05b, W3C05c], but in many cases these have met with limited success. “Standard” formats tend to fall into one of two categories, both of which have significant problems. The standards which allow for flexibility [NCS05a, Uni05, ITS05] are often also very complex and unwieldy, and as such have a steep learning curve that is unattractive or prohibitive for many data consumers and owners. The other extreme includes the narrowly defined standards, which are frequently so inflexible that an attempt to collect even a single new field results in the standard being re-written, resulting in multiple versions of the “standard”, which clearly defeats the objective [ICE05]. While some useful domain-specific standards do exist, they still do not address the issue of data consumers whose tools draw datasets from multiple domains, which is an increasingly common and useful analysis approach across many research domains [Ada94, Bay03, Bro02, CRS91, Das97, FoBo98, Hall97, KJW02, Rus02].

There are also many cases where there are several competing, applicable “standards” from which a data owner could select a storage format. For example, the Federal Geographic Data Committee, which supports at least seven geospatial data standards [FGD05], appears to be distinct from the Environmental Data Standards Council [EDS05], although the data needs of the two seem compatible. Storing multiple copies of the data, with one copy in each format, is not an advisable solution. Such an approach is not only prohibitive in terms of storage space, but also introduces the problem of consistency, where updates are made in one copy, and not propagated to the others. Even if the data archivist could select the “best” format (which is unlikely to exist), there is no guarantee that a better format will not be available tomorrow. For data which has value in the future (such as benchmark scientific data which can be used to show trends over time), making the data accessible today is insufficient; it must remain available over its useful lifetime, despite any changes in data archiving formats which cannot be predicted at this time. As part of the SynCon project [HaNa00, Nan97a, Nan97b, Nan97c, Nan97d, NaWi98, NWG99a, NWG99b, NaSt00, RaNa98, TeNa98, Rei01], scientists were queried about their requirements for a data center, and one of the most common desires was to make their data available in whatever format it is needed, forever [NaHa01]. As such, it is likely that no single data format will meet the needs of all data consumers.


A related issue that often plagues data centers is the data submission process. One common goal of a data center is to collect as much relevant data as possible, and as such they must determine how data collectors, such as scientists, must submit their datasets. Many current implementations involve the definition of an input format, which is either similar to, or can be easily converted to, the logical storage representation employed by the data center. In practice this is rarely an effective approach, because data collectors generally store data in a format that meets their immediate needs, rather than in a format that meets the submission requirements of the data center. Often when confronted with the issue of submitting data to a data center in a required format, data collectors will choose not to submit datasets unless compelled to by some external force, such as the submission being a requirement for continued funding. During the development of the SynCon system, several international data centers facing just this problem were encountered, with the result being that data from the European countries, where funding was dependent on data submissions, was plentiful, and data from other nations was scarce at best, despite there having been many valuable data collection efforts underway. The scientists polled during the SynCon requirements phase also cited “allowing data collectors to submit data in any format they chose” [NaHa01] as being a significant issue in their efforts to submit data to data centers.

While tools do exist that perform automated conversions between some common formats in specific domains [NCS05b, Dat05, ICO05], these typically require the data consumer to have a high level of technical proficiency, in that they must identify, acquire, install, and run the appropriate tool for a given conversion task. Unfortunately, these tools generally do not provide any assistance when either the source or desired data formats are less common. Even between commonly used data formats there may not be a suitable conversion tool available. Furthermore, if such a conversion tool exists, it must often be used merely to determine if a given dataset is relevant to the issue being explored by the data consumer. While some researchers are proficient in the use of computing resources, and could even write their own programs to convert data as necessary, any time that they spend in this effort detracts from their primary mission of scientific data analysis. For those whose level of technical expertise is not sufficient to use or write conversion tools, they can either fund computer experts to perform those tasks, or limit their analysis to datasets that are more accessible, but possibly less useful.

In order for the full potential of a given dataset to be realized, it is vital that it be accessible to users who do not have an in-depth knowledge of computing resources such as operating system administration, programming languages, and database systems.

Incomplete or Missing Metadata

The third issue commonly encountered by data consumers is that access to a dataset is usually insufficient if there is no access to the associated metadata. There are an increasing number of formats in which metadata can be embedded in or reliably associated with the dataset. Examples of such formats include NetCDF [Uni05], or many XML-based formats [Cov01, Cov05, NWS05, ThSh04, W3C05a, W3C05b, W3C05c]. However, in many cases the complete metadata for a dataset is not available, which reduces the usefulness of the data, or even renders it useless. An example of such a case involves the Outer Continental Shelf Environmental Assessment Program (OCSEAP), in which large quantities of environmental data was collected in the Bering Sea, the Beaufort Sea, and the Gulf of Alaska in the 1970s [US91]. The primary method of access for the data was via a custom-written software package, and the majority of the data itself was stored in a custom format. However, the software package was poorly documented, is difficult to locate, and does not work on current operating systems. As a result, while much of the data remains in the custom format, there is no indication of what it means or how to extract it, so the bulk of this important dataset, which could provide an excellent historical baseline for future studies, has been lost.

During the SynCon system development, several datasets were encountered in which the data collectors stored the data in spreadsheet documents (typically Microsoft Excel), and for which the metadata only existed in the mind of the originating scientist, or as a heading of a spreadsheet column. While this method was initially acceptable while the data was being used by the data collector, it essentially made the dataset useless to any other data consumer. In such cases, metadata can sometimes be recovered through discussion with the data collector (if available), or by analysis of associated documentation, such as journal articles. However, such inefficient approaches are typically expensive in terms of time and/or money.

Problem Statement

The problem that is addressed in this research effort is that of providing automated mechanisms for transforming datasets between data formats, thereby reducing the burden, in both time and technological skill, currently placed on data consumers as they attempt to access existing datasets. In addition, the related issue of allowing data collectors to more easily submit datasets to a data center is also addressed. The result of this is to allow data consumers and collectors to focus on their areas of expertise, rather than the technical details of dataset access, while increasing the range of datasets, in their problem domain, available for their use.

While this effort initially targets scientists at the University of Alaska Fairbanks (UAF) and NASA, the conceptual approach developed in this work, and the open source Transformation Manager which implements that approach, will benefit a far broader spectrum of users from the scientific, governmental, and commercial domains by allowing data producers and consumers to focus on the use of data rather than the details of data access and manipulation.


FOUNDATIONAL WORK AND CURRENT APPROACHES

General Foundational Work

This work builds on general foundational work in the areas of middleware and data portals, integration of heterogeneous schemas, transformation tools and transformation components, and ongoing research efforts in data organization.

Middleware and Data Portals

This work addresses, in part, some of the current issues related to data portals, which promise access to vast underlying data holdings via a single seamless user interface, such as a web page operating in conjunction with a middleware layer [CEH04, DMR03, WCH02, WSW05]. However, this promise is far from being fully realized. Recent research indicates that data portals are currently not effective at dynamically combining data from heterogeneous sources. For example, some cases involve a primitive hardwired approach that is specific to a very small set of carefully chosen data types and sources [GNS00], while other approaches involve the use of multiple “portlets” on a single web page [PrGu03], each of which queries only a single data source in isolation.

Integration of Heterogeneous Schemas

Several recent works address the issue of the integration of heterogeneous schemas, as it has become an important field of study with applications in many domains, such as web-service interactions, data discovery, management, processing and fusion.


The most basic version of this process is limited to schemas for which there is at most a one-to-one association between fields [CAD01, DMD03, EJX01, LiCl00, MBR01, MMR02, PVH02, RaBe01]. For example, the SEMINT system [LiCl00] uses a neural network to determine which pairs of fields in two database schemas are equivalent. However, such one-to-one approaches ignore the issue of more complex relationships between fields, in which, for example, a set of fields from one schema may be equivalent to another set in a second schema. As a result, such simple systems are rarely valuable for use with real-world schemas, which typically feature more complex mappings.

Some researchers have addressed the issue for schemas which feature more complex associations [DLD04, DMD03, LiCl00, MBR01, MMR02, PVH02, RaBe01]. For example, Embley et al. [EXD04] proposed an approach which in large part relies on the predefinition of a static list of synonyms for a given domain, which is only useful if such a list can actually be generated. In many cases, more flexibility is required to ensure that the system adapts to changes in the domain. The CUPID system [MBR01] also addresses the definition of more complex associations between heterogeneous schemas, but does not provide any mechanism by which the nature of the association is defined. For example, while CUPID may identify an association between the Line field in one schema and the ItemNumber field in another, it does not address the issue of how values from the Line field can be transformed into equivalent values in the ItemNumber field. In addition, none of these systems are capable of generating a new transformation method once associations have been defined, which is a major focus of this research effort.


Transformation Tools

Some transformation tools for individual data source and target formats have already been defined in many domains [NCS05b, Dat05, ICO05], and are often available via web or FTP sites. However, these tools typically involve standard data formats for the source and target, so while they are useful in many cases, they are by no means a universal solution. In addition, it is often the responsibility of each individual data consumer to locate, acquire, install, and operate these tools, rather than leveraging the setup work performed by one user to benefit all data consumers in a given domain.

Other transformation tools take the form of custom programs or scripts, written in-house at data repositories, processing centers, and distribution sites, or by individual data producers, archivists, or consumers. These tools generally fit a given need for which no generally available tool can be located, but the work required to produce them is often repeated by others facing similar problems, resulting in a duplication of effort.

Transformation Components

The XSLT Standard Library [Sou05] provides a large number of XSLT components, which can be utilized in the construction of complete XSLT documents. These foundational components can be incorporated into the custom transformation creation process discussed in Chapter 5.

Foundational Work

This effort primarily builds on the work performed as part of the SynCon Project at the University of Alaska Fairbanks. In particular, the SIMON agent created as part of that program forms a basis for the intelligent transformation component of this work [Hay00, HaNa00].

SynCon Project Data System

The Arctic Monitoring and Assessment Programme (AMAP) is an international organization comprised primarily of representatives from government and the scientific communities in the eight circumpolar arctic countries. The objective of AMAP is to:

“[provide] reliable and sufficient information on the status of, and threats to, the Arctic environment, and [provide] scientific advice on actions to be taken in order to support Arctic governments in their efforts to take remedial and preventive actions relating to contaminants” [AMA05].

At the 1997 AMAP meeting, participants from around the world were dismayed by the absence of U.S. environmental data. When the 1998 AMAP Assessment Report: Arctic Pollution Issues [AMA98] was published this issue was still a matter of grave concern for the circumpolar arctic scientific community, and an embarrassment to U.S. scientists and funding agencies. The primary reason for this omission was that the AMAP submission process and required formats were too complex and time-consuming for the U.S. scientists, who typically have neither the time nor the inclination to perform the tedious conversion process from their raw data formats.

One example of this problem is demonstrated by the data submission process at the AMAP Marine Thematic Data Center operated by the International Council for the Exploration of the Sea in Copenhagen, Denmark [ICE05a]. The submission process requires the data producer to follow an extremely strict data format, which has evolved over time as the requirements associated with the datasets being collected have changed. The storage format uses input fields based on fixed format lengths, represented in large part by the reference codes that are used in the underlying format. For example, there are 16 codes, such as “CLAV ELL” and “PSEU TUM”, which can be used to indicate the presence of a given fish disease, and which must be included in a field titled “DISEA” [ICE05b]. Failure to follow this requirement in even a single record results in the rejection of the entire dataset, and there are many other fields with similarly cryptic codes.

In response, the SynCon Project staff at the University of Alaska Fairbanks pledged to work with AMAP and the U.S. scientists to help resolve this issue. In an initial step, a group of international scientists involved in environmental research related to the Arctic environment were asked what was needed to encourage them to submit their data to a central facility. Their responses indicated that as a group they had the following basic requirements [NaHa01]:

• Accept my data in its current format – each scientist typically had a data storage format that they were comfortable with (many of which involved spreadsheet documents), and they wanted to submit datasets to data centers in those formats.

• Don’t make me do much extra work – none of the scientists were interested in data submission processes which required significant additional effort on their part, as it detracted from their primary scientific mission.

• Don’t make me hire a data specialist – the scientists typically did not have funding to hire additional staff to perform data manipulation operations, and diverting any funding they did have was viewed as reducing the funding for their scientific operations.

• Don’t ask me too many questions – there was a desire that any operations on the part of the data center to ingest data submissions should happen with as little interaction as possible between the data center and the scientists.

• Make my data available in the formats I want and will want, every day, forever – typically the scientists were aware that their needs for data formats may change in the future, and they wanted to ensure that a data center could provide them with data in the formats which they required not only today, but also in the future as data formats evolved.

• Protect my data from unauthorized access – while the scientists were generally very interested in having access to the data collected by others, they were very wary about others having access to their data, at least without their permission.

The SynCon development team successfully met this challenge for the AMAP community, in large part through the development of automated processes and an intelligent agent [Hay00, HaNa00] that is able to accept diverse data formats as input and translate the data into the underlying storage format of the data system. In addition, the data consumer can access data via a variety of interfaces which provide abstraction from the raw storage format, ensuring that the output formats can change to meet the needs of the user, without impacting the underlying data system. Data collectors were also able to specify restrictions on the distribution of the data, including the date at which it would be publicly available, and at what level of detail it would be released. For example, a data collector could specify that catalog data, which describes the dataset in broad terms, be made publicly available on a given date, while restricting the release of the individual measurements, known as the detail data, until a later date. A third classification that was frequently utilized by the AMAP community was aggregate data, with an associated release date which indicated when data products based on the detail data could be published.

Due to these developments, scientists could concentrate on their primary mission, and the data manipulation process, which would have required hours, days, or even weeks of effort for the scientists to complete for other AMAP data systems, was accomplished extremely efficiently through large-scale automation. As a result, the SynCon Project was selected to serve as two of the five AMAP Thematic Data Centres (TDC). The other three TDCs are based in Europe, and the addition of these two TDCs in North America was seen by many in the AMAP community as a commitment to address the lack of U.S. data in the AMAP data centers.

It has since become apparent that scientists from around the world are choosing to contribute their data to SynCon even when other data centers were their original targets. In a 2001 letter commending the SynCon Project, the AMAP Executive Secretary Lars-Otto Reierson observed that

[T]he SynCon solution has proven so successful that part of the scientific community that are responsible for reporting data to the AMAP marine and atmospheric TDCs [Thematic Data Centers] have preferred to send their data to UAF for incorporation in the SynCon Database instead. This has particularly been the case with the North American scientific community that has no previous experience of reporting to the European-based marine and atmospheric TDCs. UAF is therefore now exceeding its original commitment to AMAP by processing all datasets received. [Rei01]

SynCon is based at the University of Alaska Fairbanks. Although its initial mission was the collection and distribution of geo-referenced environmental contamination datasets, it was designed to be modified to hold any measurement data, and has been proven as an easily extendable system as data needs expand. In particular, the SynCon framework has been applied to serve as the United States Environmental Protection Agency’s (EPA) Alaska Heavy Metals Data System.

As a result of the success of SynCon, and the intelligent data processing agents in particular, two members of this research team (Hay and Nance) traveled to Copenhagen, Denmark in 2004 to advise staff at the International Council for the Exploration of the Sea (ICES) [ICE05], which houses the AMAP Marine TDC, on approaches to data transformation and storage for their evolving data center. Of the five AMAP TDCs, ICES has traditionally had the most challenging data submission process, which relies in large part on a very rigid data format. Although many European scientists do currently contribute data to ICES, it is generally only because it is a requirement of the various funding and government agencies, and typically requires a significant effort on the part of each scientist. As a result of this daunting effort, and the fact that funding agencies in other areas do not impose such strict reporting standards, ICES generally receives very few submissions from outside Europe, despite there being large quantities of relevant data from other regions.

SIMON

SIMON is an intelligent agent designed as part of the SynCon project [HaNa00]. The primary goal of SIMON is to allow data submissions to be accepted for inclusion in the database with minimum effort on the part of either the data archivist or the data collector. In particular, SIMON addresses the deficiencies, as cited by data collectors and directly observed during the SynCon requirements phase, in the data submission process used by other data centers. These deficiencies include the following two major issues:

• The use of a rigid data format for all data submissions.

• The rejection of an entire submission if problems were encountered at the field or record level of the dataset.

As a result of the problems encountered during the data submission process, many scientists chose not to contribute data unless compelled to by some external influence, such as it being a requirement of their funding agency or government.

In response, the goals for the development of SIMON were to provide a mechanism which would be flexible in the input formats it accepted, and which would attempt to automatically determine how to process each data submission. The completed SIMON agent was quite successful in meeting these goals, and while it was not capable of automatically fully ingesting all possible data submissions, it performed the majority of the work for most datasets, despite each submission being in a unique format. To date, SIMON has been used by SynCon personnel to process over 60 datasets, consisting of over 130,000 measurement records. In all of these cases, no specific input format was required, and as a result scientists submitted data in whatever format they found most suitable.

During the conversion process, user guidance was requested if SIMON was not able to determine how to proceed. The user was presented with a dialog box which described the problem and provided the most likely solutions, at which point the data archivist was free to select the most appropriate approach. SIMON then not only used the user direction to process the current dataset, but also augmented a knowledge base which allowed the automated processing to be more effective when faced with similar situations in the future.

Not only was SIMON capable of processing heterogeneous input formats, but when problems were encountered in a dataset, such as a lack of data in a required field, it rejected the data at the lowest level possible. This often meant that datasets which had been rejected in their entirety by other data centers were processed by SIMON and accepted in the SynCon data center with only a small number of records returned to the data collector for clarification and resubmission.

The output produced by SIMON includes a report which is returned to the data collector. This report, among other things, provides a method for cyclic improvement of the dataset by listing additional fields which the data collector may want to submit. If such a resubmission is made, the new data can be inserted into the database, and the dataset is then more useful to data consumers. This encouraged ongoing expansion of the datasets, including the addition of data as new fields were identified or requested by other users.

SIMON is still in use by SynCon, and continues to allow scientists to submit data in a format that meets their needs, rather than the needs of the data center. As a result, SynCon has been successful in archiving and distributing important datasets from scientists who had previously been unable or unwilling to submit data to other data centers.

Other Systems

Atmospheric Radiation Measurement Program

The Atmospheric Radiation Measurement (ARM) Program is a U.S. government effort to address global warming research, with a particular focus on

the crucial role of clouds and their influence on radiative feedback processes in the atmosphere. The primary goal of the ARM Program is to improve the treatment of cloud and radiation physics in global climate models in order to improve the climate simulation capabilities of these models. ARM's scientists research a broad range of issues that span remote sensing, physical process investigation and modeling on all scales. ARM's site operators focus on obtaining continuous field measurements and providing data products to promote the advancement of climate models. [ARM05]

While the ARM system does provide a mechanism for obtaining data in a variety of formats, which attempts to address the needs of data consumers, including the provision of pre-computed data products, there is no mechanism by which a user can easily define a new data output format, other than submitting a request for such a format to be supported by the site maintainers.


Storage Resource Broker

Storage Resource Broker (SRB) is a project based at the San Diego Supercomputer Center, whose stated goal is to

[provide] a uniform interface for connecting to heterogeneous data resources over a network and accessing replicated data sets. SRB, in conjunction with the Metadata Catalog (MCAT), provides a way to access data sets and resources based on their attributes and/or logical names rather than their names or physical locations. [SRB05]

While there is some overlap between SRB and IDACT, one significant difference between the projects is SRB's lack of any transformation capability for data input or output. While SRB does allow access to a variety of distributed, heterogeneous data sources, no provision is made for automatic transformations between the data source format and the format required by the data consumer, which is precisely the goal of this effort.


SOLUTION SCOPE

Intelligent Data Acquisition, Collection, and Transformation (IDACT) System

Overview

In 2003, NASA awarded an Advanced Information Systems Technology (AIST) Program grant to the University of Alaska Fairbanks to develop the IDACT System (NASA award AIST-02-0135) [Hay03, HaNa04, LNH04, NaHa04a, NaHa04b, NaHa05]. This system addresses the problems raised in Chapter 1 as increasingly complex scientific models attempt to utilize multiple, possibly geographically distributed, data sources as their inputs. IDACT provides a mechanism which allows data consumers to more easily access data from multiple heterogeneous sources, and is based on several components, as shown in Figure 3.1.

The five major IDACT components, described below, are the Data Source Registry, the Query Manager, the Data Warehouse, the Transformation Manager, and the Schedule Manager. These components function in sequence to provide data consumers with results in a form in which they can be most effectively utilized. IDACT was initially conceived for use in earth science environments, but its modular design allows it to be applicable to most data-driven domains. A given IDACT instance can involve the use of all of the components shown in Figure 3.1, but, depending on the application, some of the components, such as the Transformation Manager, can also be employed in a standalone fashion through the use of a web service interface.


Figure 3.1: IDACT Overview [NaHa04b]

Data Source Registry

The Data Source Registry (DSR) tracks the structure and access method for multiple data sources [Cre03, Cre05]. In addition, it allows associations between components of the data sources to be defined, so that, for example, datasets which contain related data from different sources, each with their own structures, can automatically be integrated into a seamless view. Such associations can be created manually by a user or administrator, but are also created automatically as new data sources are made available. This allows even a novice user to add a new data source to the registry and have it be available for use in the system with minimal user intervention.
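As an illustration only, the following Python sketch models a minimal registry entry; the class and field names (DataSourceEntry, access_method, associations) are assumptions invented for this example, not the actual DSR schema of [Cre03, Cre05].

from dataclasses import dataclass, field

@dataclass
class DataSourceEntry:
    # One registered data source (illustrative, not the actual DSR schema).
    name: str                                         # e.g., "buoy_feed_A"
    access_method: str                                # e.g., "sql", "ftp", "webservice"
    fields: dict = field(default_factory=dict)        # source field -> type
    associations: dict = field(default_factory=dict)  # source field -> common field

registry = {}

def register_source(entry: DataSourceEntry) -> None:
    # Once registered, the source is available to all users of the system.
    registry[entry.name] = entry

register_source(DataSourceEntry(
    name="buoy_feed_A",
    access_method="webservice",
    fields={"vehicle speed": "float"},
    associations={"vehicle speed": "speed"},  # cf. the simple association of Figure 4.3
))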


Query Manager

The Query Manager (QM) accepts data requests from data consumers in a variety of formats based on the requirements of various user groups. For example, for some users, a simple picklist interface may be a sufficiently powerful method for requesting data, whereas other users require more detailed and descriptive query mechanisms. However, the QM serves to abstract data consumers from the underlying organizational schema of the actual data sources, allowing them instead to focus on a request that is tailored to their level of technological expertise.

Once the QM receives a request for data, it builds a series of queries for each of the relevant data sources, based on the information stored in the DSR. In the context of the QM, the term “query” is used in a broader sense than just as a reference to SQL statements, and instead applies to any data search and acquisition method. As such, a particular QM query may involve a search of a directory for files, or a web service call, for example. The Query Manager is under development at this time.
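Because a QM “query” may be an SQL statement, a directory search, or a web service call, one natural realization is a common interface with one implementation per acquisition method. The sketch below is a hypothetical illustration of that idea, not the QM's actual design.

import glob
from abc import ABC, abstractmethod

class Query(ABC):
    # A QM "query" in the broad sense: any data search and acquisition method.
    @abstractmethod
    def execute(self) -> list: ...

class FileQuery(Query):
    # A query realized as a search of a directory for matching files.
    def __init__(self, pattern: str):
        self.pattern = pattern
    def execute(self) -> list:
        return glob.glob(self.pattern)

class WebServiceQuery(Query):
    # A query realized as a web service call (left abstract in this sketch).
    def __init__(self, url: str):
        self.url = url
    def execute(self) -> list:
        raise NotImplementedError(f"call the service at {self.url}")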

Transformation Manager

The results of queries generated by the QM are accepted by the Transformation Manager (TM) component [Hay03, HaNa04, LNH04, NaHa04a, NaHa04b, NaHa05], which attempts to ensure that the data is presented to the data consumer in the manner in which it is requested, rather than in the format used by the underlying data source. The TM component is the focus of this research effort, and its function is described in greater detail in the following chapters.


Schedule Manager

The Schedule Manager (SM) component allows for actions, such as the invocation of an analysis tool or model, based on events such as time periods, or the acquisition of new data [Lis03, Lis05, LNH04]. Using a Petri-net-based approach, the SM uses a combination of Boolean and temporal operators to specify the conditions under which the defined actions will occur. A basic example of such a rule is:

Create product P using datasets A, B, and C if:
A and B have both been updated since P was last created, OR
dataset C has been updated since P was last created, OR
it has been more than 8 hours since P was last created.

The triggers for a specific action can range from the very simple (e.g., trigger whenever new data is acquired) to the very complex (e.g., Boolean expressions involving multiple data sources and time periods).
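As a concrete illustration, the example rule above is simply a Boolean combination of update and elapsed-time conditions. The following minimal Python sketch renders that single rule; it is not the SM's Petri-net implementation, and the function and parameter names are assumptions made for this example.

from datetime import datetime, timedelta

def should_create_product(last_created: datetime,
                          last_updated: dict,
                          now: datetime) -> bool:
    # Evaluate the example SM rule for product P over datasets A, B, and C.
    a_and_b = (last_updated["A"] > last_created and
               last_updated["B"] > last_created)
    c_alone = last_updated["C"] > last_created
    stale = now - last_created > timedelta(hours=8)
    return a_and_b or c_alone or stale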

Data Warehouse

The Data Warehouse (DW) component is used to archive commonly requested data. Data can be submitted to the DW by the TM, as a dataset to which a transformation has been applied, or as a data product resulting from the application of a model or analysis tool.


Solution Scope

The scope of this research effort is to identify methods by which the Transformation Manager (TM) component of IDACT can effectively automate much of the process of transforming data between data formats, particularly in cases where either the input or output format is not a recognized standard. The major deliverable of this research effort is the prototype TM implementation which utilizes these methods, and an accompanying analysis of the effectiveness of the automation techniques.


SOLUTION DESCRIPTION

Overview

The issue of providing data in a format which meets the needs of the recipient is a complex one, but addressing it has the potential to greatly improve the operation of data centers, which in turn will make data more useful and more likely to be applied to address scientific and social questions. In particular, a mechanism for performing data format transformations is applicable to the following functions in the data center environment:

• It will allow data collectors to submit data to data archivists in a far more flexible manner. The lack of such an option has been identified [NaHa01] as significantly reducing the likelihood of data collectors attempting to submit data to data centers, and of their submissions actually being accepted.

• It will allow data consumers to access data in a manner which suits their needs, rather than using a format defined by the data archivist. Allowing data to be distributed in such a manner decreases the level of technical proficiency required on the part of data consumers, resulting in a more widespread use of valuable datasets.

In this work, data is considered in two forms: the native (or source) format, and the desired (or target) format. In the case of the data collector submitting data to a data center, the source format is the format in which the submission is made, and the target format is the data storage format (or some loading format) used by the data center. In the case of the data consumer, the source format is one defined by the data center, and the target format is the format required by the consumer, which could be the input format for an analysis tool or model, for example.

Two categories of transformation will also be considered, namely those that impact the format of the document containing the data, and those which impact the data itself. As a simple example, it is possible to transform a dataset stored in a comma-delimited text format to an XML-based format without making any change to the data contained within the file. However, it is also possible to perform some operations on the data contained within the file, such as converting a field containing speed measurements from miles-per-hour to meters-per-second.
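To make the distinction concrete, the following sketch performs both kinds of transformation at once: the document format changes from comma-delimited text to XML, and a speed field is converted from miles-per-hour to meters-per-second. The field names (station, speed_mph) are invented for this example.

import csv, io
from xml.sax.saxutils import escape

MPH_TO_MPS = 0.44704  # one mile-per-hour in meters-per-second

def csv_to_xml(text: str) -> str:
    # Format change (CSV to XML) combined with a data change (mph to m/s).
    rows = csv.DictReader(io.StringIO(text))
    out = ["<records>"]
    for row in rows:
        speed_mps = float(row["speed_mph"]) * MPH_TO_MPS
        station = escape(row["station"])
        out.append(f'  <record station="{station}" speed_mps="{speed_mps:.2f}"/>')
    out.append("</records>")
    return "\n".join(out)

print(csv_to_xml("station,speed_mph\nKFAI,10\n"))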

Transformation Mechanisms

Introduction

In order to provide a format transformation mechanism, there are essentially four cases which should be considered. These include identity transformations, predefined transformations, transformation sequences, and custom transformations. Each of these cases is included in the Transformation Manager prototype implementation. The TM maintains a description of each available transformation it is capable of performing, and searches the list in order to determine the most suitable approach to fulfilling a given transformation request.


Identity Transformations

The first case is the most trivial, as it involves a source format and a target format which are identical. In such cases, the transformation simply becomes a pass-through, or identity transformation, in which the output is the same as the input.

Predefined Transformations

In some cases, primarily those which involve commonly used standard formats for the source and target, there are predefined tools which perform format transformations. For example, there are several tools which perform various conversions to and from HDF5 [NCS05b]. In cases where such tools exist, they can be applied to perform transformations which meet the needs of the user. A mechanism which transforms between source format A and target format B will be denoted as T_{A,B}.

A search to determine whether a suitable predefined transformation exists within a particular IDACT instance is trivial to perform. The search of the set of all available transformations, S_T, where |S_T| = n, can easily be performed in O(n) time if S_T is unsorted, and O(log n) time if S_T can be accessed in a more ordered manner, such as via an index to database records, or even O(1) time if a perfect hash table is assumed to be available.
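A minimal sketch of the O(1) case follows: a registry held in a Python dict keyed by (source, target) pairs, with the identity case handled before the lookup. The registry contents are placeholders, not the prototype's actual data structures.

# Hypothetical registry keyed by (source format, target format). A dict gives
# the O(1) hash-table case; a sorted index would give O(log n), and scanning
# an unsorted list would give O(n).
transformations = {
    ("A", "B"): "tool for T_{A,B}",  # placeholder for a callable or tool
    ("A", "C"): "tool for T_{A,C}",
}

def find_predefined(source: str, target: str):
    if source == target:
        return "identity"            # the trivial pass-through case
    return transformations.get((source, target))  # None if no direct tool exists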

Transformation Sequences

In the event that a single predefined transformation does not exist, it may be possible to construct a sequence of transformations which meets the goal of the user. For example, given source format A and target format B, there may not be a transformation T_{A,B}. However, if intermediate format C exists, in addition to transformations T_{A,C} and T_{C,B}, then it may be possible to effectively perform T_{A,B} by applying the transformation sequence TS_{A,B} = (T_{A,C}, T_{C,B}). In a more general sense, a transformation sequence which transforms format 1 into format m, using intermediate formats 2 through (m-1), is given as

TS_{1,m} = (T_{1,2}, T_{2,3}, …, T_{(m-2),(m-1)}, T_{(m-1),m})

A single transformation can essentially be viewed as a transformation sequence of length 1, which is useful in the actual implementation of the prototype Transformation Manager.

Given a set of transformations, determining the lowest cost transformation sequence between source format A and target format B is essentially equivalent to solving the shortest path problem for a graph in which the data formats are represented by vertices, and the transformations are represented by edges. Each edge in the graph can be weighted with the cost of the given transformation. Such weights may be determined in many ways, but generally will provide an estimation of the computational time or complexity of the transformation. Equal weights may also be used for all edges, in which case the lowest cost transformation sequence would equate to the sequence with the fewest steps. For example, Figure 4.1 shows an example in which there are five data formats, and six defined transformations between them. The shortest path problem can be solved by Dijkstra’s algorithm, for example, for a given number of data formats, f, in O(f²) time.


Figure 4.1: A graph representation of a set of six transformations between five formats.
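With uniform edge weights, the shortest-path search reduces to a breadth-first search over the format graph. The sketch below illustrates the idea using the earlier worked example (no direct T_{A,B}, but T_{A,C} and T_{C,B} exist); it is not the prototype TM's actual search code.

from collections import deque

def shortest_sequence(transforms: set, source: str, target: str):
    # Breadth-first search over formats; with uniform weights, the first
    # path to reach the target is a lowest-cost transformation sequence.
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path                        # list of (from, to) steps
        for s, t in transforms:
            if s == fmt and t not in seen:
                seen.add(t)
                queue.append((t, path + [(s, t)]))
    return None                                # no sequence exists

edges = {("A", "C"), ("C", "B")}
print(shortest_sequence(edges, "A", "B"))      # [('A', 'C'), ('C', 'B')]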

While it is feasible within the current TM architecture to use non-uniform weights in the transformation graph, the problem of assigning such weights presents a significant challenge which is likely to be insurmountable in a general sense. The first issue with assigning such weights involves the choice of metrics, which could include computational time, computational space, or even a of the precision with which the transformation is performed. Since at least some of these values are generally dependant on the size of the input, the metrics would also have to be expressed in a manner that accounted for such differences. Assuming that the relevant metrics could be identified and expressed in a useful manner, the next issue to be addressed would be the method by which these metrics were combined to form the weights for each

32 transformation. For example, are computational space and computational time equally important, or is one more important than the other, and if so by what factor? Finally, it is important to recognize that the metric for a given transformation may be impacted by factors other than the size of the dataset to be processed, and the size of the fields within the dataset. Suppose, for example, that two otherwise equivalent transformations sort data, one using bubble sort and the other using quicksort. It would be easy to conclude that quicksort would be preferable, but in the case of a source dataset which was already sorted, or nearly sorted, this would generally not be the case. Likewise selection sort would be preferable to both if there are large records within the datasets to be manipulated, and the cost of a move operation exceeds the cost of a comparison operation .

While a general solution to this problem would be useful to this work and in other domains, such as job scheduling, the act of assigning weights appears to be a problem that would need to be solved for each invocation of the TM. This does not seem useful, reasonable, or computationally efficient, as it may require more effort to determine the weights for each transformation request than to perform the transformations. The solution used for the TM, therefore, is to assign uniform weights to all transformations and use the shortest path as the automatically selected transformation sequence. However, the administrator of any given TM instance is free to modify those weights if suitable values can be determined for the transformations and datasets typically encountered within their domain. While not a general solution to this problem, such weights could be determined for a given domain using empirical data, based on the application of the set of transformations to representative datasets.

A second, and somewhat related, issue in the choice of transformation sequence is that of finding several equally suitable transformation sequences, which in the case of the uniformly weighted transformations equates to transformation sequences of equal length.

For example, consider the set of transformations represented by the graph shown in Figure 4.2, in which the transformation sequences {TA,B, TB,D} and {TA,C, TC,D} are of equal length, and both sequences ultimately perform a transformation from source format A to target format D.

Figure 4.2: A graph representation of a situation in which there are two possible transformation sequences between source format A and target format D.

The current TM architecture addresses this issue by selecting the first suitable sequence found, and allowing the user to modify the choice of transformation sequence by presenting them with relevant alternative options. This allows a reasonable solution to be chosen automatically, while allowing the user to manually select a preferred transformation if they so desire. There are other techniques that could be applied to address this issue, including tracking the number of times that a given transformation sequence was applied and using that value for conflict resolution. However, no matter what method is chosen, if two or more transformation sequences exist for a given transformation request, there always exists the possibility that a given user will select a transformation sequence other than the automatically selected one. As such, the TM follows the approach of making a reasonable choice of transformation sequence, and providing the user with the option of selecting from one of the other sequences if desired.

Custom Transformations

In the event that no suitable transformation sequence is available to meet the needs of the user, it may be possible to automatically construct a new transformation, assuming that the source and target formats are in some sense compatible (i.e., there must be some level of commonality between the formats, otherwise the concept of a transformation is meaningless). While an experienced computer programmer could construct such a custom transformation, this is often beyond the abilities of the typical data consumer, and is often not practical beyond the development of mechanisms to transform between standard formats. As such, some methods for automatically constructing such transformations are considered. If a custom transformation is created, it is not only applicable to the transformation of the current source data, but is also made available to all users for future transformations.


Custom Transformation Creation Process

The approach used in creating custom transformations begins with the association of fields in the source and target formats to common fields, as defined in the Data Source Registry (DSR) [Cre03, Cre05]. These mappings, or associations, are stored in a database which is constructed and managed by the DSR. While associations can be defined manually using the DSR interface directly, the associations in this effort are created via the TM, as a result of either automated processing or user selection.

Transformation Components

A Transformation Component (TC) describes how to perform an operation on a single data item. These operations are typically quite basic, such as converting a speed from miles-per-hour to meters-per-second, or extracting a substring from a field value.

The transformation components for a given instance of the TM/DSR are stored as XSLT templates, based in part on the XSLT Standard Library [Sou05], and augmented with any additional relevant domain specific XSLT templates. In the event that a suitable TC is not defined in the system, a user can define a new TC whenever necessary, which is then added to the library and made available to all users.

At present, approximately 75 XSLT transformation components, covering operations including unit conversion and the manipulation of strings and geographic coordinates, have been written as part of this work. As further application domains are identified, a domain expert will be employed to develop a set of relevant domain-specific XSLT transformations for deployment with TM instances. This effort is already underway in the chemistry domain, using the skills of a chemistry domain expert.

A detailed description of how the XSLT Transformation Components are applied in the generation of a new custom transformation is given on page 64.

Common Fields

The DSR manages a set of common fields, to which each of the identified data format fields is linked through a defined association. The set of common fields is not static, and can be expanded by the user if a suitable common field does not exist. For example, if the user has a source field which is related to speed, but there are no suitable common fields to which it can be mapped, then a common field named velocity can be added. There is also a mechanism in the DSR by which a domain expert can validate the common fields added by the user. If, for example, the velocity common field added by a user contained only a speed but no direction, then the name speed would be more appropriate. The domain expert could then update the name of the common field, and the DSR then indicates that the common field was verified by the domain expert, which gives it more credibility than those common fields which have merely been added by system users.

To allow for the greatest degree of flexibility in mapping source fields to target fields, an attempt is made to reduce the complexity of common fields. For example, rather than having a common field named geographic-coordinates in which both latitude and longitude are stored, it is more effective to have two common fields named latitude and longitude .


Simple Associations

Each simple association directly maps a single field in the source format to a single common field. For example, the source field named vehicle speed could be associated with a common field named speed, as shown in Figure 4.3.

Figure 4.3: A simple association between the source field vehicle speed and the common field speed.

The association itself identifies fields which have related data. One or more transformation components can optionally be included with the association to define how the information in the source field actually relates to the common field. For example, suppose the vehicle speed field contains data measured in kilometers-per-hour, and the speed common field expresses the same information in meters-per-second. A TC is applied to the association to describe how the data should be converted from the source field form to the common field form. In this case, the TC would involve multiplying the value in the vehicle speed source field by 1000/3600, or 10/36, to obtain the equivalent value for the speed common field. An example of such an association can be seen in Figure 4.4.

Figure 4.4: A simple association and the relevant transformation component between the source field vehicle speed and the common field speed.


Once a TC has been applied, the resulting value can be adjusted to contain approximately the same number of significant digits as the original value, in order that the common field value not appear more precise than the data on which it is based. By performing this as an independent step, rounding can be left to the end of the process in the event that several TC stages are applied to a source field value.
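As a concrete illustration of the example above, the conversion and the independent rounding step could be sketched in Java as follows; the class and method names are hypothetical, and the TM itself expresses such components as XSLT templates rather than Java code.

import java.math.BigDecimal;
import java.math.MathContext;

public class SpeedConversionExample {
    /** The km/h to m/s TC from Figure 4.4: multiply by 10/36. */
    static double kmhToMs(double kmh) {
        return kmh * 10.0 / 36.0;
    }

    /** The independent rounding step: keep a given number of significant digits. */
    static double roundToSignificantDigits(double value, int digits) {
        return new BigDecimal(value).round(new MathContext(digits)).doubleValue();
    }

    public static void main(String[] args) {
        double vehicleSpeed = 88.0;                   // km/h, two significant digits
        double speed = kmhToMs(vehicleSpeed);         // 24.444... m/s
        System.out.println(roundToSignificantDigits(speed, 2)); // prints 24.0
    }
}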

Split Associations

A split association can be applied to define how a single source field relates to more than one common field. For example, the source field geolocation, which contains a latitude and longitude, may be related to the common fields latitude and longitude, as shown in Figure 4.5.

Figure 4.5: A split association between the source field geolocation and the common fields latitude and longitude.

As with the simple association, TCs can be included with the split association to describe how to form the common fields from the source data. For example, suppose that geolocation involves values of the form latitude; longitude using degrees, minutes, and seconds, while each of the two common fields uses decimal degrees. The association and the TCs can be represented as shown in Figure 4.6.


Figure 4.6: A split association and the relevant transformation components between the source field geolocation and the common fields latitude and longitude.
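A sketch of the two conversions attached to this split association is shown below, in Java rather than XSLT for brevity. The "degrees minutes seconds" input layout and the semicolon separator are assumptions carried over from the example above.

public class GeolocationSplitExample {
    /** Convert one "degrees minutes seconds" value, e.g. "64 48 0", to decimal degrees. */
    static double dmsToDecimal(String dms) {
        String[] parts = dms.trim().split("\\s+");
        double deg = Double.parseDouble(parts[0]);
        double min = Double.parseDouble(parts[1]);
        double sec = Double.parseDouble(parts[2]);
        double sign = deg < 0 ? -1.0 : 1.0;
        return sign * (Math.abs(deg) + min / 60.0 + sec / 3600.0);
    }

    public static void main(String[] args) {
        String geolocation = "64 48 0; -147 54 0";   // latitude; longitude
        String[] fields = geolocation.split(";");
        System.out.println("latitude  = " + dmsToDecimal(fields[0]));  // 64.8
        System.out.println("longitude = " + dmsToDecimal(fields[1])); // -147.9
    }
}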

Combine Associations

A combine association can be applied to define how multiple source fields relate to a single common field. For example, a combine association may be used to relate the source fields date and time-of-day to the common field time, in which the time is stored as seconds since some epoch, such as the commonly used 0:00 on January 1st, 1970. This example is depicted in Figure 4.7.

Figure 4.7: A combine association between the source fields time-of-day and date, and the common field time.

As with the other association types, TCs can also be included to describe the mechanics of the transformation process. Figure 4.8 shows how the TC dateTime-to-timeSinceEpoch can be applied to this process.

Figure 4.8: A combine association and the relevant transformation component between the source fields time-of-day and date, and the common field time.
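The dateTime-to-timeSinceEpoch TC could be sketched in Java as follows; the ISO-style input formats and the UTC assumption are illustrative, not mandated by the TM.

import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneOffset;

public class CombineDateTimeExample {
    /** Combine separate date and time-of-day fields into seconds since the 1970 epoch. */
    static long toSecondsSinceEpoch(String date, String timeOfDay) {
        LocalDate d = LocalDate.parse(date);       // e.g. "2005-11-22"
        LocalTime t = LocalTime.parse(timeOfDay);  // e.g. "14:30:00"
        return d.atTime(t).toEpochSecond(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        System.out.println(toSecondsSinceEpoch("1970-01-01", "00:00:01")); // prints 1
    }
}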

Association Database

The DSR maintains a database of currently defined associations between source fields and common fields, which is organized into Owner Domains (ODs), based on the identity of the user who created the association. For example, in a system with two users, Scientist A (SA) and Scientist B (SB), each would have their own OD, and an additional OD, named Global, would exist to contain system-wide associations. Figure 4.9 shows an example of a simple DSR database, in which three owner domains and four common fields have been defined. In this example, the source field Location is associated with common fields in three different ways.

1. In the Global OD, Location is associated with the common fields Latitude and Longitude, in a form similar to the split association example in Figure 4.5.

2. In the SA OD, Location is associated with the common field City. This association was added as a result of SA processing data in format A.

3. In the SB OD, Location is associated with the common field Country. This association was added as a result of SB processing data in format B.


TC information is also stored with each of these associations, although this is not specifically indicated in the logical view shown in Figure 4.9.

Figure 4.9: An example subset of the DSR database, in which there are three owner domains, and four common fields.

Adding New Data Sources

If a new data source format, C, is used by SA, associations will be added to the DSR database which define how the fields in C map to the common fields. In such cases, an ordered list is automatically constructed to determine an appropriate association, and to allow the user to select from a list of likely associations should the automatically selected association not meet their needs. The process used to construct the list, using the example of SA processing format C, in which there is a Location field, is as follows (a code sketch of the construction is given after the list):

1. Determine if there are any existing associations defined in the current OD for the given field and format. Any associations found during this search are added to the list. In this example, this would involve a search to determine if there are any associations for the Location field for format C in the SA OD, of which there are none.

2. Repeat step 1 for the Global OD, and append any associations found to the list.

3. Repeat step 1 for all other ODs (i.e., all ODs other than SA and Global in this case), and append any associations found to the list.

4. Determine if there are any existing associations defined in the current OD for the given field name, and append any associations found to the list. In this example, this would involve a search to determine if there are any associations for the Location field for any format in the SA OD. One such association exists, so the list would become {City}.

5. Repeat step 4 for the Global OD. In this example, one such association exists, so the list would become {City, (Latitude, Longitude)}.

6. Repeat step 4 for all other ODs. In this example, one such association exists, so the list would become {City, (Latitude, Longitude), Country}.
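A minimal Java sketch of this six-step construction is shown below. The DsrDatabase interface and Association class are hypothetical stand-ins for the actual DSR queries, which in the prototype are issued against the MySQL database.

import java.util.ArrayList;
import java.util.List;

interface DsrDatabase {
    /** Associations for this exact field name and format in the given OD (steps 1-3). */
    List<Association> byFieldAndFormat(String od, String field, String format);
    /** Associations for this field name in any format in the given OD (steps 4-6). */
    List<Association> byFieldName(String od, String field);
    /** All ODs other than the given OD and Global. */
    List<String> otherOwnerDomains(String od);
}

class Association { /* common field(s) plus attached TC information */ }

class CandidateListBuilder {
    static List<Association> build(DsrDatabase dsr, String od, String field, String format) {
        List<Association> list = new ArrayList<>();
        list.addAll(dsr.byFieldAndFormat(od, field, format));        // step 1: current OD
        list.addAll(dsr.byFieldAndFormat("Global", field, format));  // step 2: Global OD
        for (String other : dsr.otherOwnerDomains(od)) {             // step 3: all other ODs
            list.addAll(dsr.byFieldAndFormat(other, field, format));
        }
        list.addAll(dsr.byFieldName(od, field));                     // step 4: current OD, any format
        list.addAll(dsr.byFieldName("Global", field));               // step 5: Global OD
        for (String other : dsr.otherOwnerDomains(od)) {             // step 6: all other ODs
            list.addAll(dsr.byFieldName(other, field));
        }
        return list; // the first entry is the automatically selected association
    }
}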

At the end of this process, the first item in the list is used as the automatically selected association, and the process continues for other fields in the new data source. If the user accepts the automatically selected association, then the database is updated with the new association. For example, if SA accepts the association of Location to City for source format C, then the database is updated as shown in Figure 4.10.


Figure 4.10: An example subset of the DSR database, after SA accepts the proposed association between the Location field from format C and the City common field.

Given the database depicted in Figure 4.10, if SA attempts to create an association for the Location field from format D, the resulting list will again be {City, (Latitude, Longitude), Country}. However, in this case, suppose that SA rejects the automatically selected association, and determines that associating Location with Country is more appropriate. The database would then be updated as shown in Figure 4.11.


Figure 4.11: An example subset of the DSR database, after SA rejects the proposed association between the Location field from format D and the City common field, and instead chooses to associate the Location field with the Country common field.

Conflict Resolution in Building the Candidate Association List

When attempting to build the list of candidate associations, as described on page 41, it is possible that at any of the steps a conflict may arise, in that more than one possible association may be found. For example, if SA attempted to create an association for the field Location from format E, given the DSR database state depicted in Figure 4.11, then there would be two candidate associations (City and Country) found in step 4 of the process. The mechanism used to resolve this conflict and place the associations in the candidate list in order is as follows:

1. Preference is first given based on the context of the fields. This is not possible in all cases, but the mechanisms for performing context-based conflict resolution are addressed on page 46.

2. If a context-based approach does not resolve the conflict, then preference is given to the association which appears most frequently as a result in the current search.

3. If frequency cannot resolve the conflict, then preference is given to the association which was most recently added.

Using the example of building the candidate association list for SA adding the Location field from format E, the conflict would be resolved using frequency. In the SA OD, there are two associations to City, and only one to Country, so the associations would be added to the list in the order {City, Country}. The final list, after the remaining steps were completed, would be {City, Country, (Latitude, Longitude)}.
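Steps 2 and 3 of this mechanism amount to an ordering over the conflicting candidates, sketched below in Java; the Candidate fields are assumptions, and the context-based resolution of step 1 is assumed to have been attempted first.

import java.util.Comparator;
import java.util.List;

class Candidate {
    String commonField;
    int frequency;      // how often this association appeared in the current search step
    long addedMillis;   // when the association was added to the DSR database

    Candidate(String commonField, int frequency, long addedMillis) {
        this.commonField = commonField;
        this.frequency = frequency;
        this.addedMillis = addedMillis;
    }
}

class ConflictResolver {
    /** Order conflicting candidates: most frequent first, ties broken by recency. */
    static void order(List<Candidate> conflicting) {
        conflicting.sort(
            Comparator.comparingInt((Candidate c) -> c.frequency).reversed()
                      .thenComparing(Comparator.comparingLong(
                              (Candidate c) -> c.addedMillis).reversed()));
    }
}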

Partial String Matching for Name-Based Field Mapping

The process used to build the ordered list of candidate associations as presented relies on exactly matching a new field name with one of the field names in the DSR database. However, it is likely that new field names may be similar to existing names, while not necessarily matching exactly, and it is also important to identify these partial matches as candidate associations. Based on a similar approach used by SIMON [HaNa00, Hay00], steps 4-6 defined on page 42 can be repeated using substring matching (new field name as a substring of an existing name, and vice versa) to generate additional entries to be appended to the candidate association list.

Given the example DSR database depicted in Figure 4.11, the resulting candidate association list would be empty for a field named Event Location using only exact matching. However, when followed by a substring matching process, the candidate association list would include several items, as Location is a substring of Event Location.
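The partial-match test itself is simple; a Java sketch is shown below (the case-insensitive comparison is an assumption, not something specified by the prototype).

class PartialMatcher {
    /** True if either field name is a substring of the other. */
    static boolean partialMatch(String newField, String existingField) {
        String a = newField.toLowerCase();
        String b = existingField.toLowerCase();
        return a.contains(b) || b.contains(a);
    }
}

// For example, partialMatch("Event Location", "Location") returns true, so the
// associations recorded for Location become candidates for Event Location.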

Context-Based Conflict Resolution

In the case of hierarchically organized data formats, such as XML-based formats, it is often possible to use context as a method for conflict resolution, or to determine a set of likely associations in the event that the other methods have failed. In such cases, the context is defined as the path from the root of the document to the current field. Consider the example of determining an association for the field Minutes in a new data format D, given the subset of the DSR database depicted in Figure 4.12.

In this example, it is not clear which association would be most appropriate for the new field Minutes, as there are two possible choices based on the currently defined combine associations. However, if the field under consideration is viewed in context, as shown in Figure 4.13, a determination can be made as to which association is more appropriate, based on the association of the parent field Lat with the Latitude common field in data format A.


Figure 4.12: An example subset of the SA OD in the DSR database.

Figure 4.13: A hierarchical view of part of dataset E, in which the Minutes field is placed in context.


Creation of New Custom Transformations

Given two formats for which associations to common fields have been defined, it may be possible to build a new transformation. Consider the example of a source data format corresponding to the XML schema shown in Figure 4.14, and a target data format corresponding to the XML schema shown in Figure 4.15.

Figure 4.14: XML Schema description of example source format [NaHa05].


Figure 4.15: XML Schema description of example target format [NaHa05].

These formats can be represented graphically as shown in Figure 4.16.

Figure 4.16: Graphical representation of example source and target formats [NaHa05].


Using the techniques previously outlined to identify associations, fields in the source format can be associated with a corresponding field in the target format, as shown in Figure 4.17.

Figure 4.17: Graphical representation of associations between fields in the source and target formats [NaHa05].

If necessary, transformation components can then be added to each of the associations to define how the data from each field should be manipulated. The selected TCs are shown in Figure 4.18.


Figure 4.18: Transformation components applied to selected associations [NaHa05].

In addition, selection and sorting conditions can be added to the transformation to produce a target dataset which most closely meets the needs of the user. If, for example, the source dataset typically contained readings from several types of weather sensors, but the data consumer only required datasets related to windspeed, then a selection condition could be imposed as part of the transformation. An example of a selection condition is shown in Figure 4.19.


Figure 4.19: Selection conditions applied to a new transformation [NaHa05].

Using this information, the TM builds a transformation, such as the XSLT transformation shown in Figure 4.20.


Figure 4.20: Example XSLT transformation created by TM [NaHa05].


This transformation can then be applied to a dataset which is consistent with the source format defined by the XML Schema given in Figure 4.14, with the result being a new dataset which is consistent with the target format defined by the XML Schema given in Figure 4.15. Examples of such datasets are shown in Figures 4.21 and 4.22 respectively.

Figure 4.21: Example source dataset [NaHa05].


Figure 4.22: Example target dataset [NaHa05].

Conclusions

This chapter presents, at a conceptual level, the mechanisms which allow the Transformation Manager to function, including the identity transformations, existing transformation tools, transformation sequences, and the custom transformation creation process. Chapter 5 describes the use of the concepts in the implementation of the Transformation Manager prototype.


PROTOTYPE IMPLEMENTATION

Architecture

The implementation of the Transformation Manager (TM) prototype involves a multi-tiered architecture, as shown in Figure 5.1. Using such a loosely coupled approach not only allows each of the components to be developed independently, but also ensures that the user interface is not dependent on the remainder of the TM, allowing several versions to be developed in response to the needs of each particular user group.

Figure 5.1: The multi-tiered architecture of the Transformation Manager.

DSR Database

The basic implementation of the DSR database was developed as an independent module during the IDACT project, using a MySQL database [MyS05] running on Linux, with a web-based UI written in PHP [PHP05], and a web-services API [Cre03, Cre05].


However, some modification was made to the DSR database as part of this effort, in order to meet the needs of the TM component. These modifications primarily involved the addition of new queries to provide the TM Engine with more effective search capabilities, particularly during the process of creating new mappings to common fields.

The DSR database stores the defined associations, and also stores the transformation component (TC) library. The detailed general schema used for the DSR database is described in the DSR Technical Specification [Cre03].

Transformation Manager Database

The TM database was also implemented using MySQL, although several other RDBMS products could have been used, requiring only a minimal change to the database driver used by the TM Engine. In practice, the TM database utilized the same database server as the DSR database in order to limit the amount of hardware required for testing, although there is no requirement for this to be the case.

The TM database is used to store information related to the currently defined transformations, allowing the TM to determine if a suitable transformation sequence exists for a given source to target format transformation. The basic schema used for the TM database is shown in Table 5.1.


Table 5.1: Transformation Manager database schema.

Transformation Manager Engine

The TM engine accepts a transformation request, which it then attempts to satisfy if possible. The basic format of the transformation request is shown in Table 5.2.

Table 5.2: Transformation request format.


The operation of the TM upon receiving a request is described by the flowchart in Figure 5.2.

Figure 5.2: Flowchart describing the high-level operation of the TM Engine.


Implementation Tools and Languages

The TM Engine is implemented in Java [Sun05], in large part due to its seamless portability between commonly encountered systems, and because it provides a large predefined library of objects for critical TM Engine functionality, such as database interaction and XML and XSLT processing. In particular, the Apache Foundation Xerces XML Parser [Apa05] and Xalan XSLT Processor [Xal05] were used extensively via their Java APIs. However, similar tools exist for more computationally efficient languages such as C++, and future iterations of the TM could be written in such languages if improved performance of the TM Engine itself (as opposed to the transformations) is a requirement.

The current TM implementation is capable of using transformations written in many forms, including executable transformations written in any language supported by the underlying environment (e.g., compiled executables, Perl, shell scripts), in addition to XSLT via an XSLT processor such as Xalan.

It is important to note that while the core TM Engine is written in Java, the bulk of the work is performed by the transformations invoked by the TM, which can be written in any language. For example, while Java may not be the best language for transformations involving large volumes of string processing, the transformation could easily be written in a more suitable language, such as C, without rewriting any of the code for the TM Engine. As such, a program (the TM Engine) which at first glance appears to be Java based, and which may therefore be considered ill-suited to high performance processing, is actually using Java only for the management effort, while the transformation instructions can be written in a form appropriate to achieving maximum performance from the hardware.

Identity Transformations

The first step performed by the TM is to determine whether the source data is already in the form required by the data consumer; if so, an identity transformation, which essentially copies the data from the source to the target location, is applied. The inclusion of identity transformations allows the TM to be more easily integrated into the IDACT infrastructure.

Internally in the TM, the transformation sequence to be applied is represented as a vector of transformation objects. In the case of the identity transformation, the vector is empty, which allows the TM to process the data with no need for special logic beyond a determination that no transformations are required.

Transformation Sequences

If it is determined that the identity transformation will not suffice, the next step taken by the TM is to search the TM database for a suitable transformation sequence (TS) using the previously defined transformations. In the current implementation, the TS with the smallest number of transformations is selected, by repeatedly issuing SQL queries until either a suitable TS is found or the sequence length reaches some defined maximum.

Suppose that an abbreviated version of the TRANSFORMATION table in the TM database contains three transformations, as shown in Table 5.3.

Table 5.3: Example records from the TRANSFORMATION table in the TM database.

In the case of a transformation request with source format A and target format D, the first SQL query would attempt to locate a single record which satisfies those conditions. The basic structure of that SQL query is as follows:

SELECT T_0.ID
FROM TRANSFORMATIONS AS T_0
WHERE T_0.SRC_FORMAT='A'
  AND T_0.TGT_FORMAT='D'

This query returns an empty recordset because no single record meets the given criteria. The next query to be performed would then attempt to find a sequence of length 2, using the following SQL query structure:

SELECT T_0.ID, T_1.ID
FROM TRANSFORMATIONS AS T_0
  JOIN TRANSFORMATIONS AS T_1
    ON T_0.TGT_FORMAT = T_1.SRC_FORMAT
WHERE T_0.SRC_FORMAT='A'
  AND T_1.TGT_FORMAT='D'


This query also returns an empty recordset because there is no suitable transformation sequence of length 2 that meets the given criteria. This process continues, increasing the length of the searched sequence, and in this case the next query, shown below, would be successful in finding a suitable transformation sequence of length 3:

SELECT T_0.ID, T_1.ID, T_2.ID
FROM TRANSFORMATIONS AS T_0
  JOIN (TRANSFORMATIONS AS T_1
    JOIN TRANSFORMATIONS AS T_2
      ON T_1.TGT_FORMAT = T_2.SRC_FORMAT)
    ON T_0.TGT_FORMAT = T_1.SRC_FORMAT
WHERE T_0.SRC_FORMAT='A'
  AND T_2.TGT_FORMAT='D'
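The pattern in these queries is regular, so the query for a sequence of length n can be generated programmatically. A Java sketch of such a generator is shown below; the method is hypothetical, and it emits a flat join chain, which is equivalent to the nested form shown above. In practice, parameterized queries would be preferable to string concatenation.

class SequenceQueryBuilder {
    /** Build the self-join query for a transformation sequence of length n. */
    static String buildSequenceQuery(int n, String src, String tgt) {
        StringBuilder select = new StringBuilder("SELECT T_0.ID");
        StringBuilder from = new StringBuilder(" FROM TRANSFORMATIONS AS T_0");
        for (int i = 1; i < n; i++) {
            select.append(", T_").append(i).append(".ID");
            from.append(" JOIN TRANSFORMATIONS AS T_").append(i)
                .append(" ON T_").append(i - 1).append(".TGT_FORMAT = T_")
                .append(i).append(".SRC_FORMAT");
        }
        return select.toString() + from
             + " WHERE T_0.SRC_FORMAT='" + src + "'"
             + " AND T_" + (n - 1) + ".TGT_FORMAT='" + tgt + "'";
    }
}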

If a suitable transformation sequence is found, the vector of transformation objects in the TM implementation is populated with the resulting transformations.

An alternative to selecting the shortest transformation sequence is to assign weights to each of the transformations, and select the transformation sequence with the lowest total weight. A new field (named weight, for example) could be added to the TRANSFORMATION table, and weights could be assigned to each transformation based on, for example, the computational expense of the transformation process. The most suitable transformation sequence (i.e., the one with the lowest total weight) could then be determined by applying Dijkstra's algorithm to the graph representation of the transformations.


Creating Custom Transformations

In the case where no suitable transformation sequence is found, an attempt is made to create a custom transformation for the user, which then becomes an entity in the Transformation Library and can be used in future transformation sequences. In the prototype TM implementation, these custom transformations are described using XSLT documents, which are built in part from a library of XSLT components.

The descriptions of the source and target formats are maintained internally in the TM using StructNode objects. Each field is represented by a StructNode object, which stores the name of the field, a link to the parent node, and a vector of links to any child nodes, as shown in Figure 5.3.

Figure 5.3: Depiction of a StructNode object.

The data format defined in Figure 4.15 would be represented by the TM using several StructNode objects, as shown in Figure 5.4.


Figure 5.4: Logical representation of the target format shown in Figure 4.15.


The StructNode objects also include an additional element, which is a reference to a StructLink object, but it is only used in the case of a target field.
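A minimal Java sketch of such a class is shown below; the actual prototype class may differ in detail. The StructLink class is sketched later in this section.

import java.util.Vector;

public class StructNode {
    private final String fieldName;
    private final StructNode parent;
    private final Vector<StructNode> children = new Vector<>();
    private StructLink link; // set only for fields in the target format

    public StructNode(String fieldName, StructNode parent) {
        this.fieldName = fieldName;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    public String getFieldName() { return fieldName; }
    public StructNode getParent() { return parent; }
    public Vector<StructNode> getChildren() { return children; }
    public StructLink getLink() { return link; }
    public void setLink(StructLink link) { this.link = link; }
}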

The TM implementation attempts to identify the associations between the fields in the source and target formats, represented by StructNode objects, primarily by querying the DSR database for known associations as described on page 35, using a Java Vector as the underlying data structure for the candidate association list. Once the TM has automatically identified these candidate associations, the first item in the list is selected by the TM as the proposed association. The user is then given the option to modify the associations if necessary, using either another automatically identified association from the candidate association list built by the TM, or by creating an entirely new association.

Once the associations have been defined, the related transformation components (TCs) must be identified. As with the associations themselves, a list of likely TCs is determined through a search of the DSR database, after which the user is free to modify the TC selection in a manner similar to the modification of associations. The TCs themselves are defined in a library of XSLT components, which is based on the XSLT standard library [Sou05], and augmented with predefined, domain-specific TCs for the environment in which the TM is operating. In addition, a user can define a new TC, which is added to the Transformation Library and made available for future use by all TM users within the IDACT instance.

XSLT can be used to perform many types of operations, including string processing and mathematical operations, and calls can even be made to Java library routines for more advanced functionality. As a result, a given XSLT document can describe a wide variety of transformations, which can include features such as graphical output, statistical processing, selection, sorting, and manipulation of date/time, geolocation, and measurement fields. By encoding and storing operations as XSLT components, a given operation such as unit conversion can be utilized in several transformations built from these components without the need for it to be rewritten each time, in a manner similar to the concept of object reuse in the development of programs in an Object Oriented environment.

The reuse of XSLT components also has advantages in the event that a transformation component requires modification. This can occur as a result of an incorrect definition, although the TM provides for the review of new transformation components by a domain expert in an effort to ensure that user defined transformation components are neither inappropriate nor incorrect. However, the need to modify a transformation component can also arise from an external change, which must then be reflected in the TM transformations. Examples of such events include the improvement of a statistical operation to take advantage of new techniques, or the redefinition of geographic coordinate systems [GDA05]. In such cases, it is important to update the transformation component for use in building future custom transformations. However, it may also be necessary to update any previously defined custom transformation which is based on the modified TC, although this cannot merely be implemented as replacement of the old method with the new one. In fact, there are two cases which can be applied, as follows:

1. The previously defined transformations remain unchanged, but new transformations in which the new TC is used are generated to replace them. The previously defined transformations are then still available for use if a given user desires, but preference can be given to the newly generated transformations through the use of an increased transformation weight for the old transformation, for example.

2. The previously defined transformations are replaced with newly generated transformations, which utilize the new TC. In this case, the previously defined transformations will no longer be available for use.

The choice of which approach to apply is dependent on the domain in which the TM is operating and the type of modification that is made to the TC, and is left to the discretion of the TM administrator. However, the process of performing either of the two update operations described above is quite simple: transformation components have unique names within a TM instance, and as such it is trivial to script the search for the modified TCs in the set of custom transformations. This method for ensuring that updated processes are propagated to transformation tools is more effective than that commonly applied to pre-existing transformation tools, which often rely on the user to realize that an updated software release is available, and then download and install it themselves.

In the TM implementation, associations between fields, and the related TCs, are maintained using an object of the StructLink class. Objects of that class contain a link to the target field object, a vector of links to one or more source field objects, and a vector of links to one or more TC objects, as shown in Figure 5.5.


Figure 5.5: Example of a StructLink object.
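A corresponding Java sketch of the StructLink class follows; TransformationComponent is a hypothetical stand-in for however the prototype references a TC.

import java.util.Vector;

public class StructLink {
    private final StructNode targetField;
    private final Vector<StructNode> sourceFields = new Vector<>();
    private final Vector<TransformationComponent> components = new Vector<>();

    public StructLink(StructNode targetField) {
        this.targetField = targetField;
        targetField.setLink(this); // only target-format nodes carry a StructLink
    }

    public void addSourceField(StructNode source) { sourceFields.add(source); }
    public void addComponent(TransformationComponent tc) { components.add(tc); }
    public Vector<StructNode> getSourceFields() { return sourceFields; }
    public Vector<TransformationComponent> getComponents() { return components; }
}

class TransformationComponent {
    String xsltTemplateName; // name of the XSLT component in the TC library
}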

Associations need to be defined for each of the required target fields, although this may present a problem if the source format does not contain all of the fields necessary to populate the fields in the target format. This issue is addressed in the TM implementation by allowing TCs to perform some operations, such as string manipulation, without referencing a source field. As a result, a required target field for which there was no corresponding source field could be populated with a null string, or a selected string such as "n/a", for example, based on the user's determination of what is appropriate.

A logical representation of the use of StructLink objects is shown in Figure 5.6, in which the associations defined in Figure 4.17 are represented.


Figure 5.6: StructLink objects used to represent the associations defined in Figure 4.17.

Once all of the necessary associations have been defined, the TM traverses the target format structure (which is comprised of StructNode objects), and uses the data stored in the StructLink object at each step to iteratively build the XSLT transformation document based on the associated source fields.

Using the example associations depicted in Figure 5.6, the following steps are performed by the TM to automatically generate the XSLT code:

1. The XSLT header is written.

2. The target field WINDSPEEDS is associated with the MEASUREMENTS source field, so the XML document root code is added to use the template for the MEASUREMENTS field.

3. The MEASUREMENTS template, which is the XSLT code used to generate the target field WINDSPEEDS from the source field MEASUREMENTS, is added, which in this case involves the use of the MEASUREMENT template.

4. The MEASUREMENT template, which is the XSLT code used to generate the target field MEASUREMENT from the source field MEASUREMENT, is added. The MEASUREMENT target field is composed of four child fields, and includes a selection criterion in which MEASUREMENT data is only used if the TYPE child field has the value 'windspeed'.

5. The XSLT code used to generate the LAT target field is added, which utilizes an XSLT transformation component called substring-before().

6. The code to generate the LON field is added in a similar manner to the previous step, with the exception that substring-after() is used in place of substring-before().

7. The code to generate the TIME target field is added, which simply involves copying the value of the TIME field from the source to the target.

8. The last target field is SPEED, which is comprised of the concatenation of the VALUE and UNITS source fields. The context of this template is the VALUE field, so while the contents of the VALUE field can be easily acquired, the contents of other fields in the hierarchy (in this case UNITS) must be referenced in relation to the VALUE field. XSLT uses XPath [ClDe99, Ber05] expressions to describe the relationship of fields to each other in a document hierarchy, and a function named buildRelativePath() in the XsltGenerator class determines the appropriate XPath expression to describe the relationship between two StructNode objects, which are passed to the function as parameters. In this case, the XPath from SPEED to UNITS is "../UNITS".

9. A final line of XSLT code is added to complete the document.

The automatically generated XSLT document which describes the newly created transformation is shown in Figure 5.7. Once the XSLT document has been built, it is used to perform the current transformation. It is also saved for future use, and a record describing it is added to the TRANSFORMATIONS table in the TM database, ensuring that any effort required to generate a given transformation is not duplicated.
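One piece of this generation process that can be usefully sketched is buildRelativePath() from step 8. The sketch below is illustrative, and the prototype code may differ; it climbs from the context node to the nearest common ancestor, emitting "../" for each step, and then descends to the referenced node.

import java.util.ArrayList;
import java.util.List;

public class XsltGenerator {
    /** XPath expression describing how to reach target from context, e.g. "../UNITS". */
    static String buildRelativePath(StructNode context, StructNode target) {
        // Ancestors of the target, from the root down to the target itself.
        List<StructNode> targetPath = new ArrayList<>();
        for (StructNode n = target; n != null; n = n.getParent()) {
            targetPath.add(0, n);
        }
        StringBuilder path = new StringBuilder();
        StructNode n = context;
        while (!targetPath.contains(n)) {      // climb until a common ancestor is reached
            path.append("../");
            n = n.getParent();
        }
        for (int i = targetPath.indexOf(n) + 1; i < targetPath.size(); i++) {
            path.append(targetPath.get(i).getFieldName());
            if (i < targetPath.size() - 1) {
                path.append('/');
            }
        }
        return path.toString();
    }
}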


Figure 5.7: Example of a completed XSLT document generated by the TM [NaHa05].


Actions

Requests to the TM Engine can include actions, which are invoked based on the outcome of a transformation request. There are six defined outcomes with which actions can be associated, as shown in Table 5.4.

Table 5.4: Defined TM outcomes with which actions can be associated.

The actions in the transformation request are stored by the TM in a vector, so that a given outcome may result in one or more actions being invoked. Two action types are defined in the TM: one allows an email to be sent, and the other sends a message to the IDACT Scheduler [Lis03, Lis05, LNH04], which may in turn trigger further operations. The TM implementation was designed so that extending the set of actions is a simple task, which involves adding a new class that extends the ActionItem class and implements its single perform() method, as shown below.


/**
 * Abstract base class for the various ActionItem* classes,
 * which are used to describe the details of an Action.
 */
public abstract class ActionItem extends Object {
    abstract public boolean perform(String msg, DtmError err);
}
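For example, a hypothetical action that appends the outcome message to a log file could be added as follows; the email and Scheduler actions in the prototype follow the same pattern.

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class ActionItemLog extends ActionItem {
    private final File logFile;

    public ActionItemLog(File logFile) {
        this.logFile = logFile;
    }

    public boolean perform(String msg, DtmError err) {
        try (FileWriter out = new FileWriter(logFile, true)) {
            out.write(msg + (err != null ? " [" + err + "]" : "") + "\n");
            return true;
        } catch (IOException e) {
            return false; // the action itself failed
        }
    }
}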

TM User Interface

The TM Engine is written so that it is not dependent on any single user interface (UI) design decision. Interaction between a UI and the engine is facilitated by a defined API which the UI can utilize to access the TM Engine functionality. As a result of this independence, a UI appropriate to the target user group can be used for any given TM instance.

Command Line Interface

A command line interface (CLI) was developed to test the TM. It provides full access to the TM functionality in a text-based format. While this UI may not be an appropriate choice for the typical end-user, it is extremely useful in a development environment, as it can easily be modified to incorporate debugging information in addition to the standard user information.

Graphical User Interface

As part of the IDACT project, a graphical user interface (GUI) is being developed that provides a consistent user interface across all of the IDACT components, including the TM. This GUI development also involves a web-service layer to augment the current API, allowing TM functionality to be easily invoked by both local and remote user interfaces. The development of the GUI is beyond the scope of this work, although it is expected that such a UI will be more commonly used than the prototype test command line interface.

Security Issues

During the development process, some user feedback indicated the need to restrict access to data sources based on the identity of the user making the request. In response to this feedback, the IDACT Security Module (SM) was designed to allow for the implementation of such restrictions, without requiring major modification to each of the TM components which would need to be "security aware" [Wu04]. The relevant IDACT components (the Data Warehouse, Query Manager, Scheduler, and Transformation Manager) can now query the SM for permission to perform an action, such as retrieving data from a data source, or even revealing the existence of a data source. The SM stores the relevant permissions for each user, and can determine which actions are permitted, and which are prohibited, on a per-user basis.

Source Code

The TM component of IDACT is available under the open source GNU General Public License (GPL). The source code for the TM can be found in Appendix A.


TESTING

Overview

The Transformation Manager prototype has been tested in several environments using both black box and white box testing techniques. Unit tests were performed during the development of the TM, as were integration tests. This chapter focuses on the higher level integration testing, which initially included artificially generated test situations, formats, and datasets. The TM has also been tested using both realistic and real datasets to ensure that it is effective in domain-specific operating environments.

Test Environments

Testing Tools

The testing platform for the Transformation Manager was a system running the Java Runtime Environment version 1.4.2, on which several additional Java libraries had been installed, including the Xalan, Xerces, Java Activation Framework, and Java Mail libraries. In addition, transformations were installed on the system, some of which were created simply for testing purposes, and others of which were real transformation tools. The required databases (DSR and TM) were also available on the test host, which was running a MySQL RDBMS server.

As the TM uses a command line interface, many of the tests were scripted using expect [Lib05], to allow for repeated entry of command line operations without requiring the presence of a human operator.


Transformation Sequence Testing

During development and debugging of the TM, several test scenarios were developed to determine the effectiveness of the TM in a range of circumstances. The first such scenario involved populating the TRANSFORMATIONS table in the TM database with a series of test records that described transformations between several test formats, labeled A through E. In addition, several executable transformation simulators were written, corresponding to each of the test transformations. The transformations can be represented by the graph shown in Figure 6.1, and the relevant subset of the TRANSFORMATION table is shown in Table 6.1.

Figure 6.1: A graph representation of the six test transformations between five test formats.


Table 6.1: The TRANSFORMATION table records corresponding to the transformations depicted in Figure 6.1.

The transformation simulators (e.g., a2e.exe) were initially configured to display the simulator name and return 0, to indicate that the simulated transformation had completed successfully. An example of such a transformation simulator is shown in the following C code:

#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("Performing transformation %s\n", argv[0]);
    return 0;
}

The first test in this example environment involved submitting a sequence of 25 transformation requests to the TM, with each request using one of the possible format pairs. For this test, no actions were specified, and the custom transformation code was stubbed out, to ensure that the test focused on the transformation sequence component only. The results of the test were compared with the expected results, as shown in Table 6.2, to ensure that they matched.


Table 6.2: The 25 possible transformation requests for the five defined test formats, and the associated expected transformation sequence.

A similar test was executed, with the exception that a non-existent format F was used as the source and/or target format, to ensure that the TM would respond appropriately in such cases (i.e. that a suitable transformation sequence was not found).

Action Testing

The next major component of the TM to be tested was the invocation of actions, both on the successful fulfillment of a transformation request, and in the event of some failure in the transformation process. Using the set of transformations described on page 79, a series of transformation requests were made to the TM. Different email actions were assigned to the ACT_SUCCESS, ACT_ERROR, and ACT_TRANS_NOT_FOUND events, which were the three possible outcomes for this test environment. The expected outcomes are shown in Table 6.3.

Table 6.3: The test transformation requests and the corresponding expected action events for the first phase of action testing.

A second phase of action-focused testing was conducted, in which the executable transformation e2d was implemented so that it would return a failure code (-1), triggering the ACT_TRANS_FAILED event in these cases. This transformation (e2d) was selected for the failure test because it appears in several transformation sequences, and it appears as the first item, the middle item, or the last item depending on which transformation sequence is used. The tests performed, and the expected corresponding action events for this phase, are shown in Table 6.4.

Table 6.4: The test transformation requests and the corresponding expected action events for the second phase of action testing.

A third phase of action-focused testing examined the operation of the TM when multiple actions were associated with a given action event. In this case, the executable transformation e2d.exe was again configured to return an error code, and the transformation requests shown in Table 6.5 were submitted to the TM.


Table 6.5: The test transformation requests and the corresponding expected action events for the third phase of action testing.

Action testing was also incorporated into the other test scenarios described in later sections of this chapter, as each of those tests has a resulting expected outcome, which could be associated with a given action to easily and automatically determine the results of the test.

Transformation Testing

A subsequent series of tests involved the use of real transformations and real datasets. Several transformation tools were identified [Bru05, Hay04, Jpg05], and others were manually developed for these tests. A set of transformation requests which utilized these transformations was created, and then submitted to the TM.

An example of such a scenario, which was identified by NASA for testing the IDACT system, involves the processing of telemetry data from a NASA/UAF sounding rocket, due for launch from Poker Flat Research Range in Spring 2007 (the initial launch date of February 2005 was postponed to ensure that adequate testing of the new on-board sensor hardware was performed). In similar sounding rocket missions in the past, archiving the telemetry data has been performed with little effort, but the analysis of the data has been problematic. This presents an excellent demonstration and test environment for the TM, in which the data is transformed from the telemetry stream format into several data products. One such example transformation, which was used for this testing phase, was to generate graphical representations of various fields in the telemetry data, such as altitude. Although a program could be written to perform this transformation in a single step, it was possible to leverage existing tools and the TM transformation sequence process to more quickly create a method for transforming the data. In this case, the following existing tools were utilized:

1. An existing tool which parses the telemetry stream format into fields, and stores the results as records in a database [Hay04].

2. An existing tool which creates an XML representation of a database table [Bos97, Vas02].

3. An existing tool which accepts an XML document as input, and generates a graphical representation of the data contained in the document [Jpg05].

Using these tools, it is almost possible to perform the required task without resorting to writing any new transformations. However, the XML format of the documents generated by step 2 does not match the XML format required for input to step 3. The TM addressed this problem by allowing a custom transformation to be defined which transformed data from the step 2 output format into the step 3 input format, and as a result a complete (four-step) path from the telemetry stream source to the graphical representation was available to process the data.


The telemetry stream source format and graphical output target format were then used as one of the examples for this testing phase. All four transformation steps were defined in the TRANSFORMATIONS table of the TM database, and a request was submitted to transform a given telemetry stream dataset into a graphical representation, requiring the TM to determine and then execute the relevant transformation sequence. This successful test was demonstrated for NASA representatives in April 2005.

Performance Testing

XSLT is a useful form in which to build custom transformations in that it is entirely platform independent, and can be easily modified or reused. However, it lacks the run-time efficiency of a compiled language, such as C, and this could be an issue for larger datasets. In an effort to determine how effective XSLT can be for larger datasets, a series of performance tests were developed and run.

Prior to testing, several XML datasets of various sizes were generated, corresponding to the schema shown in Figure 6.2. The XSLT transformation shown in Figure 6.3 was then used to process the datasets, using the Apache Xalan XSLT processor in both its Java and C++ versions [Xal05a, Xal05b]. The execution time for each transformation was recorded.

In order to address some of the performance issues associated with XSLT, modern XSLT processors such as Xalan provide the option to compile a given XSLT document into an alternate form, such as Java bytecode or C++, which should then perform the same operations more efficiently than the source XSLT document [Jor01, Xal05a]. The compiled versions of XSLT documents are commonly referred to as translets. C++ and Java translets were created for the XSLT transformation shown in Figure 6.3, and these translets were then applied to the various XML source datasets, and the execution times were again recorded.
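The interpreted-case timing harness can be sketched with the standard JAXP API, through which Xalan is invoked; the file names below are placeholders.

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltTimingTest {
    public static void main(String[] args) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new File("transform.xsl")));

        long start = System.currentTimeMillis();
        transformer.transform(new StreamSource(new File("dataset.xml")),
                              new StreamResult(new File("output.xml")));
        long elapsed = System.currentTimeMillis() - start;

        System.out.println("Transformation took " + elapsed + " ms");
    }
}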

Figure 6.2: XML Schema description of example source format [NaHa05].


Figure 6.3: The XSLT transformation used to process the XML datasets [NaHa05].


Custom Transformation Generation Testing

In order to test the operation of the custom transformation generation component of the TM, several example XML Schemas were generated, and all records were removed from the TRANSFORMATIONS table in the TM database, and from the individual user domains in the DSR database. Several transformation requests involving these formats were submitted to the TM to determine how effectively associations between fields were defined, and to ensure that the defined associations were then applied in the expected manner in subsequent transformation requests. The custom transformation programs generated by the Transformation Manager were able to successfully transform the data from the input format to the required output format. The newly generated custom transformation programs were then added to the TRANSFORMATIONS table, and additional tests were conducted that utilized the new transformations in both singular transformations and transformation sequences. The TM was able to successfully transform the data in all scenarios.

Evaluation Methods

During the development cycle of the TM, there have been several points at which user evaluation has been conducted, and the resulting feedback used to guide the development process, ensuring that the result of this effort is not merely an intellectual exercise, but rather a product that has value for real data producers, archivists, and consumers.


The first evaluation group was NASA, which, as the primary funding agency, required periodic evaluations. These evaluations involved quarterly written progress reports, semi-annual teleconference reviews, annual face-to-face reviews and demonstrations, and two conference presentations. At each review, the project team was required to demonstrate that feedback from previous review cycles had been incorporated into the project. NASA funded the IDACT project because they, as an organization, face many of the problems that IDACT aims to address, namely that they have a wealth of data in geographically distributed, heterogeneous archives which can only be fully utilized if data consumers can actually access it in a suitable format. The NASA review process requires that the progress to date be formally accepted by NASA after each review; NASA formally accepted all IDACT reviews.

The second evaluation group involved personnel from the Aerospace Corporation [AC05], who described several real-world challenges they face as data consumers, which the TM could be used to address.

The third evaluation group involved a University of Alaska Fairbanks (UAF) team working on sounding rocket development, who utilized the TM to transform telemetry data from the native frame-based format used in the vehicle downlink into several formats for data archive and display, including a series of database INSERT statements, and a graphical view of several vehicle status values and instrument measurements.

One major issue which resulted from the feedback to date was the inability of the IDACT system, including the TM component, as initially designed, to provide access on a per-user basis to a subset of the available datasets. This feedback motivated the work on the IDACT Security Module, as described on page 77.

Evaluation of IDACT as a whole, and of the TM component individually, is a continuing process. In addition to continuing work with the previously described groups, two additional avenues involving UAF scientists in the areas of biology and mining engineering are being pursued for the application and evaluation of the TM. These application domains are both increasingly reliant on data management and manipulation, and the results of these collaborations will continue to drive the development of the TM.


CONCLUSIONS

Conclusions

This research effort addresses problems related to the integration of heterogeneous datasets, which are of great importance if the vast, diverse, and ever-expanding scientific, commercial, and governmental data resources are to be fully exploited. The underlying goal of this work is to reduce the level of technical skill required on the part of data producers and consumers, allowing them to focus on the use of the data, rather than on the management details, which are becoming increasingly time consuming. To that end, methods by which data transformation requests could be fulfilled by automatically identifying suitable sequences of existing transformations were provided. In addition, automated techniques for building new custom transformations, based on previous experience and a library of transformation components, were developed and described.

A prototype tool (the Transformation Manager) which utilizes these techniques was implemented and tested, and successfully demonstrated that these methods can be effective in addressing the issues of data transformation between heterogeneous data formats. Although the user remains in control of the process, automation is used to perform previously identified actions again should a similar or related circumstance arise.

As a result, the knowledge of each TM user can be leveraged to solve problems encountered by other users, rather than each user being required to solve every data processing problem individually, as is often the case today.


While all of the functions of the Transformation Manager advance the state of the art in intelligent automation, this research effort results in a significant contribution to the body of knowledge in computer science by providing a methodology that intelligently and automatically determines associations between heterogeneous data formats, and generates the computer code required to perform new transformations between such formats.

Future Work

While this effort extends the body of knowledge in computer science, and has practical applications for the data management domain, there are several avenues in which the results of this work could be extended to address other related issues.

The issue of what constitutes the “best” transformation sequence could be addressed in greater detail, with the goal of determining which metrics could be most usefully applied as weights for each transformation. While the computational time required is an obvious choice for consideration, other metrics, such as transformation precision or computational space, may also be applicable in some cases. In addition, it may not be possible to determine an accurate weight for computational effort, for example, until the actual dataset to be processed is examined, as the weight may be dependent not only on the size of the file, but also on the relative size of elements within a particular file. A determination of which metrics are useful, and how they would be used in assigning a weight to each transformation, is an excellent domain for future research efforts.
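As a minimal illustrative sketch (not part of the prototype, and using a hypothetical Transformation type with a hypothetical weight field), the following Java fragment shows how per-transformation weights might be combined to rank candidate sequences once suitable metrics are chosen:

import java.util.List;

// Hypothetical sketch: rank candidate transformation sequences by total weight.
// The Transformation type and its weight field are assumptions for illustration only.
class Transformation
{
    String name;
    double weight;   // e.g. estimated computational time; possibly data-dependent

    Transformation(String name, double weight)
    {
        this.name = name;
        this.weight = weight;
    }
}

class SequenceRanker
{
    // Sum the per-step weights; a lower total is treated as "better".
    static double cost(List<Transformation> sequence)
    {
        double total = 0.0;
        for (Transformation t : sequence)
        {
            total += t.weight;
        }
        return total;
    }

    // Return the candidate sequence with the smallest total weight.
    static List<Transformation> best(List<List<Transformation>> candidates)
    {
        List<Transformation> bestSeq = null;
        double bestCost = Double.MAX_VALUE;
        for (List<Transformation> seq : candidates)
        {
            double c = cost(seq);
            if (c < bestCost)
            {
                bestCost = c;
                bestSeq = seq;
            }
        }
        return bestSeq;
    }
}

Under such a scheme, a data-dependent metric would simply recompute each weight after inspecting the actual input file, before the ranking step.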

The issue of the transformation component language is also an area for further study. At this time the custom transformations are written using XSLT, based on XSLT components. Increased performance may be possible if the transformation components were written as functions in a language such as C, which could then be compiled to executable code.


REFERENCES CITED

[AC05] Aerospace Corporation. (n.d.). Retrieved November 22, 2005 from http://www.aero.org/.

[Ada94] Adamek, J. (1994). Fusion: combining data from separate sources. Marketing Research: A Magazine of Management and Applications, 6, 48.

[AMA05] Arctic Monitoring and Assessment Programme. (n.d.). Retrieved August 14, 2003 from http://www.amap.no/

[AMA98] Arctic Monitoring and Assessment Programme. (1998). AMAP Assessment Report: Arctic Pollution Issues. Oslo, Norway: AMAP.

[Apa05] The Apache XML Project. (n.d.). Retrieved September 10, 2004, from http://xml.apache.org/

[ARM05] Atmospheric Radiation Monitoring Program. (n.d.). Retrieved November 22, 2005 from http://www.arm.gov/.

[Bay03] Baynton, P. (2003, June 18). or fusion. Which is best for mixed media schedule analysis? ARF/ESOMAR Week of Audience Measurement. Los Angeles, California, USA.

[Ber05] Berglund, A., Boag, S., Chamberlin, D., Fernandez, M., Kay, M., Robie, J., & Simeon, J. (Eds.). (n.d.). XML Path Language (XPATH) Version 2.0. Retrieved September 15, 2005 from the W3C Web site: http://www.w3.org/TR/2005/CR-xpath20-20051103/

[Bos97] XML representation of a relational database. (1997). Retrieved September 10, 2004, from http://www.w3.org/XML/RDB.html

[Bro02] Brown, J.G. (2002). Using a multiple imputation technique to merge data sets. Applied Economic Letters, 9, 311-314.

[Bru05] Bruss, S.P. (2005). IDACT interface and client software library and documentation. Retrieved March 3, 2005, from http://www.uaf.edu/asgp/spbruss/asgp/srp5_gse/srp5_gse.

[CAD01] Castano, S., De Antonellis, V., & De Capitani di Vimercati, S. (2001). Global Viewing of Heterogeneous Data Sources. IEEE Transactions on Knowledge and Data Engineering, 13(2), 277-297.

[CEH04] Cai, Z., Eisenhauer, G., He, Q., Kumar, V., Schwan, K., & Wolf, M. (2004). IQ-services: network-aware middleware for interactive large-data applications. Proceedings of the 2nd Workshop on Middleware For Grid Computing (Toronto, Ontario, Canada, October 18 - 22, 2004). ACM Press, New York, NY, 11-16.

[Cir05] Stat/Transfer: the easiest way to move data between worksheets, databases, and statistical packages. (n.d.). Retrieved from the Circle Systems Web site: http://www.stattransfer.com/

[ClDe99] Clark, J., & DeRose, S. (Eds.). (n.d.). XML Path Language (XPATH) Version 1.0. Retrieved September 15, 2004 from http://www.w3.org/TR/xpath

[Cov01] Cover, R. (Ed.). (n.d.). Bioinformatic Sequence Markup Language (BSML). Retrieved September 15, 2005 from http://xml.coverpages.org/bsml.html

[Cov05] Cover, R. (Ed.). (n.d.). National Library of Medicine (NLM) XML Data Formats. Retrieved September 15, 2005 from http://xml.coverpages.org/nlmXML.html

[Cre03] Crewdson, C. (2003). Data Source Registry (DSR) Technical Specification. University of Alaska Fairbanks Department of Mathematical Sciences. T-2003-435-1.

[Cre05] Crewdson, C. (2005). IDACT: Data Source Registry. M.S. Project Report. University of Alaska Fairbanks Department of Mathematical Sciences.

[CRS91] Clogg, C.C., Rubin, D.B., Schenker, N., Schultz, B., & Weidman, L. (1991). Multiple imputation of industry and occupation codes in census public-use samples using Bayesian logistic regression. Journal of the American Statistical Association, 86, 68-78.

[Das97] Dasarathy, B.V. (1997). Sensor fusion potential exploitation - Innovative architectures and illustrative applications. Proceedings of the IEEE, 85, 24-38.

[Dat05] Data Conversion and Migration Tools Survey. (n.d.). Retrieved September 15, 2005 from http://dataconv.org/apps.html

[DLD04] Dhamankar, R., Lee, Y., Doan, A., Halevy, A., & Domingos, P. (2004). iMAP: discovering complex semantic matches between database schemas. Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, June 13-18, 2004, Paris, France.

[DMD03] Doan, A., Madhavan, J., Dhamankar, R., Domingos, P., & Halevy, A. (2003). Learning to match ontologies on the Semantic Web. The International Journal on Very Large Data Bases, 12(4), 303-319.

[DMR03] Dimitoglou, G., Moore, P., & Rotenstreich, S. (2003). Middleware for large distributed systems and organizations. Proceedings of the 1st International Symposium on Information and Communication Technologies (Dublin, Ireland, September 24 - 26, 2003). ACM International Conference Proceeding Series, vol. 49, 536-542.

[DTC03] DeHaan, D., Toman, D., Consens, M. P., & Özsu, M. T. (2003). A comprehensive XQuery to SQL translation using dynamic interval encoding. Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (San Diego, California, June 09 - 12, 2003). SIGMOD '03. New York: ACM Press. 623-634.

[EDS05] Environmental Data Standards. (n.d.). Retrieved May 15, 2004 from http://www.envdatastandards.net/section/standards/

[EJX01] Embley, D., Jackman, D., & Xu, L. (2001). Multifaceted exploitation of metadata for attribute match discovery in information integration. Proceedings of the International Workshop on Information Integration on the Web (WIIW'01), Rio de Janeiro, Brazil, April 2001. 110-117.

[EXD04] Embley, D.W., Xu, L., & Ding, Y. (2004). Automatic direct and indirect schema mapping: experiences and lessons learned. SIGMOD Record, 33(4), 14-19.

[FGD05] Geospatial Metadata Standards. (n.d.). Retrieved May 15, 2004 from the Federal Geographic Data Committee Web site: http://www.fgdc.gov/metadata/meta_stand.html

[FoBo98] Foody, G.M., & Boyd, D.S. (1998). Mapping tropical forest biophysical properties from coarse spatial resolution satellite sensor data: applications of neural networks and data fusion. Proceedings of the Third International Conference on GeoComputation, University of Bristol, 17-19 September 1998.

[FYL05] Fan, W., Yu, J.X., Lu, H., Lu, J., & Rastogi, R. (2005). Query translation from XPATH to SQL in the presence of recursive DTDs. Proceedings of the 31st International Conference on Very Large Data Bases (Trondheim, Norway, August 30 - September 02, 2005). VLDB Endowment. 337-348.

[GDA05] GDA Implementation Statement. (n.d.). Retrieved November 22, 2005 from http://www.ga.gov.au/nmd/products/fwdprogram/ausliggda.jsp.

[GNS00] Gomes, J.I.C., Nagarajan, J., Shah, S., & Clement, C.L.T. (2000). An Experimental Infrastructure for the Sharing of Knowledge on the Internet: The iOn Project. Proceedings of the 11th International Workshop on Database and Expert Systems Applications, September 2000. 463.

[Hall97] Hall, D.L., & Llinas, J. (1997). An introduction to multisensor data fusion. Proceedings of the IEEE, 85, 6-23.

[HaNa00] Hay, B., & Nance, K. (2000). SIMON: An Intelligent Agent For Heterogeneous Data Compilation. Proceedings of the International Conference on Intelligent Systems and Control. Honolulu, Hawaii. August 13-18, 2000.

[HaNa04] Hay, B., & Nance, K. (2004). IDACT: Automating Scientific Knowledge Discovery. Proceedings of the 7th IASTED Conference on Environmental Modelling and Simulation. November 2004.

[Hay00] Hay, B. (2000). Simon: An Intelligent Agent For Heterogeneous Data Compilation. M.S. Project Report. University of Alaska Fairbanks Department of Mathematical Sciences.

[Hay03] Hay, B. (2003). IDACT: Data Transformation Manager (DTM) Technical Specification. University of Alaska Fairbanks Department of Mathematical Sciences. T-2003-434-1.

[Hay04] Hay, B. (2004). Sounding Rocket Telemetry Stream Archiver. University of Alaska Fairbanks Department of Mathematical Sciences. T-2004-227-1.

[ICE05a] International Council for the Exploration of the Sea. (n.d.). Retrieved September 3, 2005 from http://www.ices.dk/indexfla.asp

[ICE05b] International Council for the Exploration of the Sea. (n.d.). Retrieved November 24, 2005 from http://www.ices.dk/datacentre/reco/reco.asp

[ICO05] ASCII to EBCDIC Conversions. (n.d.). Retrieved September 3, 2005 from http://www.iconv.com/asciiebcdic.htm

[ITS05] Earth Science Markup Language, ESML. (n.d.). Retrieved from the Information Technology and Systems Center Web site: http://esml.itsc.uah.edu/index.jsp

[JMS02] Jain, S., Mahajan, R., & Suciu, D. (2002). Translating XSLT programs to efficient SQL queries. Proceedings of the 11th International Conference on World Wide Web (Honolulu, Hawaii, USA, May 07 - 11, 2002). ACM Press, New York, NY, 616-626.

[Jor01] Jørgensen, M. (n.d.). XSLTC Documentation. Retrieved September 15, 2005 from the XSLTC Documentation Web site: http://xml.apache.org/xalan-j/xsltc/index.html

[Jpg04] JpGraph – PHP Graph Creating Library. (n.d.). Retrieved July 20, 2005 from http://www.aditus.nu/jpgraph/

[KJW02] Karlsson, B., Jarrhed, J.O., & Wide, P. (2002). A Fusion Toolbox for Sensor Data Fusion in Industrial Recycling. IEEE Transactions on Instrumentation and Measurement, 51(1).

[KKN04] Krishnamurthy, R., Kaushik, R., & Naughton, J. F. (2004). Unraveling the duplicate-elimination problem in XML-to-SQL query translation. Proceedings of the 7th International Workshop on the Web and Databases: Colocated with ACM SIGMOD/PODS 2004 (Paris, France, June 17 - 18, 2004). New York: ACM Press, 49-54.

[Lib05] Libes, D. (n.d.). The Expect Home Page. Retrieved September 15, 2005 from the NIST Web site: http://expect.nist.gov/

[LiCl00] Li, W., & Clifton, C. (2000). SEMINT: a tool for identifying attribute correspondences in heterogeneous databases using neural networks. Data & Knowledge Engineering, 33(1), 49-84.

[Lis03] Lisee, M. (2003). IDACT: Scheduler Technical Specification. University of Alaska Fairbanks Department of Mathematical Sciences. T-2003-436-1.

[Lis05] Lisee, M. (2005). IDACT: Scheduler. M.S. Project Report. University of Alaska Fairbanks Department of Mathematical Sciences.

[LMC02] Lee, D., Mani, M., Chiu, F., & Chu, W. W. (2002). NeT & CoT: translating relational schemas to XML schemas using semantic constraints. Proceedings of the Eleventh International Conference on Information and Knowledge Management (McLean, Virginia, USA, November 04 - 09, 2002). New York: ACM Press. 282-291.

[LNH04] Lisee, M., Nance, K., & Hay, B. (2004). HTEST: A Configurable Triggered Data System for Simulating Space Systems Data Source Integration. Proceedings of the 23rd Space Simulation Conference. November 8-11, 2004. Washington, D.C.: NASA Goddard Space Flight Center.

[LYH02] Lee, M. L., Yang, L. H., Hsu, W., & Yang, X. (2002). XClust: clustering XML schemas for effective integration. Proceedings of the Eleventh International Conference on Information and Knowledge Management (McLean, Virginia, USA, November 04 - 09, 2002). New York: ACM Press.

[MBR01] Madhavan, J., Bernstein, P.A., & Rahm, E. (2001). Generic Schema Matching with Cupid. Proceedings of the 27th International Conference on Very Large Data Bases, September 11-14, 2001, 49-58.

[MMR02] Melnik, S., Garcia-Molina, H., & Rahm, E. (2002). Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching. Proceedings of the 18th International Conference on Data Engineering (ICDE'02), February 26-March 01, 2002. 117.

[MyS05] MySQL. (n.d.). Retrieved September 15, 2005 from http://www.mysql.com/

[NaHa01] Nance, K., & Hay, B. (2001, February 7). Presentation to the International Council for the Exploration of the Sea (ICES). Copenhagen, Denmark.

[NaHa04a] Nance, K., & Hay, B. (2004, May 4). Automating Conversions Between Metadata Standards Using XML and XSLT. Presentation at the 2004 DAMA Symposium and Wilshire Metadata Conference, Beverly Hills, California, May 2-4, 2004.

[NaHa04b] Nance, K., & Hay, B. (2004). IDACT: Automating Data Discovery and Compilation. Proceedings of the 2004 NASA Earth Science Technology Conference, June 22-24, Palo Alto, CA.

[NaHa05] Nance, K., & Hay, B. (2005). Automatic Transformations Between Geoscience Standards Using XML. Computers & Geosciences Journal: Special Issue on XML Applications in Geosciences, 31, 1165-1174.

[Nan97a] Nance, K. (1997a). Data System Planning for Formerly Used Defense Sites (FUDS). Proceedings of the American Society of Business and Behavioral Sciences: Government and Business Problems. Las Vegas, NV. February 20-26, 1997.

[Nan97b] Nance, K. (1997b). Decision Support and . Proceedings of the International Simulation Multiconference. April 6 – 10, 1997.

[Nan97c] Nance, K. (1997c). Synthesis of Heterogeneous Data Sets. Proceedings of the 9th Annual Software Technology Conference. Salt Lake City, UT, May 6 – 10, 1997.

[Nan97d] Nance, K. (1997d). Applying AI Techniques in Developing Plans for Formerly Used Defense Sites (FUDS) in Alaska. Mathematical Modeling and Scientific Computing, vol. 8.

[NaSt00] Nance, K., & Strohmaier, M. (2000). The SynCon Project: Communicating Potential Health Effects of Contaminants to Circumpolar Arctic Communities. Arctic Research of the United States, Special Issue on Arctic Health.

[NaWi98] Nance, K., & Wiens, J. (1998). SynCon: Simulating Remediation Planning. Mathematical Modeling and Scientific Computing, vol. 9.

[NCS05b] Software Using HDF5 by Application Type. (n.d.). Retrieved September 3, 2005 from the National Center for Supercomputing Applications Web site: http://hdf.ncsa.uiuc.edu/tools5app.html

[NWG99a] Nance, K., Wiens, J., & George, S. (1999a). The SynCon Project: Arctic Regional Environmental Contamination Assessment. Proceedings of the 38th Annual Western Regional Science Conference. Ojai, CA. February 21-25, 1999.

[NWG99b] Nance, K., Wiens, J., & George, S. (1999b). The SynCon Project: Phase II Assessing Human Health in the Arctic. Proceedings of the International ICSC Congress on Computational Intelligence: Methods and Applications. Rochester, NY. June 22-25, 1999.

[NWS05] National Digital Forecast Database (NDFD) Extensible Markup Language (XML). (n.d.). Retrieved from the National Weather Service Web site: http://www.weather.gov/xml/

[PHP05] php. (n.d.). Retrieved September 15, 2005 from http://www.php.net/

[PrGu03] Priebe, T., & Pernul, G. (2003). Towards integrative enterprise knowledge portals. Proceedings of the Twelfth International Conference on Information and Knowledge Management, New Orleans, LA. 216-223.

[PVH02] Popa, L., Velegrakis, Y., Hernandez, M., Miller, R., & Fagin, R. (2002). Translating web data. Proceedings of the 28th International Conference on Very Large Databases (VLDB'02), Hong Kong, China, August 2002. 598-609.

[RaBe01] Rahm, E., & Bernstein, P.A. (2001). A survey of approaches to automatic schema matching. The VLDB Journal — The International Journal on Very Large Data Bases, 10(4), 334-350.

[RaNa98] Race, P., & Nance, K. (1998). Synthesizing and Assessing Arctic Contamination Data. Proceedings of the American Society of Business and Behavioral Sciences: Information Systems. Las Vegas, NV. February 20-26, 1998.

[Rei01] Reierson, L.O. (2001). AMAP Secretariat. Letter dated October 31, 2001.

[Rus02] Rushbrook, T. (2002, June). Canadian Advertising Research Foundation Data Integration Committee Report. Canadian Advertising Research Foundation Newsletter, 255.

[SRB05] Storage Resource Broker. (n.d.). Retrieved November 22, 2005 from http://www.sdsc.edu/srb/.

[Sou05] XSLT Standard Library. (n.d.). Retrieved September 15, 2004 from the SourceForge.net Web site: http://sourceforge.net/projects/xsltsl/

[Sun05] Java Technology. (n.d.). Retrieved September 15, 2004 from the Sun Developer Network Web site: http://java.sun.com/

[TeNa98] Tedor, J., & Nance, K. (1998). Developing Decision Support Tools for Environmental Data Organization. Proceedings of the American Society of Business and Behavioral Sciences: Decision Sciences. Las Vegas, NV. February 20-26, 1998.

[ThSh04] Thomas, B., & Shaya, E. (n.d.). eXtensible Data Format. Retrieved September 15, 2005 from the eXtensible Data Format (XDF) Homepage Web site: http://xml.gsfc.nasa.gov/XDF/XDF_home.html

[Uni05] NetCDF. (n.d.). Retrieved September 3, 2004 from the Unidata Web site: http://www.unidata.ucar.edu/software/netcdf/

[US91] U.S. Department of the Interior and Minerals Management Service. (1991). Outer Continental Shelf Environmental Assessment Program Final Reports of Principal Investigators. Vol. 23 (Oct. 1984) – vol. 74 (Oct. 1991).

[Vas02] Dynamically Generating XML from a Database. (n.d.). Retrieved September 10, 2004 from http://www.quepublishing.com/articles/article.asp?p=27800&rl=1

[W3C05a] Extensible Markup Language (XML). (n.d.). Retrieved May 15, 2004 from the W3C Working Group Web site: http://www.w3.org/XML/

[W3C05b] Ink Markup Language (InkML). (n.d.). Retrieved September 15, 2005 from the W3C Multimodal Interaction Working Group Web site: http://www.w3.org/2002/mmi/ink

[W3C05c] MathML. (n.d.). Retrieved September 15, 2005 from the W3C Math Working Group Web site: http://www.w3.org/Math/

[WCH02] Wolf, M., Cai, Z., Huang, W., & Schwan, K. (2002). SmartPointers: personalized scientific data portals in your hand. Proceedings of the 2002 ACM/IEEE Conference on Supercomputing (Baltimore, Maryland). Conference on High Performance Networking and Computing. IEEE Computer Society Press, Los Alamitos, CA, 1-16.

[WSW05] Wiseman, Y., Schwan, K., & Widener, P. (2005). Efficient end to end data exchange using configurable compression. SIGOPS Operating Systems Review, 39(3), 4-23.

[Wu04] Wu, C. (2004). IDACT: Security System Specification. University of Alaska Fairbanks Department of Mathematical Sciences. T-2004-245-1.

[Xal05a] Xalan-C++ version 1.8. (n.d.). Retrieved May 15, 2004 from the Apache XML Project Web site: http://xml.apache.org/xalan-c/

[Xal05b] Xalan-Java version 2.7.0. (n.d.). Retrieved May 15, 2004 from the Apache XML Project Web site: http://xml.apache.org/xalan-j/

[YuPo02] Yu, C., & Popa, L. (2004). Constraint-based XML query rewriting for data integration. Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (Paris, France, June 13 - 18, 2004). ACM Press, New York, NY, 371-382.

APPENDIX A

Source Code


/**
 * File: Dtm.java
 *
 * @author: Brian Hay
 *
 * @version: $Id: Dtm.java,v 1.5 2005/04/23 23:30:44 bhay Exp $
 *
 */

/* package definition */
package Dtm;

/* imports - IDACT specific */
import IdactHelpers.*;
import IdactHelpers.Actions.*;
import Dtm.Transformers.*;
import Dtm.Errors.*;
import Dtm.Config.*;

/* imports - general */
import java.sql.*;
import java.net.*;
import java.util.Calendar;

/**
 * A class that provides Level 1 (control) functionality for the transformation manager. This is the
 * high level controller that runs all of the transformation manager functionality.
 */
public class Dtm
{
    /* private fields */
    private DtmTransformer dtmt[] = {        // array of available transformation types
        new DtmTransformerXslt(),
        null,
        new DtmTransformerExec()
    };
    private DtmError err = new DtmError();   // error object

    // TM database connection fields
    final String dbHost = "localhost";
    final String dbUser = "idact";
    final String dbPass = "JX?^@Jh7CM";
    final String dbName = "TM";

    /**
     * constructor
     */
    public Dtm()
    {
        // set up error object for success
        err.set(DtmError.ERR_NONE, Action.ACT_SUCCESS, "", null);
    }


    /**
     * Process data based on a {@link IdactHelpers.ReqStruct}. Attempts to determine the
     * transformations necessary to get from the source data format to the desired data
     * format, and then attempts to apply those transformations to the source data.
     *
     * @param rqs a ReqStruct containing information about the requested data transformation,
     *            such as the URI, type (stream or static) and format/subformat
     *            (e.g. XML/messages.dtd) of the incoming and outgoing data. It also
     *            contains the various Action objects to be used in various circumstances,
     *            such as a successful transformation, or an error.
     *
     * @return boolean indicating the success (true) or failure (false) of the method.
     */
    public boolean processData(ReqStruct rqs)
    {
        // read the config file
        try
        {
            DtmConfig.initialize();
        }
        catch (Exception e)
        {
            System.err.println("DEBUG\tprocessData(): DtmConfig initialization failed.");
            e.printStackTrace();
        }

        DataTransform[] dt = getTransformations(rqs, err);

        if (null != dt)
        {
            int transId = dbTransInProgress(rqs);

            applyTransformations(rqs, dt, err);

            performAction(rqs, err);

            dbTransCompleted(transId, err.getAction() );

            return (Action.ACT_SUCCESS == err.getAction() );
        }
        else
        {
            performAction(rqs, err);
            return false;
        }
    }

    /**
     * Determine which action should be performed, and fire the Action's perform() method
     * to actually do the work.
     *
     * Attempts to retrieve the action of the given action type (actType) in the ReqStruct,
     * and then fires the ActionHandler.perform() method. If no matching action type is
     * found, the generic error action is used.
     *
     * Returns void.
     */
    private void performAction(ReqStruct rqs, DtmError err)
    {
        if (DtmConfig.DEBUG > 0)
            System.err.println("DEBUG\tperformAction(): actType=" + err.getAction() +
                               "\n\tmsg='" + err.getMsg() + "'");

        Action a = rqs.getAction( err.getAction() );
        if ( null == a )
        {
            a = rqs.getAction(Action.ACT_ERROR);
        }
        if ( null != a )
        {
            a.perform( err.getMsg(), err );
        }
    }

    /**
     * Determine which transformation(s) should be applied, based on the input
     * and output formats.
     *
     * If a sequence of transformations can be found, an array of DataTransform
     * objects describing the sequence is returned.
     *
     * @param rqs
     * @param err
     *
     * @return
     */
    private DataTransform[] getTransformations(ReqStruct rqs, DtmError err)
    {
        Connection dbConn = null;      // database connection object
        DataTransform[] dt = null;     // array of DataTransform objects to return
        Statement stmt;                // database statement object
        ResultSet rs;                  // database result set object
        String query;                  // database query string
        boolean seqFound = false;      // flag to indicate that sequence has been found

        try
        {
            /*// connect to the database using the ODBC System DSN
            Driver d = (Driver)Class.forName("sun.jdbc.odbc.JdbcOdbcDriver").newInstance();
            String URL = "jdbc:odbc:" + "IDACT_DTM";
            dbConn = DriverManager.getConnection(URL, "", "");*/

            // connect to the database using the MySQL driver
            Driver d = (Driver)Class.forName("com.mysql.jdbc.Driver").newInstance();
            dbConn = DriverManager.getConnection( ("jdbc:mysql://" + dbHost + "/" + dbName +
                                                   "?user=" + dbUser + "&password=" + dbPass) );
            if (DtmConfig.DEBUG > 0)
            {
                System.err.println("DEBUG\tgetTransformations(): db Connected...");
            }

            int seqLen = 1;
            while ( (seqLen <= DtmConfig.TRANS_SEQ_MAX_LENGTH) && (!seqFound) )
            {
                // build the query
                query = buildGetTransSql(seqLen, rqs);
                if (DtmConfig.DEBUG > 0)
                {
                    System.err.println("DEBUG\tquery='" + query + "'");
                }

                // execute
                stmt = dbConn.createStatement();
                rs = stmt.executeQuery(query);

                // create and populate the dt array
                if (rs.next())
                {
                    // create the transformation sequence array
                    dt = new DataTransform[seqLen];
                    int argGrp[] = new int[seqLen];

                    // populate the transformation array
                    for (int ii = 0; ii < seqLen; ii++)
                    {
                        dt[ii] = new DataTransform(
                                DataTransform.strToClassification(rs.getString("TRANSFORM_TYPE_" + ii)),
                                new String(rs.getString("TRANSFORM_STRING_" + ii)));

                        // get the arguments for this transformation
                        argGrp[ii] = rs.getInt("TRANSFORM_ARG_GRP_" + ii);
                    }
                    for (int ii = 0; ii < seqLen; ii++)
                    {
                        String argQuery = "SELECT `TA`.`ARG` FROM `TRANSFORMATION_ARG_GRPS` TG, " +
                                "`TRANSFORMATION_ARGS` TA WHERE `TG`.`ARG_ID`=`TA`.`ID` " +
                                "AND `TG`.`GRP_ID`=" + argGrp[ii] + " ORDER BY TG.`ARG_ORDER`";
                        if (DtmConfig.DEBUG > 0)
                        {
                            System.err.println("DEBUG\targQuery='" + argQuery + "'");
                        }
                        ResultSet argRs = stmt.executeQuery(argQuery);

                        if (null != argRs)
                        {
                            while (argRs.next())
                            {
                                dt[ii].addArg(argRs.getString("ARG"));
                            }
                        }
                    }

                    // set the transformation found flag
                    seqFound = true;
                }
                else
                {
                    seqLen++;
                }
            }

            // check for no transformation sequence found
            if ( !seqFound )
            {
                err.set(DtmError.ERR_NO_TRANS_FOUND, Action.ACT_TRANS_NOT_FOUND,
                        "No suitable transformation found.", null);
                dt = null;
            }

            // close the database connection
            dbConn.close();
            if (DtmConfig.DEBUG > 0)
                System.err.println("DEBUG\tgetTransformations(): db connection closed.");
        }
        catch (Exception e)
        {
            // handle exceptions by setting error object, and setting dt to null
            err.set(DtmError.ERR_GET_TRANS, Action.ACT_TRANS_NOT_FOUND,
                    "Error in Dtm.getTransformation(rqs): ", e);
            dt = null;
        }

        // return the DataTransform array, or null in case of error
        return dt;
    }

    /**
     * Determine which transformation type to apply, and call the transformation's process()
     * method to start the transformation.
     *
     * Parameters are a ReqStruct and an array of DataTransform objects, which are applied
     * in order.
     *
     * On success, true is returned. On failure, false is returned and the error object is
     * updated to indicate the specific error.
     */
    private boolean applyTransformations(ReqStruct rqs, DataTransform dt[], DtmError err)
    {
        // create a temp rqs to hold intermediate transformation information for transformation sequences
        ReqStruct tempRqs = new ReqStruct();
        tempRqs.setDataIn( rqs.getDataIn() );

        // for each transformation we have
        for (int ii = 0; ii < dt.length; ii++)
        {
            // set the data output name
            if (ii < dt.length - 1)
            {
                // we need a temp name for the output
                tempRqs.setDataOut( "file://?f=" + DtmConfig.DATA_TEMP_DIR + rqs.getIdNumber() + "-" + ii );
            }
            else
            {
                // use the final name for the output
                tempRqs.setDataOut( rqs.getDataOut() );
            }

            // set the arguments for the transformation
            tempRqs.setArgs();

            // if this is a valid transformation type
            if ( (dt[ii].getClassification() >= 0) && (dt[ii].getClassification() < dtmt.length) )
            {
                // call the particular transformer to do the work
                if ( !( dtmt[dt[ii].getClassification()].process(tempRqs, dt[ii], err) ) )
                {
                    // transformation failed, so return false
                    return false;
                }
            }
            else
            {
                // the transformation type was invalid, so set the error code and return
                err.set( DtmError.ERR_INVAL_TRANS_TYPE, Action.ACT_TRANS_NOT_FOUND,
                         "Transformation type '" + dt[ii].getClassification() +
                         "' in transformation '" + ii + "' is not valid.", null);
                return false;
            }

            // move the temp data output name to the temp data input name for the next iteration
            tempRqs.setDataIn( tempRqs.getDataOut() );
        }

        return true;
    }

    /**
     * This method builds the SQL statement that is used to search for a
     * suitable transformation sequence.
     *
     * @param seqLen An integer that represents the length of the sequence that the SQL
     *               statement should search for. This value must be >= 1.
     * @param rqs
     *
     * @return A String containing the SQL statement that should be used to search for
     *         a suitable transformation sequence.
     */
    private String buildGetTransSql(int seqLen, ReqStruct rqs)
    {
        // start the query
        String sqlQuery = "SELECT TRANSFORMATIONS_0.TRANSFORM_TYPE AS TRANSFORM_TYPE_0, " +
            "TRANSFORMATIONS_0.ID AS TRANSFORM_ID_0, " +
            "TRANSFORMATIONS_0.TRANSFORM_NAME AS TRANSFORM_NAME_0, " +
            "TRANSFORMATIONS_0.TRANSFORM_STRING AS TRANSFORM_STRING_0, " +
            "TRANSFORMATIONS_0.TRANSFORM_ARG_GRP AS TRANSFORM_ARG_GRP_0 ";

        // add each other set of fields
        for (int ii = 1; ii < seqLen; ii++)
        {
            sqlQuery += ",TRANSFORMATIONS_" + ii + ".TRANSFORM_TYPE AS TRANSFORM_TYPE_" + ii +
                ", TRANSFORMATIONS_" + ii + ".ID AS TRANSFORM_ID_" + ii +
                ", TRANSFORMATIONS_" + ii + ".TRANSFORM_NAME AS TRANSFORM_NAME_" + ii +
                ", TRANSFORMATIONS_" + ii + ".TRANSFORM_STRING AS TRANSFORM_STRING_" + ii +
                ", TRANSFORMATIONS_" + ii + ".TRANSFORM_ARG_GRP AS TRANSFORM_ARG_GRP_" + ii + " ";
        }

        // add the first part of the FROM clause
        sqlQuery += "FROM ";
        for (int ii = 1; ii < seqLen - 1; ii++)
        {
            sqlQuery += "( ";
        }
        sqlQuery += "TRANSFORMATIONS AS TRANSFORMATIONS_0 ";

        // add each JOIN clause
        for (int ii = 1; ii < seqLen; ii++)
        {
            sqlQuery += "INNER JOIN TRANSFORMATIONS AS TRANSFORMATIONS_" + ii +
                " ON TRANSFORMATIONS_" + (ii - 1) + ".OUT_FMT = TRANSFORMATIONS_" + ii + ".IN_FMT " +
                "AND TRANSFORMATIONS_" + (ii - 1) + ".OUT_SUBFMT = TRANSFORMATIONS_" + ii + ".IN_SUBFMT ";
            if (ii < seqLen - 1)
            {
                sqlQuery += ") ";
            }
        }

        // add the WHERE clause
        sqlQuery += "WHERE ( (TRANSFORMATIONS_0.IN_FMT='" + rqs.getInFormat() +
            "') AND (TRANSFORMATIONS_" + (seqLen - 1) + ".OUT_FMT='" + rqs.getOutFormat() + "') " +
            "AND (TRANSFORMATIONS_0.IN_SUBFMT='" + rqs.getInSubFormat() +
            "') AND (TRANSFORMATIONS_" + (seqLen - 1) + ".OUT_SUBFMT='" + rqs.getOutSubFormat() + "') )";
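        /*
         * Illustrative example (added for exposition, not generated verbatim by the
         * original code): for seqLen = 2 the query built above has roughly the
         * following shape, chaining two rows of the TRANSFORMATIONS table on
         * matching formats:
         *
         *   SELECT ... FROM TRANSFORMATIONS AS TRANSFORMATIONS_0
         *   INNER JOIN TRANSFORMATIONS AS TRANSFORMATIONS_1
         *     ON TRANSFORMATIONS_0.OUT_FMT = TRANSFORMATIONS_1.IN_FMT
         *    AND TRANSFORMATIONS_0.OUT_SUBFMT = TRANSFORMATIONS_1.IN_SUBFMT
         *   WHERE ( (TRANSFORMATIONS_0.IN_FMT='<input format>')
         *     AND (TRANSFORMATIONS_1.OUT_FMT='<output format>') ... )
         */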

        if (DtmConfig.DEBUG > 0)
        {
            System.err.println("DEBUG\tAttempting to find transformation sequence on len " + seqLen);
            System.err.println("\tquery='" + sqlQuery + "'");
        }

        return sqlQuery;
    }

    /**
     * This method adds a message to the database to indicate that a
     * transformation has started.
     *
     * @param rqs A ReqStruct object containing the details of the transformation.
     *
     * @return An int containing the record ID of the entry added to the database
     *         for this transformation process.
     */
    private int dbTransInProgress(ReqStruct rqs)
    {
        // build the SQL query
        String sqlQuery = "INSERT INTO TRANSFORMATION_STATUS (INPUT, OUTPUT, START_TIME, " +
            "IN_PROGRESS, REQUEST_ID) VALUES ('" + rqs.getDataIn() + "', '" + rqs.getDataOut() + "', '";
        Calendar startTime = Calendar.getInstance();
        // note: Calendar.MONTH is zero-based, so 1 is added to produce a calendar month
        String startTimeStr = startTime.get(Calendar.YEAR) + "-" +
            (startTime.get(Calendar.MONTH) + 1) + "-" +
            startTime.get(Calendar.DAY_OF_MONTH) + " " +
            startTime.get(Calendar.HOUR_OF_DAY) + ":" +
            startTime.get(Calendar.MINUTE) + ":" +
            startTime.get(Calendar.SECOND);
        sqlQuery += startTimeStr + "', TRUE, " + rqs.getIdNumber() + ")";

        // execute the query to add
        Connection dbConn = null;   // database connection object
        Statement stmt;             // database statement object
        ResultSet rs;               // database result set object
        try
        {
            Driver d = (Driver)Class.forName("com.mysql.jdbc.Driver").newInstance();
            dbConn = DriverManager.getConnection( ("jdbc:mysql://" + dbHost + "/" + dbName +
                                                   "?user=" + dbUser + "&password=" + dbPass) );
            if (DtmConfig.DEBUG > 0)
            {
                System.err.println("DEBUG\tdbTransInProgress(): db Connected.");
            }

            // execute
            stmt = dbConn.createStatement();
            stmt.execute(sqlQuery);
        }
        catch (Exception e)
        {
            // handle exception
            if (DtmConfig.DEBUG > 0)
            {
                System.err.println("DEBUG\tdbTransInProgress(): Failed while trying to add record.\n" +
                                   e.getMessage());
                System.err.println("DEBUG\t" + sqlQuery);
            }
            return -1;
        }

        // grab the ID number to return
        /* sqlQuery = "SELECT ID FROM TRANSFORMATION_STATUS WHERE INPUT='" + rqs.getDataIn() +
               "' AND OUTPUT='" + rqs.getDataOut() + "' AND REQUEST_ID=" + rqs.getIdNumber() +
               " AND START_TIME=#" + startTimeStr + "#"; */
        sqlQuery = "SELECT REQUEST_ID FROM TRANSFORMATION_STATUS WHERE INPUT='" + rqs.getDataIn() +
            "' AND OUTPUT='" + rqs.getDataOut() + "' AND REQUEST_ID=" + rqs.getIdNumber() +
            " AND START_TIME='" + startTimeStr + "'";
        if (DtmConfig.DEBUG > 0)
        {
            System.err.println("DEBUG\tAttempting to get REQUEST_ID.\n\tquery='" + sqlQuery + "'");
        }

        try
        {
            // execute and return the ID value; the connection is closed before returning
            rs = stmt.executeQuery(sqlQuery);
            int reqId = -1;
            if (rs.next())
            {
                reqId = rs.getInt("REQUEST_ID");
            }
            dbConn.close();
            if (DtmConfig.DEBUG > 0)
            {
                System.err.println("DEBUG\tdbTransInProgress(): db closed.");
            }
            return reqId;
        }
        catch (Exception e)
        {
            // handle exception
            if (DtmConfig.DEBUG > 0)
            {
                System.err.println("DEBUG\tdbTransInProgress(): Failed while trying to find ID.\n" +
                                   e.getMessage());
                System.err.println("DEBUG\t" + sqlQuery);
            }
            return -1;
        }
    }

    /**
     * This method adds a message to the database to indicate that a
     * transformation has completed, including the completion status (e.g. success,
     * failed, etc).
     *
     * @param transId          The record ID that relates to this transformation.
     * @param completionStatus An int indicating the completion status, which should be
     *                         one of the Action.ACT_* flags.
     */
    private boolean dbTransCompleted(int transId, int completionStatus)
    {
        Connection dbConn = null;   // database connection object
        Statement stmt;             // database statement object
        ResultSet rs;               // database result set object

        // build the query
        Calendar endTime = Calendar.getInstance();
        // note: Calendar.MONTH is zero-based, so 1 is added to produce a calendar month
        String endTimeStr = endTime.get(Calendar.YEAR) + "-" +
            (endTime.get(Calendar.MONTH) + 1) + "-" +
            endTime.get(Calendar.DAY_OF_MONTH) + " " +
            endTime.get(Calendar.HOUR_OF_DAY) + ":" +
            endTime.get(Calendar.MINUTE) + ":" +
            endTime.get(Calendar.SECOND);
        String sqlQuery = "UPDATE TRANSFORMATION_STATUS SET IN_PROGRESS=FALSE, " +
            "COMPLETION_STATUS='" + Action.getActionStr(completionStatus) +
            "', END_TIME='" + endTimeStr + "' WHERE REQUEST_ID=" + transId;

        // execute the query to update the record
        try
        {
            Driver d = (Driver)Class.forName("com.mysql.jdbc.Driver").newInstance();
            dbConn = DriverManager.getConnection( ("jdbc:mysql://" + dbHost + "/" + dbName +
                                                   "?user=" + dbUser + "&password=" + dbPass) );
            if (DtmConfig.DEBUG > 0)
            {
                System.err.println("DEBUG\tdbTransCompleted(): DB Connected...");
                System.err.println("DEBUG\tquery='" + sqlQuery + "'");
            }

            // execute
            stmt = dbConn.createStatement();
            stmt.execute(sqlQuery);
            dbConn.close();
            if (DtmConfig.DEBUG > 0)
            {
                System.err.println("DEBUG\tdbTransCompleted(): DB Connection Closed.");
            }
            return true;
        }
        catch (Exception e)
        {
            // handle exception
            if (DtmConfig.DEBUG > 0)
            {
                System.err.println("DEBUG\tdbTransCompleted(): Failed while trying to update record.\n" +
                                   e.getMessage());
            }
            return false;
        }
    }
}


/**
 * File: DtmConfig.java
 *
 * @author: Brian Hay
 *
 * @version: $Id: DtmConfig.java,v 1.4 2005/04/23 23:30:44 bhay Exp $
 *
 */

/* package definition */
package Dtm.Config;

/* imports - general */
import javax.mail.*;
import javax.mail.internet.*;

public class DtmConfig
{
    /* static debug constant to indicate debug level */
    public static int DEBUG;

    public static String SMTP_HOST;
    public static String SMTP_USER;
    public static String SMTP_PASS;

    public static Address EMAIL_FROM;

    public static int TRANS_SEQ_MAX_LENGTH;

    public static String DATA_TEMP_DIR;

    public static String EXEC_PATH;

    /**
     * initialization method - sets the static configuration values
     */
    public static void initialize() throws Exception
    {
        DEBUG = 0;

        SMTP_HOST = "mail.uaf.edu";
        SMTP_USER = "";
        SMTP_PASS = "";

        EMAIL_FROM = new InternetAddress("[email protected]", "Brian Hay");

        TRANS_SEQ_MAX_LENGTH = 3;

        DATA_TEMP_DIR = "/idact/data/tmp/";

        EXEC_PATH = "/idact/transformations/bin/";
    }
}


/**
 * File: DtmError.java
 *
 * @author: Brian Hay
 *
 * @version: $Id: DtmError.java,v 1.3 2004/06/10 06:43:13 bhay Exp $
 *
 */

/* package definition */
package Dtm.Errors;

/* imports - IDACT specific */
import Dtm.Config.*;
import IdactHelpers.Actions.*;

public class DtmError
{
    /* constants to define the various errors we have */
    public final static int ERR_NONE = 0;                   // No error

    public final static int ERR_INVAL_TRANS_TYPE = 1001;    // Invalid transformation type
    public final static int ERR_GET_TRANS = 1002;           // Exception raised when retrieving transformation from DB
    public final static int ERR_NO_TRANS_FOUND = 1003;      // No suitable transformation was found in the DTM database

    public final static int ERR_TRAN_XSLT_TCE = 10001;      // Transformation configuration exception in the XSLT Transformer
    public final static int ERR_TRAN_XSLT_TE = 10002;       // Transformation exception in the XSLT Transformer
    public final static int ERR_TRAN_XSLT_URIS = 10003;     // URI syntax exception in the XSLT Transformer

    public final static int ERR_TRAN_EXEC_IOE = 20001;      // IO exception in the EXEC Transformer
    public final static int ERR_TRAN_EXEC_IE = 20002;       // Interrupted exception in the EXEC Transformer
    public final static int ERR_TRAN_EXEC_RETVAL = 20003;   // executable returned unexpected value

    public final static int ERR_ACT_MAIL_ME_RECIP = 100001; // Failed to set message recipient
    public final static int ERR_ACT_MAIL_ME_CONT = 100002;  // Failed to set message content
    public final static int ERR_ACT_MAIL_AE = 100003;       // Failed t
    public final static int ERR_ACT_MAIL_ME_SUBJ = 100004;  // Failed to set message subject
    public final static int ERR_ACT_MAIL_ME_SNDR = 100005;  // Failed to set message sender.
    public final static int ERR_ACT_MAIL_ME_SEND = 100006;  // Failed to send message.

    /* private fields */
    private int errNum = ERR_NONE;               // error number, defined as static constants in this class
    private int errAction = Action.ACT_SUCCESS;  // action to be performed, as defined by constants in ReqStruct class
    private String errMsg = "";                  // an error message

    /**
     * A method that sets values for the error number, error action,
     * and error messages.
     *
     * @param eNum An int representing the error number. Error numbers are defined as
     *             static constants in this class.
     *
     * @param eAct An int containing the error action (if known). The actions are defined
     *             as constants in the {@link Action} class.
     *
     * @param eMsg A String containing the error message.
     *
     * @param e    An Exception object for the last error, which may be null.
     */
    public void set(int eNum, int eAct, String eMsg, Exception e)
    {
        // set the private field values based on the params.
        errNum = eNum;
        errAction = eAct;

        // check that the message object is not null
        if (null != eMsg)
        {
            // set the error message
            errMsg = new String( eMsg.trim() );
        }
        else
        {
            // use an empty string for the error message
            errMsg = new String("");
        }

        // if the debug flag is set, then print a message.
        if (DtmConfig.DEBUG > 0)
        {
            System.err.println("DEBUG\tError Number: " + eNum + "\n\t" + eMsg);
            if (null != e)
            {
                System.err.println("Stack Trace:\n------");
                e.printStackTrace();
            }
        }
    }

    /**
     * Get method for the error number.
     *
     * @return An int representing an error number, which are defined as constants in this class.
     */
    public int getNum()
    {
        return errNum;
    }

    /**
     * Get method for the error action.
     *
     * @return An int representing an error action, which are defined as constants in the
     *         {@link Action} class.
     */
    public int getAction()
    {
        return errAction;
    }

    /**
     * Get method for the error message.
     *
     * @return A String representing an error message.
     */
    public String getMsg()
    {
        return ( new String(errMsg) );
    }

}


/**
 * File: DtmTransformer.java
 *
 * @author: Brian Hay
 *
 * @version: $Id: DtmTransformer.java,v 1.2 2004/06/06 04:18:10 bhay Exp $
 *
 */

/* package definition */
package Dtm.Transformers;

/* imports - IDACT specific */
import IdactHelpers.*;
import Dtm.Errors.*;

/**
 * Abstract base class for the various Transformer* classes, which are used to transform
 * one data format into another.
 */
public abstract class DtmTransformer extends Object
{
    abstract public boolean process(ReqStruct rqs, DataTransform dt, DtmError err);
}


/**
 * File: DtmTransformerExec.java
 *
 * @author: Brian Hay
 *
 * @version: $Id: DtmTransformerExec.java,v 1.5 2005/04/23 23:30:44 bhay Exp $
 *
 */

/* package definition */
package Dtm.Transformers;

/* imports - IDACT specific */
import IdactHelpers.*;
import IdactHelpers.Actions.*;
import Dtm.Errors.*;
import Dtm.Config.*;

/* imports - general */
import java.io.*;
import java.util.Vector;

/**
 * A class that transforms data using an executable.
 */
public class DtmTransformerExec extends DtmTransformer
{
    /**
     * A method that attempts to apply a data transformation
     *
     * @param rqs The {@link ReqStruct} object for this DTM run.
     * @param dt  A {@link DataTransform} object that describes the data transformation to
     *            apply. In this case, it provides the command line construction for the
     *            executable file. The command line specification also includes locations
     *            where command line parameters should be substituted, such as input and
     *            output file names.
     * @param err A {@link DtmError} object, which can be updated by this method to
     *            provide information about any error which may occur during the data
     *            transformation process.
     *
     * @return A boolean indicating whether or not the data transformation was successful.
     */
    public boolean process(ReqStruct rqs, DataTransform dt, DtmError err)
    {
        Vector clVector = new Vector();
        clVector.add(DtmConfig.EXEC_PATH + dt.getTransformStr());
        for (int ii = 0; ii < dt.numArgs(); ii++)
        {
            String arg = dt.getArg(ii);
            int pcPos = arg.indexOf("%");
            while (pcPos >= 0)
            {

                // get the name of this argument
                String argName;
                int argLen = arg.substring(pcPos + 1).indexOf(" ");
                if (argLen < 0)
                {
                    argName = arg.substring(pcPos + 1);
                }
                else
                {
                    argName = arg.substring(pcPos + 1, pcPos + 1 + argLen);
                }
                if (DtmConfig.DEBUG > 0)
                {
                    System.err.println("DEBUG\t" + argName);
                }
                arg = arg.replaceAll("%" + argName, rqs.getArg("_" + argName));
                pcPos = arg.indexOf("%");
            }
            clVector.add(arg);
        }

        String[] clArray = new String[clVector.size()];
        if (DtmConfig.DEBUG > 0)
        {
            System.err.println("DEBUG\tclArray contents:");
        }
        for (int ii = 0; ii < clVector.size(); ii++)
        {
            clArray[ii] = (String)clVector.elementAt(ii);
            if (DtmConfig.DEBUG > 0)
            {
                System.err.println("\t" + ii + "'" + clArray[ii] + "'");
            }
        }
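        /*
         * Illustrative note (added for exposition; the argument string is hypothetical):
         * a stored argument such as "%if" is resolved above by replacing the token with
         * the matching request argument, here rqs.getArg("_if"), i.e. the "f" parameter
         * of the input URI, so the final command line passed to Runtime.exec() contains
         * the actual input file name.
         */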

        // execute the command
        Process p;
        try
        {
            p = Runtime.getRuntime().exec( clArray );
        }
        catch (IOException ioe)
        {
            err.set( DtmError.ERR_TRAN_EXEC_IOE, Action.ACT_TRANS_FAILED,
                     "EXEC command line IO exception", ioe);
            return false;
        }

        // set up and start the IO Threads for the command
        InputStream error = p.getErrorStream();
        InputStream output = p.getInputStream();
        Thread errThread = new Thread(new OutErrReader(error));
        Thread outThread = new Thread(new OutErrReader(output));
        outThread.start();
        errThread.start();

        // wait for the command return value
        int retVal;
        try
        {
            retVal = p.waitFor();
        }
        catch (InterruptedException ie)
        {
            err.set( DtmError.ERR_TRAN_EXEC_IE, Action.ACT_TRANS_FAILED,
                     "EXEC Interrupted while waiting for command to complete", ie);
            return false;
        }

        // check return value
        if (retVal != 0)
        {
            err.set(DtmError.ERR_TRAN_EXEC_RETVAL, Action.ACT_TRANS_FAILED,
                    "Unexpected return value from executable transformation (" + retVal + ")", null);
            return false;
        }

        // all OK
        err.set(DtmError.ERR_NONE, Action.ACT_SUCCESS, "", null);
        return true;
    }

    /**
     * Private inner class to handle output and error streams from the
     * executable.
     *
     * @author Brian Hay, based on code by CEHJ - from
     *         http://www.experts-exchange.com/Programming/Programming_Languages/Java/Q_20966148.html
     */
    private class OutErrReader implements Runnable
    {
        InputStream is;

        /**
         * Constructor for the OutErrReader object
         *
         * @param is An InputStream object, which will either be the output or error stream
         *           for the executable.
         */
        public OutErrReader(InputStream is)
        {
            // set the input stream for this thread.
            this.is = is;
        }

        /**
         * Main processing method for the OutErrReader object
         */
        public void run()
        {
            try
            {
                BufferedReader in = new BufferedReader(new InputStreamReader(is));
                String temp = null;

                // As long as there is data to be read from the stream
                while ((temp = in.readLine()) != null)
                {
                    System.out.println(temp);
                }
                is.close();
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
        }
    }
}


/**
 * File: DtmTransformerXslt.java
 *
 * @author: Brian Hay
 *
 * @version: $Id: DtmTransformerXslt.java,v 1.4 2005/04/23 23:30:44 bhay Exp $
 *
 */

/* package definition */
package Dtm.Transformers;

/* imports - IDACT specific */
import IdactHelpers.*;
import IdactHelpers.Actions.*;
import Dtm.Errors.*;

/* imports - general */
import java.net.*;
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * A class that transforms data using an XSLT stylesheet. The input data format is
 * usually XML, and the output data format can be almost anything.
 */
public class DtmTransformerXslt extends DtmTransformer
{
    private final String XSLT_PATH = "/idact/transformations/";

    /**
     * A method that attempts to apply a data transformation
     *
     * @param rqs The {@link ReqStruct} object for this DTM run.
     *
     * @param dt  A {@link DataTransform} object that describes the data transformation to
     *            apply. In this case, it provides a URI for the XSLT stylesheet to be used.
     *
     * @param err A {@link DtmError} object, which can be updated by this method to
     *            provide information about any error which may occur during the data
     *            transformation process.
     *
     * @return A boolean indicating whether or not the data transformation was successful.
     */
    public boolean process(ReqStruct rqs, DataTransform dt, DtmError err)
    {
        // attempt to set up and apply the transformation
        try
        {
            // set up the transformer
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer(new StreamSource( XSLT_PATH + "/" + dt.getTransformStr() ) );

            // perform the transformation
            // t.transform(new StreamSource(new File(new URI(rqs.getDataIn()))),
            //             new StreamResult(new File(new URI(rqs.getDataOut()))));
            System.out.println("'" + rqs.getArg("_if") + "'");
            System.out.println("'" + rqs.getArg("_of") + "'");
            t.transform(new StreamSource(new File(new URI("file://" + rqs.getArg("_if")))),
                        new StreamResult(new File(new URI("file://" + rqs.getArg("_of")))));
        }
        // handle exceptions - these all update the DtmError object and return false immediately
        catch (TransformerConfigurationException tce)
        {
            err.set( DtmError.ERR_TRAN_XSLT_TCE, Action.ACT_TRANS_FAILED,
                     "XSLT transformer configuration failed", tce);
            return false;
        }
        catch (TransformerException te)
        {
            err.set( DtmError.ERR_TRAN_XSLT_TE, Action.ACT_TRANS_FAILED,
                     "XSLT transformation failed.", te);
            return false;
        }
        catch (URISyntaxException ue)
        {
            err.set( DtmError.ERR_TRAN_XSLT_URIS, Action.ACT_TRANS_FAILED,
                     "XSLT URI syntax invalid.", ue);
            return false;
        }

        // if we got to here, then the transformation was successful. Update the error
        // object to reflect this, and return true.
        err.set(DtmError.ERR_NONE, Action.ACT_SUCCESS, "", null);
        return true;
    }
}


/**
 * File: DataFormat.java
 *
 * @author: Brian Hay
 *
 * @version: $Id: DataFormat.java,v 1.2 2004/06/06 04:18:26 bhay Exp $
 *
 */

/* package definition */
package IdactHelpers;

/**
 * A class that describes a data format, which is used for both the input
 * data and the output data.
 *
 * For example, the data may be XML (the format) which corresponds to a
 * particular DTD (the subformat).
 */
public class DataFormat
{
    /* private fields */
    private String format;      // data format, such as XML or HDF
    private String subFormat;   // data sub-format, such as a specific DTD

    /**
     * Default constructor - just initializes the object with empty strings. Essentially
     * this just calls setFormat("", "").
     */
    public DataFormat()
    {
        // just use the other constructor with empty strings to do the work here.
        this("", "");

    } // DataFormat()

    /**
     * Constructor - just initializes the object using the parameters. Essentially this
     * just calls setFormat(f, subf).
     *
     * @param f    a String that contains the data format, such as "XML" or "HDF"
     *
     * @param subf a String that contains the data sub-format, such as a specific DTD
     */
    public DataFormat(String f, String subf)
    {
        // just use setFormat(String, String) to do the work here
        setFormat(f, subf);

    } // DataFormat()

    /**
     * Get method for data format.
     *
     * @return A String that contains the data format, such as "XML" or "HDF"
     */
    public String getFormat()
    {
        // return a copy of the format string
        return (new String(format));

    } // getFormat()

    /**
     * Get method for data sub-format.
     *
     * @return A String that contains the data sub-format, such as a specific DTD
     */
    public String getSubFormat()
    {
        // return a copy of the sub-format string
        return (new String(subFormat));

    } // getSubFormat()

    /**
     * Set method for the data format. Essentially just calls setFormat(f, "").
     *
     * @param f A String containing the data format, such as "XML" or "HDF"
     */
    public void setFormat(String f)
    {
        // just use setFormat(String, String) to do the work here
        setFormat(f, "");

    } // setFormat()

    /**
     * Set method for the data format and sub-format. This method is used
     * by many of the other methods in this class.
     *
     * Basically just sets values for the data format and sub-formats, but
     * it performs some validity checking on the parameters first. For example,
     * leading and trailing white space is removed. If the parameter f is
     * the empty string or null, then both the format and sub-format are set
     * to the empty string. If the parameter subf is null, the sub-format
     * is set to the empty string.
     *
     * @param f    A String containing the data format, such as "XML" or "HDF"
     *
     * @param subf A String containing the data sub-format, such as a specific DTD
     */
    public void setFormat(String f, String subf)
    {
        // we must at least have a format value (note: the empty-string test uses
        // equals() rather than the original reference comparison)
        if ( (f != null) && ( !"".equals(f = f.trim()) ) )
        {
            format = new String(f);
            if (subf != null)
            {
                subFormat = new String(subf.trim() );
            }
            else
            {
                subFormat = new String("");
            }
        }
        // otherwise just use empty strings
        else
        {
            format = new String("");
            subFormat = new String("");
        }

    } // setFormat()

} // class DataFormat


/**
 * File: DataTransform.java
 *
 * @author: Brian Hay
 *
 * @version: $Id: DataTransform.java,v 1.3 2005/04/23 23:30:44 bhay Exp $
 *
 */

/* package definition */
package IdactHelpers;

/* imports - general */
import java.net.*;
import java.util.Vector;

/**
 * A class that describes a data transformation.
 * Essentially it stores a transformation classification, and
 * a URI that contains specific instructions for the
 * transformation.
 *
 * For example, a transformation may have an XSLT classification,
 * and the URI may be a specific XSLT stylesheet.
 */
public class DataTransform
{
    /* constants to define the various transformation classifications we have */
    public static final int CLASS_UNDEF = -1;      // classification is undefined
    public static final int CLASS_XSLT = 0;        // classification is XSLT
    public static final int CLASS_BASH = 1;        // classification is bash shell
    public static final int CLASS_EXEC = 2;        // classification is executable
    public static final int CLASS_WEB_SERV = 3;    // classification is web service
    public static final int CLASS_PHP = 4;         // classification is PHP
    private static final int CLASS_MAXVAL = 4;     // maximum classification value

    /* private fields */
    private int classification = CLASS_UNDEF;   // transformation classification
    private String transformString;             // transformation specific string (command line, XSLT URI, etc).
    private Vector argVector;                   // vector of arguments to the transformation

    /**
     * constructor - just initializes the object using the parameters. If the classification
     * value is invalid, then CLASS_UNDEF is used.
     *
     * @param transformClass an int representing the transformation classification, chosen
     *                       from the constants defined in this class.
     * @param transformStr   a String containing transformation specific data, such as the
     *                       URI for an XSLT file
     */
    public DataTransform(int transformClass, String transformStr)
    {
        // set the transformation class, which should correspond to one of the CLASS_* constants
        // defined in this class.
        if ( (transformClass >= 0) && (transformClass <= CLASS_MAXVAL) )
        {
            classification = transformClass;
        }
        else
        {
            classification = CLASS_UNDEF;
        }

        // set the URI for a specific transformation resource.
        transformString = new String( transformStr );

        argVector = new Vector();

    } // DataTransform()

    /**
     * Get method for classification.
     *
     * @return An int representing the transformation classification. This value
     *         should be chosen from one of the CLASS_* constants defined in this class
     */
    public int getClassification()
    {
        return classification;

    } // getClassification()

    /**
     * Get method for the transformation specific string.
     *
     * @return A String for the resource providing transformation specific details.
     */
    public String getTransformStr()
    {
        return ( new String( transformString ) );

    } // getTransformStr()

    public void addArg(String a)
    {
        argVector.add(a);
    }

    public String getArg(int i)
    {
        return (String)argVector.elementAt(i);
    }

    public int numArgs()
    {
        return argVector.size();
    }

    /**
     * A method used to convert a String to an int that represents a
     * transformation classification.
     *
     * @param s A String representing a transformation classification.
     *
     * @return An int representing the transformation classification, or
     *         CLASS_UNDEF if the parameter does not represent a valid
     *         transformation classification
     */
    public static int strToClassification(String s)
    {
        if (s.toUpperCase().equals("XSLT"))
        {
            return CLASS_XSLT;
        }
        else if (s.toUpperCase().equals("BASH"))
        {
            return CLASS_BASH;
        }
        else if (s.toUpperCase().equals("EXEC"))
        {
            return CLASS_EXEC;
        }
        else if (s.toUpperCase().equals("WSERV"))
        {
            return CLASS_WEB_SERV;
        }
        else if (s.toUpperCase().equals("PHP"))
        {
            return CLASS_EXEC;
        }
        else
        {
            return CLASS_UNDEF;
        }
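        /*
         * Note (added for exposition): "PHP" deliberately maps to CLASS_EXEC rather than
         * CLASS_PHP above; presumably PHP transformation scripts are invoked through the
         * EXEC transformer, since the dtmt array in Dtm only registers XSLT and EXEC
         * handlers.
         */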

    } // strToClassification()

} // class DataTransform


/**
 * File: ReqStruct.java
 *
 * @author: Brian Hay
 *
 * @version: $Id: ReqStruct.java,v 1.5 2005/04/23 23:30:44 bhay Exp $
 *
 */

/* package definition */
package IdactHelpers;

/* imports - IDACT specific */
import IdactHelpers.Actions.*;
import Dtm.Config.*;

/* imports - general */
import java.net.*;
import java.util.Random;
import java.util.Hashtable;

/**
 * A class for containing the request structure in the Data Transformation Manager (DTM).
 *
 * Information related to the input and output data (such as URI, data
 * type, and data format) are stored in this object, as well as the
 * list of actions for each of the defined outcomes (including
 * success, general errors, and some specific error cases).
 */
public class ReqStruct
{
    /* private fields */
    private int idNumber;             // ID number for this request
    private String dataIn;            // name for the incoming data (e.g. filename, URI, etc)
    private String dataOut;           // name for the outgoing data (e.g. filename, URI, etc)
    private DataRate dataRateIn;      // data rate for incoming data (stream or static)
    private DataRate dataRateOut;     // data rate for outgoing data
    private DataFormat dataFmtIn;     // data format for the incoming data (e.g. XML/DTD, HDF, etc)
    private DataFormat dataFmtOut;    // data format for the outgoing data
    private Action[] actions;         // action array
    private Hashtable transformArgs;  // hash table of arguments to the transformation

    /**
     * Default constructor - essentially sets everything to null or -1
     */
    public ReqStruct()
    {
        idNumber = -1;
        dataIn = dataOut = null;
        dataRateIn = dataRateOut = null;
        dataFmtIn = dataFmtOut = null;
        actions = new Action[6];
        actions[0] = new Action("SUCCESS");
        actions[1] = new Action("ERROR");
        actions[2] = new Action("TRANSFORMATION NOT FOUND");
        actions[3] = new Action("TRANSFORMATION FAILED");
        actions[4] = actions[5] = null;

        transformArgs = new Hashtable();
    } // ReqStruct()

    /**
     * Constructor that populates the object with meaningful values based on the parameters
     *
     * @param in    A String of the input data
     *
     * @param out   A String for the output data
     *
     * @param dfIn  A {@link DataFormat} object describing the input data format
     *
     * @param dfOut A DataFormat object describing the output data format
     *
     */
    public ReqStruct(String in, String out, DataFormat dfIn, DataFormat dfOut)
    {
        // call the default to do most of the work
        this();

        idNumber = Math.abs( new Random().nextInt() );

        dataIn = new String(in);
        dataOut = new String(out);

        // populate the hash table with the arguments
        populateHashTable();

        dataFmtIn = new DataFormat(dfIn.getFormat(), dfIn.getSubFormat());
        dataFmtOut = new DataFormat(dfOut.getFormat(), dfOut.getSubFormat());

        actions = new Action[6];
        actions[0] = new Action("SUCCESS");
        actions[1] = new Action("ERROR");
        actions[2] = actions[3] = actions[4] = actions[5] = null;

    } // ReqStruct()

// populate the hash table with the arguments private void populateHashTable() { String argStr = new String(dataIn); String io = "_i";

for (int ii = 0; ii < 2; ii++) { // find the ? that starts the args


int pos = argStr.indexOf("?"); argStr = argStr.substring(pos + 1);

boolean done = false;

while (argStr.length() > 0) { // get the next argument int aLen = argStr.indexOf("&"); if (aLen == -1) { aLen = argStr.length(); done = true; } String nextArg = argStr.substring(0, aLen); int eqPos = nextArg.indexOf("="); String argName = nextArg.substring(0, eqPos); String argVal = nextArg.substring(eqPos + 1);

// add the argument to the hashtable if (DtmConfig.DEBUG > 0) { System.out.println("Found argument named '" + io + argName + "' with value '" + argVal + "'"); } transformArgs.put((io + argName), argVal); if (DtmConfig.DEBUG > 0) { System.out.println("Argument named '" + io + argName + "' with value '" + (String)transformArgs.get((io + argName)) + "'"); }

// reset the string size if (done) { argStr = ""; } else { argStr = argStr.substring(aLen + 1); } }

io = "_o"; argStr = new String(dataOut); } }

/** * Get method for ID number * * @return An int containing the ID number */ public int getIdNumber() { return idNumber;


} // getIdNumber()

// return one of the transformation arguments public String getArg(String argName) { //System.out.println("getArg(" + argName + "), " + transformArgs.size()); return (String)transformArgs.get(argName); }

// set the arguments based on the dataIn and dataOut values public void setArgs() { transformArgs.clear(); populateHashTable(); }

/** * Get method for input data String * * @return A String containing the source data (e.g. URI, filename, etc) */ public String getDataIn() { return( new String(dataIn) );

} // getDataIn()

/** * Set method for input data String * * @param newDataIn The String to be used for the data input (URI, filename, etc). */ public void setDataIn(String newDataIn) { dataIn = new String( newDataIn );

} // setDataIn()

/** * Get method for output data String * * @return A String containing the data destination (e.g. URI, filename, etc) */ public String getDataOut() { return( new String(dataOut) );

} // getDataOut()

/** * Set method for output data String * * @param newDataOut The String to be used for the data output (URI,

filename, etc). */ public void setDataOut(String newDataOut) { dataOut = new String( newDataOut );

} // setDataOut()

/** * Get method for input data format (e.g. XML) * * @return A String containing the input data format */ public String getInFormat() { return dataFmtIn.getFormat();

} // getInFormat()

/** * Get method for input data sub-format (e.g. messages.dtd) * * @return A String containing the input data sub-format */ public String getInSubFormat() { return dataFmtIn.getSubFormat();

} // getInSubFormat()

/** * Get method for output data format * * @return A String containing the output data format */ public String getOutFormat() { return dataFmtOut.getFormat();

} // getOutFormat()

/** * Get method for output data sub-format (e.g. messages.dtd) * * @return A String containing the output data sub-format */ public String getOutSubFormat() { return dataFmtOut.getSubFormat();

} // getOutSubFormat()

/** * Get method for an {@link Action} *


* @param actType An int indicating the action type. This should be one of the ACT_* constants defined * in the ReqStruct class. * * @return An Action object if one exists, or null if there is no suitable Action object defined */ public Action getAction(int actType) { // return the requested action if it exists, or null otherwise if ( (actType >= 0) && (actType < actions.length ) ) { return actions[actType]; } return null;

} // getAction()

/** * Add a new {@link ActionItem} to one of the defined actions. An ActionItem is an object * that describes an event, such as send an email, that should occur as part of an action. * * @param actType An int indicating the action type. This should be one of the ACT_* constants defined * in the ReqStruct class. * * @param ai An ActionItem object describing the action item to be added. * * @return A boolean indicating whether the operation was successful or not. */ public boolean addActionItem(int actType, ActionItem ai) { // range check the actType value if ( (actType < 0) || (actType >= actions.length) ) { // out of array range return false; }

// make sure there is an action instantiated if (actions[actType] == null) { actions[actType] = new Action(); }

// add the action item to the action return (actions[actType].addActionItem(ai));

} // addActionItem()

} // class ReqStruct
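The fragment below is an illustrative sketch, not part of the original listing, using hypothetical URIs and formats. It constructs a ReqStruct and reads back the arguments parsed by populateHashTable(); note that the parser expects both the input and output names to carry a '?name=value&...' argument list, and that argument names are keyed with an '_i' (input) or '_o' (output) prefix.

DataFormat inFmt = new DataFormat("XML", "messages.dtd");
DataFormat outFmt = new DataFormat("HDF", "");

ReqStruct req = new ReqStruct(
    "http://example.org/data.xml?station=barrow1&year=2004", // hypothetical input URI with arguments
    "file:///tmp/out.hdf?precision=2",                       // hypothetical output URI with arguments
    inFmt, outFmt);

String station = req.getArg("_istation");   // "barrow1" (input argument)
String prec = req.getArg("_oprecision");    // "2" (output argument)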


/** * File: Action.java * * @author: Brian Hay * * @version: $Id: Action.java,v 1.3 2005/04/17 18:18:59 bhay Exp $ * */

/* package definition */ package IdactHelpers.Actions;

/* imports - IDACT specific */ import Dtm.Errors.*; import Dtm.Config.*;

/* imports - general */ import java.util.*;

/** * A class to describe an action to be attempted for a particular * event. Includes a public method perform() that causes the action * to be performed. */ public class Action { /* constants to define the various actions we have */ public final static int ACT_SUCCESS = 0; // No errors found public final static int ACT_ERROR = 1; // Generic error public final static int ACT_TRANS_NOT_FOUND = 2; // Transformation-not-found error public final static int ACT_TRANS_FAILED = 3; // Transformation-failed error public final static int ACT_INPUT_FAILED = 4; // Input-data-access failed public final static int ACT_OUTPUT_FAILED = 5; // Output-data-access failed private final static int ACT_MAXVAL = 5; // maximum valid ACT_* value

/* private fields */ String actionName; // the name of this particular action Vector actItems = new Vector(); // vector of the ActionItems for this action

/** * Default constructor - just initializes the object with an empty string * as its name. Essentially calls Action("") to * accomplish this. */ public Action() { // call the Action(String) to do the work this("");

} // Action()


/** * Constructor - just initializes the object using the parameter name for * the action name. It uses an empty string if the parameter * is null, and trims leading and trailing white space from the * parameter in other cases. */ public Action(String name) { // check for a non-null parameter if (null != name) { // set the actionName value to the trimmed parameter value actionName = new String( name.trim() ); } // parameter is null, so use empty string else { actionName = new String(""); } } // Action()

/** * This method converts an int value into a String corresponding to the * action type. * * @param a The int value for the action, which should be one of the Action.ACT_* * values. * * @return A String corresponding to the action passed as a parameter. */ public static String getActionStr(int a) { switch (a) { case ACT_SUCCESS: return "SUCCESS"; case ACT_ERROR: return "ERROR"; case ACT_TRANS_NOT_FOUND: return "TRANSFORMATION NOT FOUND"; case ACT_TRANS_FAILED: return "TRANSFORMATION FAILED"; case ACT_INPUT_FAILED: return "INPUT FAILED"; case ACT_OUTPUT_FAILED: return "OUTPUT FAILED"; default: return "UNKNOWN"; } }


/** * A method used to request that the object perform the action it describes. Essentially * this means that it should perform each of the ActionItems associated with this action, * such as send an email, etc. * * This method essentially just calls perform(String) to do the work. * * @param err A DtmError object that indicates the current error state of the DTM. * * @return A boolean, indicating whether or not the action was successful. */ public boolean perform(DtmError err) { return perform("", err);

} // perform()

/** * A method used to request that the object perform the action it describes. Essentially * this means that it should perform each of the ActionItems associated with this action, * such as send an email, etc. * * @param msg A message that will be used to 'personalize' the action. For example, if the action * indicates that an email should be sent, this message may be used as the email text. * * @param err A DtmError object that indicates the current error state of the DTM. * * @return A boolean, indicating whether or not the action was successful. */ public boolean perform(String msg, DtmError err) { // DEBUG output if (DtmConfig.DEBUG > 0) { System.err.println("DEBUG\tAction started '" + actionName + "'"); System.err.println("\tError Number : " + err.getNum()); System.err.println("\tError Action : " + err.getAction()); System.err.println("\tError Message: " + err.getMsg()); }

// now perform each of the Action Items for (int ii = 0; ii < actItems.size(); ii++) { if (!( (ActionItem)actItems.elementAt(ii)).perform(msg, err)) { // some kind of error occurred return false; } }

// all good


return true;

} // perform()

/** * Adds a new ActionItem* to the Action object. * * @param ai A new ActionItem* object to be added to this Action. * * @return A boolean, indicating whether or not it was possible to add the ActionItem* object. */ public boolean addActionItem(ActionItem ai) { // if the ActionItem is not null, add it to the vector and return true if (ai != null) { actItems.add(ai); return true; } // else we can't add a null object, so return false else { return false; }

} // addActionItem()

} // class Action
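The getActionStr() mapping can be exercised directly, as in this short sketch (not part of the original source):

System.out.println(Action.getActionStr(Action.ACT_SUCCESS));         // "SUCCESS"
System.out.println(Action.getActionStr(Action.ACT_TRANS_NOT_FOUND)); // "TRANSFORMATION NOT FOUND"
System.out.println(Action.getActionStr(99));                         // "UNKNOWN"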


/** * File: ActionItem.java * * @author: Brian Hay * * @version: $Id: ActionItem.java,v 1.2 2004/06/06 04:18:26 bhay Exp $ * */

/* package definition */ package IdactHelpers.Actions;

/* imports - IDACT specific */ import Dtm.Errors.*;

/** * Abstract base class for the various ActionItem* classes, which are used to describe the * details of an Action. */ public abstract class ActionItem extends Object { abstract public boolean perform(String msg, DtmError err); }
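A concrete subclass only needs to implement perform(). The hypothetical ActionItemLog below, which is not part of IDACT, sketches the minimal contract, assuming DtmError exposes the getNum() and getMsg() accessors used by Action.perform().

/* package definition */ package IdactHelpers.Actions;

/* imports - IDACT specific */ import Dtm.Errors.*;

/** A hypothetical ActionItem that simply logs the event to standard error. */
public class ActionItemLog extends ActionItem
{
    public boolean perform(String msg, DtmError err)
    {
        // record the personalization message and the current error state
        System.err.println("LOG: " + msg + " (error " + err.getNum() + ": " + err.getMsg() + ")");
        return true; // logging itself is assumed never to fail
    }
}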


/** * File: ActionItemEmail.java * * @author: Brian Hay * * @version: $Id: ActionItemEmail.java,v 1.3 2005/04/23 23:30:44 bhay Exp $ * */

/* package definition */ package IdactHelpers.Actions;

/* imports - IDACT specific */ import Dtm.Errors.*; import Dtm.Config.*;

/* imports - general */ import javax.mail.*; import javax.mail.internet.*; import java.util.*; import java.util.regex.*;

/** * Class to describe and send a specific email message as part of an Action. */ public class ActionItemEmail extends ActionItem { /* private constants */ private final String EMAIL_REG_EXP = "[A-Za-z0-9]+([\\.\\_\\-][A-Za-z0-9]+)*[\\@][A-Za-z0-9]+([\\.][A-Za-z0-9]+)*";

/* private fields */ private String subject; private String recipients; private String msgBody;

/** * Constructor * * @param sub A string containing the subject line to be used. * * This string will be truncated at 100 chars, but can contain special codes (such as %n, which * causes the transformation number to be inserted). * * If this parameter is null, then a default subject line will be used. * * @param recipList A String containing a comma-delimited list of email recipients. * * Each address will be truncated at 100 characters, and validated to ensure it fits the format * [A-Za-z0-9]+([._-][A-Za-z0-9]+)* [@] [A-Za-z0-9]+([.][A-Za-z0-9]+)* * Any address found to be invalid will be discarded. * * At least one valid email address must be included in the list. * * @param body A string containing the message to be sent. * * This string can contain special codes (such as %n, which causes the transformation number * to be inserted). * * If this parameter is null, then a default message will be used. */ public ActionItemEmail(String sub, String recipList, String body) {

// store the subject and message values subject = new String(sub); msgBody = new String(body); recipients = new String(recipList); }

/** * This method causes this particular ActionItem to run - in this case, since this is the * Email ActionItem it causes the described email to be sent. * * @param msg A string that can be used to 'personalise' the email message - it may contain information about * a specific error, or the URI of the transformed output file, for example. * * @param err A DtmError object that is used to track any error that occurs while attempting to run this * ActionItem. * * @return A boolean indicating whether or not the email was sent successfully. */ public boolean perform(String msg, DtmError err) { Properties props = new Properties(); Session session = Session.getInstance(props, null);

// create a new email message MimeMessage message = new MimeMessage(session);

// attempt to set the email content if (!setContent(message, err)) { return false; }

// attempt to set the email subject line if (!setSubject(message, err)) { return false; }


// attempt to set the addresses for the sender and recipients if (!setAddresses(message, err)) { return false; }

// attempt to send the email if (!sendEmail(session, message, err)) { return false; }

// everything went well, so return true return true; }

/** * A private method that attempts to send the email message. * * @param session A Session object. * * @param message A MimeMessage object, which is the message we are currently building. * * @param err The DtmError object, which will be updated in the event of an error during this operation. * * @return A boolean indicating whether or not the operation was successful. */ private boolean sendEmail(Session session, MimeMessage message, DtmError err) { try { // save changes to the email message.saveChanges();

// connect to the SMTP server Transport transport = session.getTransport("smtp"); transport.connect(DtmConfig.SMTP_HOST, DtmConfig.SMTP_USER, DtmConfig.SMTP_PASS);

// send the message transport.sendMessage(message, message.getAllRecipients());

// close the connection to the SMTP server transport.close(); } catch (MessagingException me) { // error sending message err.set(DtmError.ERR_ACT_MAIL_ME_SEND, Action.ACT_ERROR, ("Failed to send email message."), me ); return false; }

// all good return true;


}

/** * A private method that attempts to set the subject for a message. * * @param message A MimeMessage object, which is the message we are currently building. * * @param err The DtmError object, which will be updated in the event of an error during this operation. * * @return A boolean indicating whether or not the operation was successful. */ private boolean setSubject(MimeMessage message, DtmError err) { try { message.setSubject(subject); } catch (MessagingException me) { // error setting message subject err.set(DtmError.ERR_ACT_MAIL_ME_SUBJ, Action.ACT_ERROR, ("Failed to set subject '" + subject + "' for email message."), me ); return false; }

// all good return true; }

/** * A private method that attempts to set the sender and recipients for a message, based on the recipients Vector. * * @param message A MimeMessage object, which is the message we are currently building. * * @param err The DtmError object, which will be updated in the event of an error during this operation. * * @return A boolean indicating whether or not the operation was successful. */ private boolean setAddresses(MimeMessage message, DtmError err) { int ii = 0; // loop counter

// set the from address try { message.setFrom(DtmConfig.EMAIL_FROM); } catch (MessagingException me) { // error setting message sender err.set(DtmError.ERR_ACT_MAIL_ME_SNDR, Action.ACT_ERROR, ("Failed to set message sender '" + DtmConfig.EMAIL_FROM +


"' for email message"), me ); return false; }

// Vector of recipient addresses Vector r = new Vector(3, 5);

// populate the recipient list if (!initRecipients(r)) { // we don't have a valid list of recipients, so there's nothing to do return false; }

// attempt to add each of the recipients to the message try { // convert the recipient address vector to an array Address ra[] = new Address[r.size()]; r.toArray(ra); // now set the addresses using this new recipient address array. message.setRecipients( Message.RecipientType.TO, ra ); } catch (MessagingException me) { // error setting message recipients err.set(DtmError.ERR_ACT_MAIL_ME_RECIP, Action.ACT_ERROR, ("Failed to add recipient address '" + ((InternetAddress)r.elementAt(ii)).getAddress() + "' to email message"), me ); return false; }

// all good return true; }

/** * A private method that attempts to set the content for a message. * * @param message A MimeMessage object, which is the message we are currently building. * * @param err The DtmError object, which will be updated in the event of an error during this operation. * * @return A boolean indicating whether or not the operation was successful. */ private boolean setContent(MimeMessage message, DtmError err) { try { message.setContent(msgBody, "text/plain"); //message.setText("Hello"); } catch (MessagingException me) { // error setting message content


err.set(DtmError.ERR_ACT_MAIL_ME_CONT, Action.ACT_ERROR, ("Failed to set content in email message."), me ); return false; }

// all good return true; }

/** * A private method that initializes the recipients vector. * * @param r A Vector to be populated with valid recipient addresses parsed from the comma-delimited recipients field. * * @return A boolean, indicating whether or not there was at least one valid email address. */ private boolean initRecipients(Vector r) { // a String to hold the next address String nextAddress;

// create a pattern for matching email addresses Pattern p = Pattern.compile(EMAIL_REG_EXP);

// populate the vector with email addresses; String tmpList = new String( recipients.trim() ); while ( !tmpList.equals("") ) { // find the location of the first comma int commaLoc = tmpList.indexOf(",");

// if there is no comma left in the list, then the whole string is our last address if (commaLoc < 0) { nextAddress = new String(tmpList); tmpList = new String(""); } // else grab the chars up to the comma as the next address, and keep the remainder as // the rest of the list. else { nextAddress = tmpList.substring(0, commaLoc).trim(); tmpList = tmpList.substring(commaLoc + 1).trim(); }

// check that the next address is in a valid format, using our pattern Matcher m = p.matcher(nextAddress); if ( (nextAddress.length() <= 100) && (m.matches()) ) { try { r.add(new InternetAddress(nextAddress)); } catch (AddressException ae)


{ if (DtmConfig.DEBUG > 0) { System.err.println("DEBUG\t" + "Failed to set recipient in email message." + "\tstack trace:\n"); ae.printStackTrace(); } } } // else DEBUG output else if (DtmConfig.DEBUG > 0) { System.err.println("DEBUG\tActionItemEmail(): Invalid email address"); if (nextAddress.length() > 100) { System.err.println("\tEmail address too long"); } else { System.err.println("\tDoes not fit pattern ('" + nextAddress + "')"); } } } // return true if there was at least one valid email address, and false otherwise. return (r.size() > 0); }

}
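In typical use (a sketch with hypothetical addresses, not part of the original source), an email notification is attached to one of a request's outcomes through ReqStruct.addActionItem(); actually delivering the message depends on the SMTP_HOST, SMTP_USER, SMTP_PASS, and EMAIL_FROM values defined in DtmConfig.

// given a populated ReqStruct named req:
req.addActionItem(Action.ACT_TRANS_FAILED,
    new ActionItemEmail(
        "IDACT: transformation failed",      // subject line
        "ops@example.org,admin@example.org", // comma-delimited recipient list
        "The requested transformation could not be completed.")); // message body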


/* * Copyright 2004 Brian Hay (IDACT) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

/** * An abstract class used as the basis for the various other XmlStructFormatDesc* * classes, which describe the manner in which 'from' elements, strings, and * templates, for example, are used to populate a given 'to' element. * * @author Brian Hay * @version $Id: XmlStructFormatDesc.java,v 1.2 2004/10/31 22:11:28 bhay Exp $ */ public abstract class XmlStructFormatDesc { /** public static fields that define the various types that * can be used in the format descriptions. */ public static final int TYPE_UNDEF = -1; public static final int TYPE_ELEMENT = 0; public static final int TYPE_STRING = 1; public static final int TYPE_TEMPLATE = 2; //public static final int TYPE_FUNCTION = 3;

// an int variable used to store the type of each particular object private int type;

/** * This public accessor method gets the type of the particular object, which * should be one of the values defined as TYPE_* static members of this class. * * @return An int representing the type of this object. */ public int getType() { return type; }

/** * This public mutator method allows the type for this object to be specified. * * @param newType The type for this particular object. This should

correspond to one of * the defined static TYPE_* constants in this class. */ public void setType(int newType) { type = newType; } }


/* * Copyright 2004 Brian Hay (IDACT) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

/** * A class used to record element references for use in populating a given 'to' * element. * * @author Brian Hay * * @version $Id: XmlStructFormatDescElement.java,v 1.2 2004/10/31 22:11:28 bhay Exp $ */ public class XmlStructFormatDescElement extends XmlStructFormatDesc { // private int used to store the index in the link's fromNodes Vector that // corresponds to the element to use. int elementId;

/** * Constructor for the XmlStructFormatDescElement class, which sets the type * to TYPE_ELEMENT, and sets the element index value. * * @param newElementId * The index value of the element in the link's fromNodes Vector. */ public XmlStructFormatDescElement(int newElementId) { super.setType(XmlStructFormatDesc.TYPE_ELEMENT); elementId = newElementId; }

/** * Accessor method for the element index. * * @return An int representing the element's index in the link's fromNodes Vector. */ public int getElementId() { return elementId; } }


/* * Copyright 2004 Brian Hay (IDACT) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

/** * A class used to record string constants for use in populating a given 'to' * element. * * @author Brian Hay * * @version $Id: XmlStructFormatDescString.java,v 1.2 2004/10/31 22:11:28 bhay Exp $ */ public class XmlStructFormatDescString extends XmlStructFormatDesc { // private object used to store the string value private String str;

/** * Constructor for the XmlStructFormatDescString class, which sets the type * to TYPE_STRING, and sets the value of the string. * * @param newString The new string value to be stored. */ public XmlStructFormatDescString(String newString) { super.setType(XmlStructFormatDesc.TYPE_STRING); str = new String(newString); }

/** * Accessor method for the string stored in this object. * * @return A copy of the string stored in this object. */ public String getString() { return new String(str); } }


/* * Copyright 2004 Brian Hay (IDACT) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ import java.util.Vector;

/** * A class used to record template references for use in populating a given 'to' * element. * * @author Brian Hay * * @version $Id: XmlStructFormatDescTemplate.java,v 1.3 2004/11/03 06:30:24 bhay Exp $ */ public class XmlStructFormatDescTemplate extends XmlStructFormatDesc { // String object to hold the template name private String templateName;

// Vector to hold references to the parameters for this template, // which may be of any of the types defined in the XmlStructFormatDesc // class TYPE_* constants. private Vector params;

/** * Constructor for the XmlStructFormatDescTemplate class, which sets the type * to TYPE_TEMPLATE, sets the template name, and creates an empty * parameter list. * * @param newTemplateName * A String containing the name of the template. */ public XmlStructFormatDescTemplate(String newTemplateName) { this( newTemplateName, new Vector() ); }

/** * Constructor for the XmlStructFormatDescTemplate class, which sets the type * to TYPE_TEMPLATE, sets the template name, and creates a parameter * list based on the Vector provided. *


* @param newTemplateName * A String containing the name of the template. * @param newParams A Vector object containing the parameter list for this template. */ public XmlStructFormatDescTemplate(String newTemplateName, Vector newParams) { super.setType(XmlStructFormatDescTemplate.TYPE_TEMPLATE); templateName = new String(newTemplateName); params = (Vector)newParams.clone(); }

/** * Accessor method for the template name. * * @return A String object containing the template's name. */ public String getName() { return new String(templateName); }

/** * Accessor method for the size (length) of the parameter list. * * @return An int representing the number of items in the parameter list. */ public int getNumParams() { return params.size(); }

/** * Accessor method for individual parameter list items. * * @param index The index of the desired item in the parameter list. * * @return The XmlStructFormatDesc object at the specified index in the * parameter list, or null if the index is out of bounds. */ public XmlStructFormatDesc getParam(int index) { if ( (index < 0) || (index >= params.size() ) ) { return null; }

return (XmlStructFormatDesc)params.elementAt(index); }

}
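The following fragment (an illustration, not part of the original source) builds the kind of format description that getParams() in the XsltGenerator assembles interactively: a CONCAT template joining two 'from' elements with a literal separator string.

Vector params = new Vector();
params.add(new XmlStructFormatDescElement(0));   // first 'from' node in the link
params.add(new XmlStructFormatDescString(", ")); // literal separator string
params.add(new XmlStructFormatDescElement(1));   // second 'from' node in the link

XmlStructFormatDescTemplate fmt = new XmlStructFormatDescTemplate("CONCAT", params);
// the description is then attached to a link with XmlStructLink.setFormatDesc(fmt)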


/* * Copyright 2004 Brian Hay (IDACT) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

// import the Vector class import java.util.Vector;

/** * This class is used to define links between sections of the 'from' and 'to' * XML structures in the XSLT Generator. It allows for many-to-one links to be * defined, with an associated transformation/conversion. * * @author Brian Hay * * @version $Id: XmlStructLink.java,v 1.7 2005/01/03 08:09:36 bhay Exp $ */ public class XmlStructLink { // reference to the node in the 'to' structure for this link XmlStructNode toNode;

// Vector of references to the nodes in the 'from' structure for this link Vector fromNodes;

// sort operators for each of the nodes in the 'from' structure for this link XmlStructSort fromSort;

// description of how to use the 'from' fields to create the 'to' field XmlStructFormatDescTemplate formatDesc;

/** * Default constructor */ public XmlStructLink() { this(null); }

/** * Constructor *


* @param newToNode An XmlStructNode object that references the node in the 'to' structure * for this link. */ public XmlStructLink(XmlStructNode newToNode) { toNode = newToNode; fromNodes = new Vector(); fromSort = new XmlStructSort(); formatDesc = null; }

public void setFormatDesc(XmlStructFormatDescTemplate newFormatDesc) { formatDesc = newFormatDesc; }

public XmlStructFormatDescTemplate getFormatNode() { return formatDesc; }

/** * This method adds a new node in the 'from' structure to this link * relationship. * * @param newFromNode * The new link in the 'from' structure to be added to this link * relationship. * * @return A boolean indicating the success (true) or failure (false) of the * operation. * */ public boolean addFromNode(XmlStructNode newFromNode) { return addFromNode( newFromNode, new XmlStructSort() ); }

/** * This method adds a new node in the 'from' structure to this link * relationship. * * @param newFromNode * The new link in the 'from' structure to be added to this link * relationship. * * @param newSort * An XmlStructSort object that describes the nodes in the 'from' structure * that will be used to sort this node. * * @return A boolean indicating the success (true) or failure (false) of the * operation. *


*/ public boolean addFromNode(XmlStructNode newFromNode, XmlStructSort newSort) { // check for a null object if ( (null != newFromNode) && (null != newSort) ) { // check that this node does not already exist in the vector for (int ii = 0; ii < fromNodes.size(); ii++) { if ( newFromNode == (XmlStructNode)fromNodes.elementAt(ii) ) { // node already exists, so nothing to do return true; } } // add the new 'from' node now fromNodes.add(newFromNode); } else { // null object, so error System.err.println("ERROR:\tXmlStructLink::addFromNode()\n\tAttempted to add null node!"); return false; }

// all OK, so return true return true;

} // addFromNode()

/** * This method removes a node in the 'from' structure from this link * relationship. * * @param delFromNode * The node in the 'from' structure to be removed from this * link relationship. * * @return A boolean indicating the success (true) or failure (false) of the * operation. * */ public boolean removeFromNode(XmlStructNode delFromNode) { // check for a null object if ( null != delFromNode) { // locate the node's index so it can be deleted from the Vector int index = fromNodes.indexOf(delFromNode);

// if the element exists in the Vector if ( (index >= 0) && (index < fromNodes.size() ) ) { // delete the node fromNodes.removeElementAt(index); return true; }


else { // the node we're trying to delete does not exist

System.err.println("ERROR:\tXmlStructLink::removeFromNode()\n\tAttempted to delete noexistant node!"); return false; } } else { // null object, so error

System.err.println("ERROR:\tXmlStructLink::removeFromNode()\n\tAttempted to delete noexistant node!"); return false; }

} // removeFromNode()

/** * This method cleans up the link before deletion. */ public void cleanup() { // delete all links to 'from' nodes for (int ii = getNumFromNodes() - 1; ii >= 0; ii--) { ((XmlStructNode)(fromNodes.elementAt(ii))).setLink(null); fromNodes.removeElementAt(ii); }

// delete the link to the 'to' node toNode.setLink(null); toNode = null; }

/** * This method gets the number of 'from' nodes associated with this link * relationship. * * @return An int value, indicating the number of 'from' nodes associated with this * link relationship. * */ public int getNumFromNodes() { return fromNodes.size();

} // getNumFromNodes()

/** * This method gets the 'to' node associated with this link relationship.


* * @return An XmlStructNode reference to the 'to' node associated with this link * relationship. * */ public XmlStructNode getToNode() { return toNode;

} // getToNode()

/** * This method gets one of the 'from' nodes associated with this link * relationship. * * @param nodeNum The index (zero-based) of the 'from' node to get. * * @return An XmlStructNode reference to the requested 'from' node associated with this * link relationship, or null if the index provided was out of bounds. * */ public XmlStructNode getFromNode(int nodeNum) { if ( (nodeNum >= 0) && (nodeNum < fromNodes.size() ) ) { return (XmlStructNode)fromNodes.elementAt(nodeNum); } else { return null; }

} // getFromNode()

/** * This method gets the sort descriptor for the 'from' nodes associated with this * link relationship. * * @return An XmlStructSort reference to the sort descriptor for the 'from' nodes * associated with this link relationship. * */ public XmlStructSort getSort() { return fromSort;

} // getSort()

} // class XmlStructLink
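As a sketch with hypothetical node names (not part of the original source), a many-to-one link is built by creating the link for the 'to' node and then adding each 'from' node; addFromNode() quietly ignores duplicates.

XmlStructNode toNode = new XmlStructNode("fullName");
XmlStructNode fromNode1 = new XmlStructNode("firstName");
XmlStructNode fromNode2 = new XmlStructNode("lastName");

XmlStructLink link = new XmlStructLink(toNode);
toNode.setLink(link);
link.addFromNode(fromNode1);
link.addFromNode(fromNode2);
link.addFromNode(fromNode1);                // duplicate - silently skipped
System.out.println(link.getNumFromNodes()); // prints 2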


/* * Copyright 2004 Brian Hay (IDACT) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

// general imports import java.util.Vector;

/** * This class is used to build the structure of a valid XML document from * an XSD file. Objects of this class represent nodes in a hierarchical * structure, with links to parent and child nodes, and also with links to * nodes in other structures (so that element equivalences can be defined). * * @author Brian Hay * * @version $Id: XmlStructNode.java,v 1.6 2004/10/31 22:11:28 bhay Exp $ * */ public class XmlStructNode { // a vector of subnodes private Vector subnodes;

// a reference to the parent node of this node private XmlStructNode parentNode;

// data type for this node private int nodeType;

// node name private String nodeName;

// link reference private XmlStructLink link;

// flag to indicate if descendant nodes have links private boolean descendantLinks;

// XPath from the root node to this node String pathToNode;

// node depth, starting with the root node at level 1 int nodeDepth;

// default constructor

public XmlStructNode() { this("", null); }

// constructor public XmlStructNode(String newName) { this(newName, null); }

// constructor public XmlStructNode(String newName, XmlStructNode newParentNode) { nodeName = newName; parentNode = newParentNode; link = null; subnodes = new Vector(); descendantLinks = false; pathToNode = new String(""); nodeDepth = -1; }

// set the descendant links flag public void setDescendantLinks(boolean dl) { descendantLinks = dl; }

// get the descendant links flag public boolean getDescendantLinks() { return descendantLinks; }

// set the node depth public void setNodeDepth(int depth) { nodeDepth = depth; }

// get the depth public int getNodeDepth() { return nodeDepth; }

// set the path public void setPath(String newPath) { pathToNode = new String(newPath); }

// get the path public String getPath() { return new String(pathToNode); }

// set the node name


public void setNodeName(String newName) { nodeName = newName; }

// get the node name public String getNodeName() { return nodeName; }

// get the number of subnodes public int getNumSubnodes() { return subnodes.size(); }

// get the parent node for this node public XmlStructNode getParentNode() { return parentNode; }

// get the link node for this node public XmlStructLink getLink() { return link; }

// get one of the subnodes public XmlStructNode getSubNode(int n) { if ( (n >= 0) && (n < subnodes.size() ) ) { return( (XmlStructNode)subnodes.elementAt(n) ); } else { return null; } }

// add a new subnode public boolean addSubNode(XmlStructNode newNode) { return subnodes.add(newNode); }

// set the parent node public void setParentNode(XmlStructNode newParentNode) { parentNode = newParentNode; }

// set the link node public void setLink(XmlStructLink newLink) { link = newLink; } }
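A small structure can be assembled by hand in the same way buildFromNode() in the XsltGenerator does it; this sketch (not part of the original source) uses hypothetical element names.

XmlStructNode root = new XmlStructNode("dataset");
root.setPath(root.getNodeName());
root.setNodeDepth(1);

XmlStructNode child = new XmlStructNode("record", root);
root.addSubNode(child);
child.setPath(root.getPath() + "/" + child.getNodeName()); // "dataset/record"
child.setNodeDepth(root.getNodeDepth() + 1);               // depth 2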


/* * Copyright 2004 Brian Hay (IDACT) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

// import the Vector class import java.util.Vector;

/** * This class is used to define the sorts to be applied when using a 'from' * node. * * @author Brian Hay * * @version $Id: XmlStructSort.java,v 1.2 2004/10/31 22:11:28 bhay Exp $ */ public class XmlStructSort { static final String ORDER_ASCEND = "ascending"; static final String ORDER_DESCEND = "descending";

// Vector for each field to be sorted Vector nodes, orders;

public XmlStructSort() { nodes = new Vector(); orders = new Vector(); }

public boolean addSort(XmlStructNode sortNode, String order) { return ( (nodes.add(sortNode) ) && ( orders.add( new String(order) ) ) ); }

public int getNumSorts() { if (nodes.size() == orders.size()) { return nodes.size(); } else { System.err.println("ERROR:\tXmlStructSort::getNumSorts()\n\tVector size mismatch!"); return -1;


} }

public XmlStructNode getSortNode(int sortNum) { if ( (sortNum >= 0) && (sortNum < nodes.size() ) ) { return (XmlStructNode)nodes.elementAt(sortNum); } else { return null; } }

public String getSortOrder(int sortNum) { if ( (sortNum >= 0) && (sortNum < orders.size() ) ) { return (String)orders.elementAt(sortNum); } else { return null; } }

} // class XmlStructSort
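Used on its own (an illustrative fragment with hypothetical node names, not part of the original source), a sort descriptor pairs each sort node with an order string:

XmlStructSort sort = new XmlStructSort();
sort.addSort(new XmlStructNode("year"), XmlStructSort.ORDER_ASCEND);
sort.addSort(new XmlStructNode("month"), XmlStructSort.ORDER_DESCEND);

System.out.println(sort.getNumSorts());   // 2
System.out.println(sort.getSortOrder(1)); // "descending"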


/* * Copyright 2004 Brian Hay (IDACT) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

// imports for the XML processing import org.apache.xerces.parsers.DOMParser; import org.w3c.dom.Document; import org.w3c.dom.DocumentType; import org.w3c.dom.NamedNodeMap; import org.w3c.dom.Node; import org.w3c.dom.NodeList;

// imports for the file output import java.io.*;

// imports for the Vector and Set functionality import java.util.Vector; import java.util.Set; import java.util.HashSet; import java.util.Iterator;

// import for the database SQL interaction import java.sql.*;

/** * A prototype of an XSLT Generator. * * @author Brian Hay * * @version $Id: XsltGenerator.java,v 1.17 2005/01/03 08:09:36 bhay Exp $ * */ public class XsltGenerator { /** * A constant for the debug level ( a value greater than 0 will cause debug * messages to be displayed). */ private final int DEBUG = 0;

/** * A String that will be used as temporary storage for the XSLT as it is * generated, prior to it being written to the actual XSLT file.


*/ String xsltString;

/** * A Set that contains all of the XSLT components to be retrieved from the database * and written to the XSLT output */ private Set xsltComponents = new HashSet();

private String xsltFile; // string that contains the XSLT output filename private XmlStructNode fromStructRoot; // root node for the 'from' format structure private XmlStructNode toStructRoot; // root node for the 'to' format structure

public final static int TO_STRUCT = 0; public final static int FROM_STRUCT = 1;

/** * Constructor that accepts three filenames (input format, output format, and * XSLT output). An attempt is made to open the format files, and build their * structure. * * @param fromFile A String containing the name of the input format file. * @param toFile A String containing the name of the output format file. * @param outXsltFile A String containing the name of the XSLT output file. * * @exception Exception An Exception object that contains a message indicating the error, if * an error condition arises. */ public XsltGenerator(String fromFile, String toFile, String outXsltFile) throws Exception { xsltString = new String("");

// valid parameter checks, performed before the filename is copied so that a null parameter cannot cause a NullPointerException if ( (null == fromFile) || (null == toFile) || (null == outXsltFile) ) { throw new Exception("Null filename parameters passed to XsltGenerator"); } if ( (fromFile.length() < 1) || (toFile.length() < 1) || (outXsltFile.length() < 1) ) { throw new Exception("Empty filename parameters passed to XsltGenerator"); } xsltFile = new String(outXsltFile);

// build the 'from' structure if ( null == ( fromStructRoot = buildStructure(fromFile) ) ) { if (DEBUG > 0) System.err.println("ERROR:\tXsltGenerator()\n\tFailed to create structure for 'from' XSD file!\n\tfilename='"+fromFile+"'");


throw new Exception("Failed to create 'from' structure"); }

// build the 'to' structure if ( null == ( toStructRoot = buildStructure(toFile) ) ) { if (DEBUG > 0) System.err.println("ERROR:\tXsltGenerator()\n\tFailed to create structure for 'to' XSD file!\n\tfilename='"+toFile+"'"); throw new Exception("Failed to create 'to' structure"); }

}

/** * This method displays the 'from' structure, if one has been defined. * * @return A boolean, which is false if the 'from' structure has not yet been defined, * and true otherwise. */ public boolean printFromStruct() { if (null == fromStructRoot) { return false; } printStruct(fromStructRoot); return true; }

/** * This method displays the 'to' structure, if one has been defined. * * @return A boolean, which is false if the 'to' structure has not yet been defined, * and true otherwise. */ public boolean printToStruct() { if (null == toStructRoot) { return false; } printStruct(toStructRoot); return true; }

/** * This method checks to determine if all of the requirements necessary to * allow XSLT generation have been met. * * @return A boolean, with true indicating the XSLT generation is possible, and false * indicating that it is not possible. */ public boolean readyForXslt() { return checkLinks(toStructRoot);


}

/** * Allow a link to be added between one or more input ('from') fields and a * given output ('to') field. * * @param toField The node to link to in the output structure. * @param fromFields A Vector of Vectors, each of which describes the path to a node in the * input structure that will be part of the link. * * @exception Exception */ public void addNewLink(Vector toField, Vector fromFields) throws Exception { XmlStructNode toNode = null, fromNode = null; boolean invalidFields = false;

if ( (null == toField) || (null == fromFields) ) { if (DEBUG > 0) System.err.print("ERROR:\taddNewLink()\n\tNull vectors passed as parameters\n"); throw new Exception("Null field vectors found"); }

// attempt to locate the 'to' field and handle case where field is not valid try { toNode = findNode(toStructRoot, toField); } catch (Exception e) { if (DEBUG > 0) { System.err.print("ERROR:\taddNewLink()\n\tInvalid 'to' field\n"); for (int ii = 0; ii < toField.size(); ii++) { System.out.print(((Integer)toField.elementAt(ii)).intValue() + " "); } System.out.println(); } invalidFields = true; }

// for each 'from' field for (int jj = 0; jj < fromFields.size(); jj++ ) { // attempt to locate the 'from' field and handle case where field is not valid try { fromNode = findNode(fromStructRoot, (Vector)fromFields.elementAt(jj)); } catch (Exception e) { if (DEBUG > 0)


{ System.err.print("ERROR:\taddNewLink()\n\tInvalid 'from' field\n"); Vector v = (Vector)fromFields.elementAt(jj); for (int ii = 0; ii < v.size(); ii++) { System.out.print(((Integer)v.elementAt(ii)).intValue() + " "); } System.out.println(); } invalidFields = true; }

if (!invalidFields) { // create the link if necessary if (null == toNode.getLink()) { toNode.setLink( new XmlStructLink(toNode) ); }

// check that we haven't already added this field boolean addedAlready = false; for (int ii = 0; (ii < toNode.getLink().getNumFromNodes()) && !addedAlready; ii++) { if (fromNode == toNode.getLink().getFromNode(ii)) { addedAlready = true; } }

// add the link between the nodes if necessary if (!addedAlready) { toNode.getLink().addFromNode(fromNode); } else { if (DEBUG > 0) System.err.print("DEBUG:\taddNewLink()\n\tSkipped duplicate field " + fromNode.getNodeName() + "\n"); } } }

if (invalidFields) { throw new Exception("Invalid fields found while adding new link"); } }

/** * This method deletes an entire link for a given node in the 'to' structure. * * @param toField A Vector that describes the path to the 'to' field for which the link


* should be deleted. * * @exception Exception */ public void deleteEntireLink(Vector toField) throws Exception { // get the 'to' node XmlStructNode toNode = findNode(toStructRoot, toField);

// get the link object XmlStructLink link = toNode.getLink(); if (null != link) { link.cleanup(); } else { // no link to delete if (DEBUG > 0) System.err.println("DEBUG:\tdeleteEntireLink()\n\tNo link to delete for node '" + toNode.getNodeName() + "'"); } }

/** * This method deletes part of a link for a given node in the 'to' structure. * * @param toField A Vector that describes the path to the 'to' field for which the link * should be deleted. * @param fromNodeChoice An int index identifying the 'from' node to be removed from the link. * * @exception Exception */ public void deleteLink(Vector toField, int fromNodeChoice) throws Exception { // get the 'to' node XmlStructNode toNode = findNode(toStructRoot, toField);

// get the link object XmlStructLink link = toNode.getLink(); if (null != link) { XmlStructNode fromNode = link.getFromNode(fromNodeChoice); if (!link.removeFromNode(fromNode)) { if (DEBUG > 0) System.err.println("DEBUG:\tdeleteLink()\n\tFailed to delete link from node '" + fromNode.getNodeName() + "' to node '" + toNode.getNodeName() + "'"); throw new Exception("Failed to delete link from node '" + fromNode.getNodeName() + "' to node '" + toNode.getNodeName() + "'"); }

// delete the link structure if there are no remaining 'from' nodes (checked inside the non-null branch to avoid a NullPointerException) if (link.getNumFromNodes() < 1) { link.cleanup(); } } else { // no link to delete if (DEBUG > 0) System.err.println("DEBUG:\tdeleteLink()\n\tNo link to delete for node '" + toNode.getNodeName() + "'"); } }

/** * This method creates and returns a Vector populated with the names of the * 'from' nodes linked to a given 'to' node. * * @param toNodePath A Vector representing the path to the 'to' node. * * @return A Vector containing the names of the 'from' nodes linked to the specified * 'to' node. * @exception Exception */ public Vector getLinkedNodeNames(Vector toNodePath) throws Exception { Vector linkedNodeNames = new Vector();

XmlStructNode toNode = findNode(toStructRoot, toNodePath); XmlStructLink link = toNode.getLink();

if (null != link ) { for (int ii = 0; ii < link.getNumFromNodes(); ii++) { linkedNodeNames.add(link.getFromNode(ii).getNodeName()); } } return linkedNodeNames; }

/** * Attempts to find all subnodes for a given node * * @param nodePath A Vector containing the path to the parent node. * @param ioStruct An int that determines whether the parent node is in the input or output * structure. This value should be one of FROM_STRUCT or TO_STRUCT. * * @return A Vector containing the names of the subnodes. The Vector returned is empty * in the case that the node has no subnodes. * * @exception Exception Thrown in the case of error. */ public Vector getSubNodeNames(Vector nodePath, int ioStruct) throws Exception {


Vector subNodeNames = new Vector();

// get the appropriate root node XmlStructNode structRoot; if (ioStruct == FROM_STRUCT) { structRoot = fromStructRoot; } else if (ioStruct == TO_STRUCT) { structRoot = toStructRoot; } else { if (DEBUG > 0) System.err.println("ERROR:\tgetSubNodeNames()\n\tioStruct parameter invalid ("+ioStruct+")"); throw new Exception("Invalid struct choice"); }

// build the vector of subnode names if ( (null == nodePath) || (nodePath.size() < 1 ) ) { subNodeNames.add( structRoot.getNodeName() ); } else { // attempt to find the node in question XmlStructNode n = findNode(structRoot, nodePath);

// now find all of the subnode names; the Vector is simply left empty // when the node has no subnodes, matching the documented behavior for (int ii = 0; ii < n.getNumSubnodes(); ii++) { subNodeNames.add(n.getSubNode(ii).getNodeName()); } } return subNodeNames; }

/** * Finds a node given a start node and a path. * * @param node The node at which to start the search (usually the root node of either * the input or output structure). * @param nodePath A Vector describing the path from the search node to the desired node. * * @return An XmlStructNode reference to the node that was found. * @exception Exception */


private XmlStructNode findNode(XmlStructNode node, Vector nodePath) throws Exception { if (null == node) { if (DEBUG > 0) System.err.println("ERROR:\tfindNode()\n\tnode parameter is null"); throw new Exception("No node defined"); }

for (int ii = 1; ii < nodePath.size(); ii++) { node = node.getSubNode( ((Integer)nodePath.elementAt(ii)).intValue() ); if (null == node) { if (DEBUG > 0) System.err.println("ERROR:\tfindNode()\n\tFailed to get subnode at depth "+ii); throw new Exception("No such subnode"); } } return node; }

/** * This method displays the links defined for the nodes in the 'to' structure. */ public void showLinks() { printLinks(toStructRoot); }

public boolean writeXsltDoc() { // attempt to build the XSLT document if ( !buildXslt(toStructRoot, fromStructRoot) ) { System.err.println("ERROR:\twriteXsltDoc()\n\tFailed to build XSLT document!"); return false; }

// attempt to write the XSLT file if ( !writeXsltFile(xsltFile) ) { System.err.println("ERROR:\twriteXsltDoc()\n\tFailed to write XSLT file!"); return false; }

return true; }

private boolean buildFormatsManual(XmlStructNode xsn, String indent) {


// build the format for this node try { System.out.println("\n" + indent + "Build format for 'to' element " + xsn.getNodeName() ); xsn.getLink().setFormatDesc( getParams(xsn.getLink(), indent, true) ); } catch (IOException ioe) { System.err.println("ERROR:\tbuildFormatsManual()\n\tIOException for node " + xsn.getNodeName() + "!"); return false; }

// recursively build the format for each sub node for (int ii = 0; ii < xsn.getNumSubnodes(); ii++) { if (!buildFormatsManual(xsn.getSubNode(ii), (indent + " "))) { return false; } } return true; }

public XmlStructFormatDescTemplate getParams(XmlStructLink xsl, String indent, boolean isStart) throws IOException { String name, line; int type; Vector params = new Vector(); BufferedReader keyboard = new BufferedReader( new InputStreamReader( System.in ) );

if (isStart) { name = "CONCAT"; } else { System.out.print(indent + "Enter template name: "); name = keyboard.readLine(); }

// get the parameters do { System.out.println(indent + XmlStructFormatDesc.TYPE_ELEMENT + "- ELEMENT\t" + XmlStructFormatDesc.TYPE_STRING + "-STRING\t" + XmlStructFormatDesc.TYPE_TEMPLATE + "-TEMPLATE"); System.out.print(indent + "Enter param type: "); line = keyboard.readLine(); if ( (null != line) && (line.length() > 0) ) { type = Integer.parseInt(line); switch (type) {


case XmlStructFormatDesc.TYPE_ELEMENT: System.out.println(); for (int ii = 0; ii < xsl.getNumFromNodes(); ii++) { System.out.println(indent + ii + ": " + xsl.getFromNode(ii).getNodeName()); } System.out.print(indent + "Enter element ID: "); String elemIdStr = keyboard.readLine(); int elemId = Integer.parseInt(elemIdStr); if ( (elemId >= 0) && (elemId < xsl.getNumFromNodes() ) ) { params.add( new XmlStructFormatDescElement(elemId) ); } else { System.out.println(indent + "Invalid element ID!"); } break; case XmlStructFormatDesc.TYPE_STRING: System.out.print(indent + "Enter string: "); String str = keyboard.readLine(); params.add( new XmlStructFormatDescString(str) ); break; case XmlStructFormatDesc.TYPE_TEMPLATE: params.add( getParams( xsl, (indent + " "), false ) ); break; default: System.out.println(indent + "Invalid parameter type!"); break; } } } while ( (null != line) && (line.length() > 0) );

XmlStructFormatDescTemplate f = new XmlStructFormatDescTemplate(name, params);

return f; }

/** * This method checks each of the nodes in the 'to' structure to ensure that * they each have a corresponding link, and returns false if any are found * that do not. * * @param xsn A node in the 'to' structure * * @return A boolean indicating that the node and all subnodes have links (true), or * that one or more were found without links (false). */ private boolean checkLinks(XmlStructNode xsn) { boolean retVal = true;

// handle the case of an unmatched node if ( null == xsn.getLink() ) { // can't have an unlinked node, so show error message


if (DEBUG > 0) System.out.println("DEBUG:\tcheckLinks()\n\tUnlinked node: " + xsn.getNodeName() ); retVal = false; }

for (int ii = 0; ii < xsn.getNumSubnodes(); ii++) { retVal = checkLinks(xsn.getSubNode(ii)) & retVal; }

return retVal; }

/** * This method attempts to determine the structure of an XML file, based * on an XSD document. * * @param xsdFile The name of the XSD file to be used to create the structure. * * @return An XmlStructNode object that is the head of the structure, or null in the * case of failure to build the structure. */ private XmlStructNode buildStructure(String xsdFile) { DOMParser parser = new DOMParser();

// attempt to open the XSD document in the parser try { parser.parse(xsdFile); } catch (org.xml.sax.SAXException se) { if (DEBUG > 0) System.err.println("ERROR:\tbuildStructure()\n\tFailed to open XSD file!\n\torg.xml.sax.SAXException\n\t" + se.getMessage() ); return null; } catch (java.io.IOException ioe) { if (DEBUG > 0) System.err.println("ERROR:\tbuildStructure()\n\tFailed to open XSD file!\n\tjava.io.IOException\n\t" + ioe.getMessage() ); return null; }

// get a root node for this structure XmlStructNode rootStructNode = null; Node rootNode = null;

// get the document node for XSD document Document doc = parser.getDocument(); Node docElem = doc.getDocumentElement(); NodeList nodes = docElem.getChildNodes();

// search for the root element node for (int ii = 0; (ii < nodes.getLength() ) && ( null == rootStructNode) ; ii++)


{ if (nodes.item(ii).getNodeType() == Node.ELEMENT_NODE ) { NamedNodeMap attributes = nodes.item(ii).getAttributes(); rootStructNode = new XmlStructNode( attributes.getNamedItem("name").getNodeValue() ); rootStructNode.setPath( rootStructNode.getNodeName() ); rootStructNode.setNodeDepth(1); rootNode = nodes.item(ii); } }

// now recursively build the structure from the root node onwards if ( !buildFromNode(rootStructNode, rootNode) ) { if (DEBUG > 0) System.err.println("ERROR:\tbuildStructure()\n\tFailed to build structure from root node!"); return null; }

return rootStructNode;

} // buildStructure()
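/*
 * Example (hypothetical XSD): given an input such as
 *
 *   <xsd:element name="readings">
 *     <xsd:complexType><xsd:sequence>
 *       <xsd:element name="site" type="xsd:string"/>
 *       <xsd:element name="value" type="xsd:decimal"/>
 *     </xsd:sequence></xsd:complexType>
 *   </xsd:element>
 *
 * buildStructure() returns a root XmlStructNode named "readings" (depth 1,
 * path "readings") with subnodes "site" and "value" (depth 2, paths
 * "readings/site" and "readings/value"); buildFromNode() simply recurses
 * through the complexType/sequence wrappers without adding nodes for them.
 */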

/** * A recursive method to build the structure from a particular node in the * XSD document. * * @param xsn The current XmlStructNode in the structure that we are building. * @param n The current Xml node in the XSD document from which we are building the * structure. * * @return A boolean indicating the success (true) or failure (false) of the * operation */ private boolean buildFromNode(XmlStructNode xsn, Node n) { // get the list of child nodes NodeList nodes = n.getChildNodes();

// for each child node in the XSD document for (int ii = 0; ii < nodes.getLength(); ii++) { if ( nodes.item(ii).getNodeName().equals("xsd:element") ) { // add a new entry in the structure for this element NamedNodeMap attributes = nodes.item(ii).getAttributes(); XmlStructNode xsnSub = new XmlStructNode( attributes.getNamedItem("name").getNodeValue() ); xsn.addSubNode(xsnSub); xsnSub.setParentNode(xsn); xsnSub.setPath( xsn.getPath() + "/" + xsnSub.getNodeName() ); xsnSub.setNodeDepth( xsn.getNodeDepth() + 1 );

// get this node in the xsd document Node nSub = nodes.item(ii);

// make the recursive call for the new sub node in the structure, and the sub node in the XSD document if ( !buildFromNode(xsnSub, nSub) ) { if (DEBUG > 0) System.err.println("ERROR:\tbuildFromNode()\n\tFailed to build structure from node!"); return false; } } else { if ( !buildFromNode(xsn, nodes.item(ii) ) ) { if (DEBUG > 0) System.err.println("ERROR:\tbuildFromNode()\n\tFailed to build structure from node!"); return false; } } }

return true;

} // buildFromNode()

public boolean buildLinks() { return buildLinks(fromStructRoot, toStructRoot); }

/** * This method recursively finds, for each component of the output * structure, a link to a component of the input structure. * * @param fromStructRoot * The XmlStructNode that represents the root of the input structure. * @param toStructNode * The XmlStructNode that represents a component of the output structure. * * @return A boolean that indicates the success (true) or failure (false) of the * operation. * */ private boolean buildLinks(XmlStructNode fromStructRoot, XmlStructNode toStructNode) { // search for the link for this component of the 'to' structure if ( !findLinks(fromStructRoot, toStructNode) ) { System.err.println("ERROR:\tbuildLinks()\n\tFailed to find link for node!"); return false; }

// now call this method recursively to build the links for any subnodes for (int ii = 0; ii < toStructNode.getNumSubnodes(); ii++) { if ( !buildLinks(fromStructRoot, toStructNode.getSubNode(ii) ) ) { return false; } }

return true;

} // buildLinks()

/** * This method attempts to recursively find a link between a single component of the * output structure and a component of the input structure. * * @param fromStructNode * The XmlStructNode that represents a component of the input structure. * @param toStructNode * The XmlStructNode that represents a component of the output structure. * * @return A boolean that indicates the success (true) or failure (false) of the * operation. */ private boolean findLinks(XmlStructNode fromStructNode, XmlStructNode toStructNode) { if (DEBUG > 0) { System.out.println("Search for link for " + toStructNode.getNodeName() + " at element " + fromStructNode.getNodeName() ); }

// do we have a match between these nodes? if ( fromStructNode.getNodeName().equals( toStructNode.getNodeName() ) ) { // create a new link object, and add the 'to' and 'from' nodes XmlStructLink link = new XmlStructLink(toStructNode); if ( !link.addFromNode(fromStructNode) ) { System.err.println("ERROR:\tfindLinks()\n\tFailed to add 'from' link!"); return false; } fromStructNode.setLink(link); toStructNode.setLink(link);

// run back to the root of each struct tree, and set the descendant links flags XmlStructNode n = toStructNode.getParentNode(); while ( null != n ) { n.setDescendantLinks(true);

n = n.getParentNode(); } n = fromStructNode.getParentNode(); while ( null != n ) { n.setDescendantLinks(true); n = n.getParentNode(); }

// no more recursion return true; }

// no match, so recursively search all of the subnodes, assuming that there are any for (int ii = 0; ( (ii < fromStructNode.getNumSubnodes() ) && (null == toStructNode.getLink() ) ); ii++) { if ( !findLinks(fromStructNode.getSubNode(ii), toStructNode) ) { return false; } }

return true;

} // findLinks()
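/*
 * Example (hypothetical structures): if the 'to' structure contains a node
 * named "temperature" and the depth-first search above reaches a 'from'
 * node of the same name, a single XmlStructLink joins the two nodes, and
 * the descendant-links flag is set on every ancestor in both trees so that
 * buildXsltTemplate() later knows which branches carry linked data.
 */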

/** * This method builds the XSLT that will be used to convert documents * between the two XML formats. * * @param toStructRoot The XmlStructNode at the root of the output structure. * @param fromStructRoot The XmlStructNode at the root of the input structure. * * @return A boolean indicating the success (true) or failure (false) of the * operation. * */ private boolean buildXslt(XmlStructNode toStructRoot, XmlStructNode fromStructRoot) { // write the start of the xslt document xsltWrite("", 0); xsltWrite("\n", 0); xsltWrite("\n", 1 );

if ( !buildXsltTemplate(toStructRoot, fromStructRoot) ) { System.err.println("ERROR:\tbuildXslt()\n\tFailed to build the XSLT template!"); return false; }

if ( !writeXsltComponents() ) { System.err.println("ERROR:\tbuildXslt()\n\tFailed to write the XSLT components from the database!"); return false;

}

// write the end of the xslt document xsltWrite("", 0);

return true; }
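/*
 * The markup inside the string literals passed to xsltWrite() above is not
 * reproduced in this listing; for a standard stylesheet the skeleton that
 * buildXslt() emits would plausibly be (a sketch, not the verbatim output):
 *
 *   <?xml version="1.0"?>
 *   <xsl:stylesheet version="1.0"
 *                   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 *     <!-- one template per linked node, from buildXsltTemplate() -->
 *     <!-- named component templates, from writeXsltComponents() -->
 *   </xsl:stylesheet>
 */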

private boolean writeXsltComponents() { boolean retVal = true;

// database interaction object declarations Driver d = null; String url; Connection dbConn = null; String query = ""; Statement stmt; ResultSet rs = null;

// check that we have components to write if (xsltComponents.size() > 0) { try { String dbHost = "localhost"; String dbName = "TM"; String dbUser = "idact"; String dbPass = "JX?^@Jh7CM"; d = (Driver)Class.forName("com.mysql.jdbc.Driver").newInstance(); dbConn = DriverManager.getConnection( ("jdbc:mysql://" + dbHost + "/" + dbName + "?user=" + dbUser + "&password=" + dbPass) ); //d = (Driver)Class.forName("sun.jdbc.odbc.JdbcOdbcDriver").newInstance(); //url = "jdbc:odbc:" + "IDACT_DTM"; //dbConn = DriverManager.getConnection(url, "", "");

Iterator it = xsltComponents.iterator();

while (it.hasNext()) { String componentName = (String)it.next();

// build the query query = "SELECT * FROM TM.XSLT_COMPONENTS WHERE NAME='" + componentName + "'";

// execute stmt = dbConn.createStatement(); rs = stmt.executeQuery(query);

// write this XSLT component, if it exists if (rs.next()) { xsltWrite(rs.getString("XSLT") + "\n", 0); }

else { throw new Exception("XSLT component named '" + componentName + "' not found in database."); }

} } catch ( java.lang.InstantiationException ie) { System.err.println("ERROR:\twriteXsltComponents()\n\tFailed to instantiate DB connection class!"); retVal = false; } catch ( java.lang.IllegalAccessException iae) { System.err.println("ERROR:\twriteXsltComponents()\n\tIllegal Access Exception!"); retVal = false; } catch ( java.lang.ClassNotFoundException cnfe ) { System.err.println("ERROR:\twriteXsltComponents()\n\tCannot find database driver class!"); retVal = false; } catch (java.sql.SQLException se) { System.err.println("ERROR:\twriteXsltComponents()\n\tSQL Exception!"); retVal = false; } catch (Exception e) { System.err.println("ERROR:\twriteXsltComponents()\n\t" + e.getMessage()); retVal = false; } finally { try { // close the DB connection if necessary if (dbConn != null && !dbConn.isClosed()) { dbConn.close(); }

// debug info if (DEBUG > 0) { System.out.println("DEBUG\twriteXsltComponents"); System.out.println("\tquery='" + query + "'"); if (rs != null) { System.out.println( rs.getWarnings() ); } } } catch (java.sql.SQLException se)

{ System.err.println("ERROR:\twriteXsltComponents()\n\tSQL Exception during cleanup!"); retVal = false; } } }

return retVal; }
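/*
 * Design note: the component lookup above assembles its SQL by string
 * concatenation. A PreparedStatement variant (a sketch, not part of the
 * original prototype) would avoid quoting problems in component names:
 *
 *   PreparedStatement ps = dbConn.prepareStatement(
 *       "SELECT XSLT FROM TM.XSLT_COMPONENTS WHERE NAME=?");
 *   ps.setString(1, componentName);
 *   ResultSet rs = ps.executeQuery();
 */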

private boolean writeXsltTemplate(XmlStructFormatDescTemplate xmsdt, XmlStructNode toStructNode, int indent) { // sanity check to ensure this is really a template type if ( XmlStructFormatDesc.TYPE_TEMPLATE != xmsdt.getType() ) { System.out.println("ERROR\twriteXsltTemplate()\n\tNot a template type!"); return false; }

if (xmsdt.getName().equals("CONCAT")) { // list each of the params in order for (int ii = 0; ii < xmsdt.getNumParams(); ii++) { switch (xmsdt.getParam(ii).getType()) { case XmlStructFormatDesc.TYPE_ELEMENT: // This is an element parameter, so get the path to the element and use the value directly String relativePath = buildRelativePath( toStructNode.getLink().getFromNode( ((XmlStructFormatDescElement)xmsdt.getParam(ii)).getElementId() ), toStructNode.getParentNode().getLink().getFromNode(0) ); xsltWrite("", indent ); break; case XmlStructFormatDesc.TYPE_STRING: // This is a literal string, so enclose it in tags xsltWrite("" + ( (XmlStructFormatDescString)xmsdt.getParam(ii) ).getString() + "", indent); break; case XmlStructFormatDesc.TYPE_TEMPLATE: // we have another template, so make the recursive call if (!writeXsltTemplate( (XmlStructFormatDescTemplate)xmsdt.getParam(ii), toStructNode, indent + 1)) { return false; } break; default: System.out.println(indent + "ERROR\twriteXsltTemplate()\n\tInvalid parameter type!"); return false;

//break; } } } else { // we are really going to call a template, and send params to it xsltWrite("", indent);

// list each of the params in order for (int ii = 0; ii < xmsdt.getNumParams(); ii++) { xsltWrite("", indent + 1); switch (xmsdt.getParam(ii).getType()) { case XmlStructFormatDesc.TYPE_ELEMENT: // This is an element parameter, so get the path to the element and use the value directly String relativePath = buildRelativePath( toStructNode.getLink().getFromNode( ((XmlStructFormatDescElement)xmsdt.getParam(ii)).getElementId() ), toStructNode.getParentNode().getLink().getFromNode(0) ); xsltWrite("", indent + 2 ); break; case XmlStructFormatDesc.TYPE_STRING: // This is a literal string, so enclose it in tags xsltWrite("" + ( (XmlStructFormatDescString)xmsdt.getParam(ii) ).getString() + "", indent + 2); break; case XmlStructFormatDesc.TYPE_TEMPLATE: // we have another template, so make the recursive call if (!writeXsltTemplate( (XmlStructFormatDescTemplate)xmsdt.getParam(ii), toStructNode, indent + 1)) { return false; } break; default: System.out.println(indent + "ERROR\twriteXsltTemplate()\n\tInvalid parameter type!"); return false; //break; } xsltWrite("", indent + 1); } xsltWrite("", indent);

// add this template to the list to be added to the XSLT output xsltComponents.add( xmsdt.getName() ); } return true; }
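/*
 * Example (hypothetical format descriptor): for a non-CONCAT template such
 * as DATE_FORMAT applied to one element parameter and the literal "UTC",
 * the branch above would plausibly emit (the literals are not reproduced
 * in this listing, and the parameter names are assumed):
 *
 *   <xsl:call-template name="DATE_FORMAT">
 *     <xsl:with-param name="p0"><xsl:value-of select="rel/path"/></xsl:with-param>
 *     <xsl:with-param name="p1"><xsl:text>UTC</xsl:text></xsl:with-param>
 *   </xsl:call-template>
 *
 * and "DATE_FORMAT" is queued in xsltComponents so that
 * writeXsltComponents() appends its definition from the database.
 */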

/** * This method builds the XSLT template for a given node in the 'to'

* structure. * * @param toStructNode * A node in the 'to' structure * * @return A boolean indicating the success (true) or failure (false) of the * operation. */ private boolean buildXsltTemplate(XmlStructNode toStructNode, XmlStructNode fromStructRoot) {

// deal with the root node stuff, if this is the root node if ( null == toStructNode.getParentNode() ) { // write the root node stuff xsltWrite("", 1 ); xsltWrite("", 2 ); xsltWrite("\n", 1);

// set the path this node //String pathToNode = toStructNode.getLink().getFromNode(0).getPath(); }

// if there are no subnodes, then create this element using the 'from' values. if ( !toStructNode.getDescendantLinks() ) { // start the XSLT template, using the call convention xsltWrite("", 1); xsltWrite("", 2);

if (!writeXsltTemplate(toStructNode.getLink().getFormatNode(), toStructNode, 3)) { return false; }

// complete this element xsltWrite("", 2); } else { // start the XSLT template, using an apply convention xsltWrite("", 1);

// start this element xsltWrite("", 2);

// apply the template for each subnode for (int ii = 0; ii < toStructNode.getNumSubnodes(); ii++) { // attempt to build the relative path to this subnode

String relativePath = buildRelativePath( toStructNode.getSubNode(ii).getLink().getFromNode(0), toStructNode.getLink().getFromNode(0) ); if ( null == relativePath) { System.err.println("ERROR:\tbuildXsltTemplate()\n\tFailed to build the relative path!"); return false; }

// now write the template call for this subnode, including the sorts if ( toStructNode.getSubNode(ii).getLink().getSort().getNumSorts() > 0 ) { // include the sorts in the template call xsltWrite("", 3); for (int jj = 0; jj < toStructNode.getSubNode(ii).getLink().getSort().getNumSorts(); jj++) { relativePath = buildRelativePath( toStructNode.getSubNode(ii).getLink().getSort().getSortNode(jj), toStructNode.getSubNode(ii).getLink().getFromNode(0) ); String sortOrder = toStructNode.getSubNode(ii).getLink().getSort().getSortOrder(jj); String tempXslt = ""; xsltWrite(tempXslt, 4); } xsltWrite("", 3); } else { // no sorts, so just call or apply the template if ( !toStructNode.getSubNode(ii).getDescendantLinks() ) { xsltWrite("", 3); } else { xsltWrite("", 3); } } }

// end this element xsltWrite("", 2);

} // end the template for this node xsltWrite("\n", 1);

// write all of the templates for the subelements for (int ii = 0; ii < toStructNode.getNumSubnodes(); ii++) { if ( !buildXsltTemplate(toStructNode.getSubNode(ii), fromStructRoot ) ) { return false; } }

return true; }

/** * This method builds the relative XPath between two nodes in the 'from' * structure. * * @param n1 The XmlStructNode that we're building the relative XPath to. * @param n2 The XmlStructNode that we're building the relative XPath from. * * @return A String that contains the relative XPath, or an empty string in the * event of an error. * */ private String buildRelativePath(XmlStructNode n1, XmlStructNode n2) { String relativePath = new String("");

// get the absolute path to this node, and to the parent String fullPathToNode1 = n1.getPath(); String fullPathToNode2 = n2.getPath();

if (DEBUG > 0) { System.out.println("Attempting to find relative path between " + n1.getNodeName() + " and " + n2.getNodeName()); System.out.println(fullPathToNode1); System.out.println(fullPathToNode2); }

// do we have the same node? if ( fullPathToNode1.equals(fullPathToNode2) ) { relativePath = "."; } else { // tokenise the path to this node and the parent node String pathTokensNode1[] = fullPathToNode1.split("/"); String pathTokensNode2[] = fullPathToNode2.split("/");

// now find the part of the absolute paths that match int pathMatchDepth = 0; boolean pathMismatchFound = false;

while ( (!pathMismatchFound) && (pathMatchDepth < pathTokensNode1.length) && (pathMatchDepth < pathTokensNode2.length) ) { if ( !pathTokensNode1[pathMatchDepth].equals(pathTokensNode2[pathMatchDepth]) ) { pathMismatchFound = true; } else { pathMatchDepth++; } }

// if no mismatch found, then one node is a direct descendant of the other if ( !pathMismatchFound ) { if (pathTokensNode1.length > pathTokensNode2.length) { if (DEBUG > 0) System.out.println("Code path 1"); // node 1 a direct descendant of node 2 for (int ii = pathMatchDepth; ii < (pathTokensNode1.length - 1); ii++) { relativePath += pathTokensNode1[ii] + "/"; } relativePath += pathTokensNode1[pathTokensNode1.length - 1]; } else { if (DEBUG > 0) System.out.println("Code path 2"); // node 2 a direct descendant of node 1 for (int ii = pathTokensNode2.length - 1; ii > pathMatchDepth; ii--) { relativePath += "../"; } } } else { if (DEBUG > 0) { System.out.println("Code path 3"); System.out.println("pathTokensNode2.length - 1 = " + (pathTokensNode2.length - 1) + ", pathMatchDepth = " + pathMatchDepth); } // we have to go up in the tree from node 2 for (int ii = pathTokensNode2.length; ii > pathMatchDepth; ii--) { relativePath += "../"; }

// now back down to node 1 for (int ii = pathMatchDepth; ii < (pathTokensNode1.length - 1); ii++) { relativePath += pathTokensNode1[ii] + "/"; }

relativePath += pathTokensNode1[pathTokensNode1.length - 1]; } }

if (DEBUG > 0) System.out.println("Relative path found was '" + relativePath + "'\n"); return relativePath;

} // buildRelativePath()
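/*
 * Worked example: for n1 at path "root/a/b/c" and n2 at path "root/a/d",
 * the common prefix is (root, a), so pathMatchDepth is 2. One "../" step
 * is emitted to climb from "d" to "a", followed by the descent "b/c",
 * giving the relative XPath "../b/c".
 */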

/** * This method appends the latest chunk of XSLT to a string, which will * later be used to write the entire XSLT file. * * @param s A String containing the latest line (chunk) of XSLT. * @param level The indentation level at which to write this chunk of XSLT. * */ private void xsltWrite(String s, int level) { String indent = ""; for (int ii = 0; ii < level; ii++) { indent += "\t"; } xsltString += indent + s + "\n";

} // xsltWrite()

/** * This method attempts to open a file, and if successful it writes the contents * of the xsltString object to the file. Essentially, we're writing the * XSLT file. * * @param xsltFilename * The name of the XSLT file to be written. * * @return A boolean indicating the success (true) or failure (false) of the operation. */ private boolean writeXsltFile(String xsltFilename) { try { BufferedWriter out = new BufferedWriter( new FileWriter (xsltFilename) ); out.write( xsltString ); out.close(); } catch (IOException ioe) { System.err.println("ERROR:\twriteXsltFile()\n\tIOException!"); return false; }

return true; }

/** * A recursive method that displays the XSD structure. The node depth * controls the level of indentation used in the output. * * @param xsn The current node in the structure to be displayed. */ private void printStruct(XmlStructNode xsn) { if (xsn != null) { // print the indent for (int ii = 1; ii < xsn.getNodeDepth(); ii++) { System.out.print(" "); }

// print the name of, and the path to, this node System.out.print( xsn.getNodeName() ); System.out.println( " (" + xsn.getNodeDepth() + " - " + xsn.getPath() + ")" );

// print the subnodes, at the next indent level for (int ii = 0; ii < xsn.getNumSubnodes(); ii++) { printStruct( xsn.getSubNode(ii) ); } }

} // printStruct()

/** * A recursive method that displays the links found between the 'from' and * 'to' structures. * * @param xsn The current node in the 'to' structure, whose link we will display. * */ private void printLinks(XmlStructNode xsn) { if (xsn != null) { // print the indent for (int ii = 1; ii < xsn.getNodeDepth(); ii++) { System.out.print(" "); }

// print the name of this node System.out.print( xsn.getNodeName() );

// print the link info if ( null != xsn.getLink() ) { System.out.print(" links to " +

xsn.getLink().getFromNode(0).getNodeName() ); for (int ii = 1; ii < xsn.getLink().getNumFromNodes(); ii++) { System.out.print( ", " + xsn.getLink().getFromNode(ii).getNodeName() ); }

if (xsn.getLink().getSort().getNumSorts() > 0) { System.out.print(", sorted by " + xsn.getLink().getSort().getSortNode(0).getNodeName() ); for (int ii = 1; ii < xsn.getLink().getSort().getNumSorts(); ii++) { System.out.print( ", " + xsn.getLink().getSort().getSortNode(ii).getNodeName() ); } } } else { System.out.print(" - no link found! "); }

System.out.println();

// print the subnodes, at the next indent level for (int ii = 0; ii < xsn.getNumSubnodes(); ii++) { printLinks(xsn.getSubNode(ii) ); } }

} // printLinks()

/** * This method allows the user to manually build links. * * @param fromStructRoot * The XmlStructNode that represents the root of the input structure. * @param toStructRoot * The XmlStructNode that represents the root of the output structure. * * @return A boolean that indicates the success (true) or failure (false) of the * operation. */ private boolean buildLinksManual(XmlStructNode fromStructRoot, XmlStructNode toStructRoot) { try { boolean done = false; BufferedReader keyboard = new BufferedReader( new InputStreamReader( System.in ) ); String line, fromLine, toLine;

do { // ask if we're adding links System.out.print("\nManually add link (y/n): "); line = keyboard.readLine(); if ( line.startsWith("y") ) { // display the 'from', 'to', and current-link structures System.out.println("'From' structure\n------"); printStruct(fromStructRoot); System.out.println("\n'To' structure\n------"); printStruct(toStructRoot); System.out.println("\nOutput Link Structure\n------"); printLinks(toStructRoot);

// ask the user for the 'from' field System.out.print("Enter 'from' field: "); fromLine = keyboard.readLine(); String[] fromFields = fromLine.split(",");

// ask the user for the 'to' field System.out.print("Enter 'to' field: "); toLine = keyboard.readLine(); String[] toFields = toLine.split(",");

// find the nodes to link together XmlStructNode fromNode = fromStructRoot; for (int ii = 0; ( (fromLine.length() > 0) && (ii < fromFields.length) && (fromNode != null) ); ii++) { try { int jj = Integer.parseInt( fromFields[ii] ); fromNode = fromNode.getSubNode(jj);

} catch (NumberFormatException nfe) { System.out.println("Invalid path to 'from' node."); fromNode = null; } } XmlStructNode toNode = toStructRoot; for (int ii = 0; ( (toLine.length() > 0) && (ii < toFields.length) && (toNode != null) ); ii++) { try { int jj = Integer.parseInt( toFields[ii] ); toNode = toNode.getSubNode(jj);

} catch (NumberFormatException nfe) { System.out.println("Invalid path to 'to' node."); toNode = null; } }

// if both nodes were valid if ( ( fromNode != null ) && ( toNode != null ) ) { // create a link object, if necessary XmlStructLink link = toNode.getLink(); if (null == link) { link = new XmlStructLink(toNode); toNode.setLink(link); }

link.addFromNode(fromNode); } else { System.err.println("ERROR:\tbuildLinksManual()\n\tInvalid node." ); return false; } } else { done = true; } } while ( !done );

} catch (IOException ioe) { System.err.println("ERROR:\tbuildLinksManual()\n\tIOException\n\t" + ioe.getMessage() ); return false; }

return true;

} // buildLinksManual()
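/*
 * Sample session (hypothetical values): entering "0,1" as the 'from' field
 * walks from the root to subnode 0 and then to its subnode 1, while an
 * empty line selects the root node itself; the 'to' field uses the same
 * comma-separated, zero-based index convention.
 */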

/** * This method allows the user to manually build sorts. * * @param fromStructRoot * The XmlStructNode that represents the root of the input structure. * @param toStructRoot * The XmlStructNode that represents the root of the output structure. * * @return A boolean that indicates the success (true) or failure (false) of the * operation. */ private boolean buildSortsManual(XmlStructNode fromStructRoot, XmlStructNode toStructRoot) {

try {

boolean done = false; BufferedReader keyboard = new BufferedReader( new InputStreamReader( System.in ) ); String line, fromLine, toLine;

do { // ask if we're adding sorts System.out.print("\nManually add sort (y/n): "); line = keyboard.readLine(); if ( line.startsWith("y") ) { // ask the user for the 'to' field System.out.println("\n'To' structure\n------"); printStruct(toStructRoot); System.out.print("Enter 'to' field: "); toLine = keyboard.readLine(); String[] toFields = toLine.split(",");

// is this a valid 'to' field? XmlStructNode toNode = toStructRoot; for (int ii = 0; ( (toLine.length() > 0) && (ii < toFields.length) && (toNode != null) ); ii++) { try { int jj = Integer.parseInt( toFields[ii] ); toNode = toNode.getSubNode(jj); } catch (NumberFormatException nfe) { System.out.println("Invalid path to 'to' node."); toNode = null; } } if (null != toNode) { // is there a link for this 'to' field XmlStructLink link = toNode.getLink(); if (null != link) { // ask the user to enter the linked 'from' field to sort for (int ii = 0; ii < link.getNumFromNodes(); ii++) { System.out.println(ii + ": " + link.getFromNode(ii).getPath() ); } System.out.print("\nEnter the node to sort: "); line = keyboard.readLine();

// convert the value to an int XmlStructNode fromNode = null; try { int fromNum = Integer.parseInt(line); fromNode = link.getFromNode(fromNum); } catch (NumberFormatException nfe) { System.out.println("Invalid 'from' node.");

fromNode = null; }

// if this is a valid 'from' node if ( null != fromNode ) { // ask the user for the sort field printStruct(fromStructRoot); System.out.print("\nEnter the node to sort on: "); line = keyboard.readLine(); String[] sortFields = line.split(",");

// is this a valid 'sort' field? XmlStructNode sortNode = fromStructRoot; for (int ii = 0; ( (line.length() > 0) && (ii < sortFields.length ) && (sortNode != null) ); ii++) { try { int jj = Integer.parseInt( sortFields[ii] ); sortNode = sortNode.getSubNode(jj); } catch (NumberFormatException nfe) { System.out.println("Invalid path to 'sort' node."); sortNode = null; } }

// do we have a valid 'sort' node? if (sortNode != null) { // ask the user if we're sorting in ascending or descending order System.out.print("Enter sort order (asc/desc): "); line = keyboard.readLine();

String sortOrder = null; if ( line.startsWith("asc") ) { sortOrder = new String(XmlStructSort.ORDER_ASCEND); } else if ( line.startsWith("desc") ) { sortOrder = new String(XmlStructSort.ORDER_DESCEND); }

// if we have a valid sort order if (null != sortOrder) { // add the sort if (!link.getSort().addSort(sortNode, sortOrder)) { // adding the sort failed return false;

} } } } } } } else { done = true; } } while ( !done );

} catch (IOException ioe) { System.err.println("ERROR:\tbuildSortsManual()\n\tIOException\n\t" + ioe.getMessage() ); return false; }

return true;

} // buildSortsManual()

} // class XsltGenerator

/* * Copyright 2004 Brian Hay (IDACT) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

// imports for input import java.io.BufferedReader; import java.io.InputStreamReader;

// imports for the Vector class import java.util.Vector;

/** * A Command Line Interface for the prototype XSLT Generator. * * @author Brian Hay * * @version $Id: XsltGeneratorCli.java,v 1.2 2005/01/03 08:09:36 bhay Exp $ * */ public class XsltGeneratorCli { /** * private class members */ private XsltGenerator xgen; // XsltGenerator object private BufferedReader keyboard; // input from keyboard private final int DEBUG = 0; // debug level - greater than zero causes debug messages private boolean readyToWriteXslt; // indicates if we have met all of the conditions necessary // to write the XSLT output.

/** * private constants */ //private final char FIELD_DELIM = ','; // delimiter for the 'to' and 'from' field levels //private final String NO_MORE_FIELDS = "!"; // indicator that there are no more fields to enter

/** * This method provides the top level of control for the CLI. Essentially it

* calls each step in the process in order, until the XSLT is generated or * an exception occurs. */ public void goMenu() { boolean done = false; // boolean to control menu loop readyToWriteXslt = false;

try { // get the input and output format filename, and the XSLT filename, and use // them to create a new XsltGenerator object createXsltGen();

do { // display the menu and get the selection readyToWriteXslt = showMenu(); int choice = getChoice();

switch (choice) { case 0: // exit the menu done = true; break; case 1: // print the input and output structures showInputStruct(); showOutputStruct(); showLinks(); break; case 2: // allow the user to add a new link addLink(); xgen.showLinks(); break; case 3: // allow the user to delete a link deleteLink(); xgen.showLinks(); break; case 9: // write XSLT if (readyToWriteXslt) { writeXslt(); break; } default: System.out.println("Invalid input!"); }

} while (!done); } catch (Exception e) { System.out.println(e.getMessage()); return;

} }

/** * This method attempts to write the XSLT document to a file. * * @exception Exception */ private void writeXslt() throws Exception { if (!xgen.writeXsltDoc()) { if (DEBUG > 0) System.err.println("DEBUG:\twriteXslt()\n\tFailed to write XSLT document."); throw new Exception("Failed to write XSLT document"); } }

/** * This method allows the user to enter the input ('from') and output ('to') * fields to be linked, and then attempts to create that link. * * @exception Exception */ private void addLink() throws Exception { Vector inputFields = new Vector();

// get as many 'from' fields as necessary, and add them to the vector do { inputFields.add( getNodePath(XsltGenerator.FROM_STRUCT) ); } while ( getYesNoInput("", "Add another input field (y/n): ") );

// get the 'to' field Vector outputField = getNodePath(XsltGenerator.TO_STRUCT);

// attempt to add the link xgen.addNewLink(outputField, inputFields);

// define the TCs for the link xgen.addLinkDef(outputField); }

//private void addTransComponent() throws Exception //{ // get the 'to' field //Vector outputField = getNodePath(XsltGenerator.TO_STRUCT); //}

/** * This method allows the user to specify a link between 'to' and 'from' fields * that will be deleted. * * @exception Exception */ private void deleteLink() throws Exception

{ // get the 'to' field Vector outputField = getNodePath(XsltGenerator.TO_STRUCT);

// are we deleting the entire link, or just part of it? if ( getYesNoInput("", "Delete entire link (y/n): ") ) { xgen.deleteEntireLink(outputField); } else { // get the 'from' field, and delete that part of the link int fromNodeChoice = selectLinkedNode(outputField); if (0 != fromNodeChoice) { xgen.deleteLink(outputField, fromNodeChoice - 1); } } }

/** * This method allows the user to select a 'from' node that is linked to a * given 'to' node. * * @param outputField A Vector containing the path to the 'to' node. * * @return The (1-based) number of the 'from' node selected, or 0 if no * node was selected. */ private int selectLinkedNode(Vector outputField) throws Exception { if (null == outputField) { if (DEBUG > 0) System.err.println("ERROR:\tselectLinkedNode()\n\toutputField parameter is null."); throw new Exception("Undefined output field while attempting to delete link"); }

Vector fromFieldNames = xgen.getLinkedNodeNames(outputField);

if ( (null == fromFieldNames) || (fromFieldNames.size() < 1) ) { System.out.println("No links found for 'to' node."); return 0; }

for (int ii = 0; ii < fromFieldNames.size(); ii++) { System.out.println( (ii + 1) + " - " + (String)fromFieldNames.elementAt(ii) ); } System.out.println("\n0 - Don't delete a link.");

return getIntInput("", "Enter choice in the range [0, " + fromFieldNames.size() + "] : ", 0, fromFieldNames.size() ); }

/** * This method prompts the user for a 'y' or 'n' entry, accepts their input, * and loops until they provide a valid response. * * @param indent A String that allows the prompt to be indented. * @param prompt The prompt to be displayed to the user prior to accepting input. * * @return A boolean, with true indicating that 'y' was entered, and false indicating * that 'n' was entered. * @exception Exception */ private boolean getYesNoInput(String indent, String prompt) throws Exception { try { do { System.out.print(indent + prompt); String s = keyboard.readLine().toLowerCase(); if ( (null == s) || (s.length() != 1) ) { System.out.println(indent + "Invalid entry"); } else if (s.equals("y")) { return true; } else if (s.equals("n")) { return false; } } while (true); } catch (java.io.IOException ioe) { if (DEBUG > 0) System.err.println("DEBUG:\tgetYesNoInput()\n\tIOException when reading y/n response.\n\tPrompt '"+prompt+ "'\n\t"+ioe.getMessage()); throw new Exception("Failed to read y/n response"); } }

/** * This method prompts the user for an integer entry, accepts their input, * and loops until they provide a valid response. The integer will only be * accepted if it is in the range [minVal, maxVal]. * * @param indent A String that allows the prompt to be indented. * @param prompt The prompt to be displayed to the user prior to accepting input. * @param minVal The minimum acceptable value that may be entered. * @param maxVal The maximum acceptable value that may be entered. * * @return The int value entered by the user. * * @exception Exception

*/ private int getIntInput(String indent, String prompt, int minVal, int maxVal) throws Exception { int choice = minVal - 1; boolean done; do { done = true; System.out.print(prompt); String s; try { s = keyboard.readLine();

choice = Integer.parseInt(s); if ( (choice > maxVal) || (choice < minVal ) ) { System.out.println(indent + "Invalid choice (out of range) - please try again"); done = false; } } catch (java.io.IOException ioe) { if (DEBUG > 0) System.err.println("DEBUG:\tgetIntInput()\n\tIOException when reading input.\n\t"+ioe.getMessage()); throw new Exception("Failed to read int input"); } catch (NumberFormatException nfe) { System.out.println(indent + "Invalid choice (not an integer) - please try again"); done = false; } } while (!done);

return choice; }

/** * This method allows the user to identify a node, either in the input or * output structure. * * @param ioStruct An int value that indicates whether the node will be in the input * (XmlStructNode.FROM_STRUCT) or output (XmlStructNode.TO_STRUCT) structure. * * @return A Vector containing the path to the chosen node. * @exception Exception */ private Vector getNodePath(int ioStruct) throws Exception { if ( (ioStruct != XsltGenerator.FROM_STRUCT) && (ioStruct != XsltGenerator.TO_STRUCT) ) { if (DEBUG > 0) System.err.println("DEBUG:\tgetNodePath()\n\tParameter

ioStruct invalid (" + ioStruct + ")"); throw new Exception("No such structure (" + ioStruct + ")"); }

String indent = ""; Vector nodePath = new Vector(); Vector nodeNames = xgen.getSubNodeNames(nodePath, ioStruct); boolean choiceMade = false;

do { // allow the user to choose what to do for (int ii = 0; ii < nodeNames.size(); ii++) { System.out.println( indent + (ii+1) + " - " + (String)nodeNames.elementAt(ii) ); } System.out.println(); System.out.println(indent + "0 - Move up a level"); System.out.println(); int choice = getNodeChoice(nodeNames.size(), indent);

// act based on the user's choice if (choice > 0) { // attempt to move down a level in the hierarchy nodePath.add( new Integer(choice - 1) ); Vector newSubNodeNames = xgen.getSubNodeNames(nodePath, ioStruct); if (null != newSubNodeNames) { nodeNames = newSubNodeNames; indent = indent + " "; } else { nodePath.removeElementAt( nodePath.size() - 1); System.out.println(indent + "The node you selected has no subnodes - please try again"); } } else if (choice < 0) { // add the chosen node to the vector of 'from' fields nodePath.add( new Integer( (choice + 1) * -1)); choiceMade = true; } else { // move up a level if possible if ( nodePath.size() > 0 ) { nodePath.removeElementAt( nodePath.size() - 1); nodeNames = xgen.getSubNodeNames(nodePath, ioStruct); indent = indent.substring(2); } else { System.out.println("Already at the top level"); } }

} while (!choiceMade);

return nodePath; }
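/*
 * Navigation sketch: at each level the subnodes are listed 1..n; entering
 * k descends into subnode k, 0 moves up a level, and -k selects subnode k
 * as the final choice. For example, entering "2" and then "-1" returns
 * the path {1, 0} (zero-based indices).
 */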

/** * This method allows the user to enter a node choice. The range of valid * options is [ -maxVal, maxVal ]. * * @param maxVal A positive value that is used to define the valid range of entries to * accept, which is [-maxValue, maxValue]. * * @return An int containing the valid entry provided by the user. * * @exception Exception */ private int getNodeChoice(int maxVal, String indent) throws Exception { if (null == indent) { indent = ""; }

return getIntInput(indent, "Enter choice in the range [" + (maxVal * -1) + ", " + maxVal + "] : ", (maxVal * -1), maxVal); }

/** * This method displays the links between the input and output structures. */ private void showLinks() { System.out.println(); System.out.println("Links"); System.out.println("-----"); xgen.showLinks(); }

/** * This method displays the structure of the input format. */ private void showInputStruct() { System.out.println(); System.out.println("Input Structure"); System.out.println("------"); if ( !xgen.printFromStruct() ) { System.out.println("No input structure defined!"); } }

/** * This method displays the structure of the output format. */ private void showOutputStruct() { System.out.println(); System.out.println("Output Structure"); System.out.println("------"); if ( !xgen.printToStruct() ) { System.out.println("No output structure defined!"); } }

/** * This method attempts to read the user's choice from the keyboard. * * @return An int containing the integer value entered by the user, or -1 if the * user does not enter any input, and -2 if the user enters a value that is * not a number. * @exception Exception Exception is thrown if an error occurs while attempting to read from the * keyboard. */ private int getChoice() throws Exception { try { System.out.print("Enter your choice: "); String choice = keyboard.readLine(); if ( (null == choice) || (choice.length() < 1)) { return -1; } else { int c = Integer.parseInt(choice); return c; } } catch (java.io.IOException ioe) { if (DEBUG > 0) System.err.println("DEBUG:\tgetChoice()\n\tIOException while attempting to read menu choice.\n\t"+ioe.getMessage()); throw new Exception("Failed to read menu choice"); } catch (java.lang.NumberFormatException nfe) { if (DEBUG > 0) System.err.println("DEBUG:\tgetChoice()\n\tNumberFormatException while attempting to read menu choice.\n\t"+nfe.getMessage()); return -2; } }

/** * This method displays the menu on the screen. It also determines if we * have fulfilled the requirements to proceed to XSLT generation - if we * have, it offers that as a menu option. * * @return A boolean that indicates if we are ready to continue to XSLT * generation or not. */ private boolean showMenu() { boolean retVal = false;

System.out.println(); System.out.println("1) Display the input and output structures"); System.out.println("2) Add a new link"); System.out.println("3) Delete a link"); if (xgen.readyForXslt()) { System.out.println("9) Generate XSLT"); retVal = true; } System.out.println(); System.out.println("0) Exit the program"); System.out.println();

return retVal; }

/** * This method creates the input reader (keyboard), gets the filenames for * the input and output formats, and the XSLT output document, and creates * the XsltGenerator object using those values. */ private void createXsltGen() throws Exception { String inFormFile, outFormFile, xsltFile; keyboard = new BufferedReader( new InputStreamReader( System.in ) );

// get the filenames from the user //try //{ inFormFile = "/idact/formats/fromG.xsd"; outFormFile = "/idact/formats/toG.xsd"; xsltFile = "/idact/transformations/fromG_to_toG.xsd"; //System.out.print("Enter input format filename: "); //inFormFile = keyboard.readLine(); //System.out.print("Enter output format filename: "); //outFormFile = keyboard.readLine(); //System.out.print("Enter XSLT output filename: "); //xsltFile = keyboard.readLine(); //} //catch (java.io.IOException ioe) //{ // if (DEBUG > 0) System.err.println("DEBUG:\tcreateXsltGen()\n\tIOException while attempting to read filenames\n\t" + ioe.getMessage()); // throw new Exception("Failed to read filenames"); //}

// create the xsltGenerator xgen = new XsltGenerator(inFormFile, outFormFile, xsltFile);

xgen.buildLinks(); }

/** * main method, which is just used to get us going. * * @param args */ public static void main(String[] args) { // create a CLI object and get it started XsltGeneratorCli xCli = new XsltGeneratorCli(); xCli.goMenu(); }
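/*
 * A typical build-and-run sequence for the prototype (a sketch; the exact
 * jar names for the Xerces parser and the MySQL Connector/J driver are
 * assumed, and the supporting classes must be on the classpath):
 *
 *   javac XsltGenerator.java XsltGeneratorCli.java
 *   java -cp .:xercesImpl.jar:mysql-connector-java.jar XsltGeneratorCli
 */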

}