ETL – State of the Art
Department of Computer Science
ETL – STATE OF THE ART
by RICARDO FORTUNA RAMINHOS
Monte de Caparica, April 2007

Acronyms

Table 1: List of Acronyms

Acronym   Description
3GL       Third Generation Language
3M        Multi Mission Module
3NF       Third Normal Form
API       Application Programmers’ Interface
ARS       Access Rule Service
AS        Audit Service
BTO       Built-To-Order
CA        Computer Associates
CASE      Computer-Aided Software Engineering
CDC       Change Data Capture
CDI       Customer Data Integration
CLF       Common Log Format
CLIPS     C Language Integrated Production System
CORBA     Common Object Request Broker Architecture
COM       Component Object Model
COTS      Commercial Off-The-Shelf
CPU       Central Processing Unit
CRC       Cyclic Redundancy Checksum
CRM       Customer Relationship Management
CSV       Comma Separated Values
CWM       Common Warehouse Metamodel
DAS       Data Access Service
DBMS      Database Management System
DDL       Data Definition Language
DM        Data Mart
DIM       Data Integration Module
DLL       Dynamic Link Library
DODS      Data Object Design Studio
DOM       Document Object Model
DOML      DODS-XML
DPM       Data Processing Module
DSL       Data System Libraries
DTD       Document Type Definition
DTS       Data Transformation Services
DW        Data Warehouse
EAI       Enterprise Application Integration
ECM       Enterprise Content Management
EDR       Enterprise Data Replication
EII       Enterprise Information Integration
EIM       Enterprise Information Management
EJBs      Enterprise Java Beans
ELT       Extract, Load and Transform
ER        Entity Relationship
ERP       Enterprise Resource Planning
ETI       Evolutionary Technologies International
ETLT      Extract, Transform, Load and Transform
ETTL      Extract, Transform, Transport and Load
Envisat   Environmental Satellite
ESA       European Space Agency
ETL       Extract, Transform and Load
FCT (1)   “Faculdade de Ciências e Tecnologia”
FCT (2)   Flight Control Team
FFD       File Format Definition
FTP       File Transfer Protocol
FQS       Federated Query Service
FSS       Federated Schema Service
GUI       Graphical User Interface
HMM       Hidden Markov Modelling
HTML      Hyper Text Mark-up Language
HTTP      Hyper Text Transfer Protocol
HTTPS     Hyper Text Transfer Protocol Secure
IBHIS     Integration Broker for Heterogeneous Information Sources
IBIS      Internet-Based Information System
IDE       Integrated Development Environment
IDMS      Integrated Database Management System
IKM       Integration Knowledge Module
IIS       Internet Information Services
IMAP      Internet Message Access Protocol
IMS       Information Management System
INTEGRAL  International Gamma-Ray Astrophysics Laboratory (satellite)
IP        Internet Protocol
IT        Information Technology
J2EE      Java 2 Enterprise Edition
JCL       Job Control Language
JDBC      Java Database Connectivity
JESS      Java Expert System Shell
JMS       Java Message Service
JNDI      Java Naming and Directory Interface
JVM       Java Virtual Machine
KETTLE    Kettle ETTL Environment
LDAP      Lightweight Directory Access Protocol
LKM       Load Knowledge Module
MAPI      Messaging Application Programming Interface
MEO       Medium Earth orbits
MOM       Message-Oriented Middleware
MR        Metadata Repository
MS        Microsoft
MT        Monitoring Tool
NoDoSE    Northwestern Document Structure Extractor
ODBC      Open Database Connectivity
ODS       Operational Data Storage
OLAP      On-Line Analytical Processing
OLE DB    Object Linking and Embedding Database
OS        Operating System
PDF       Portable Document Format
PL/SQL    Procedural Language / Structured Query Language
POP       Post Office Protocol
RAM       Random Access Memory
RAT       Reporting and Analysis Tool
RDBMS     Relational Database Management Systems
Regex     Regular Expression
RIFL      Rapid Integration Flow Language
RMI       Remote Method Invocation
RS        Registry Service
RSH       Remote Shell
S/C       Spacecraft
S/W       Space Weather
SADL      Simple Activity Definition Language
SAX       Simple API for XML
SDK       Java Software Development Kit
SEIS      Space Environment Information System for Mission Control Purposes
SESS      Space Environment Support System for Telecom and Navigation Systems
SFTP      Secure File Transfer Protocol
SML       Simple Mapping Language
SOA       Services Oriented Architecture
SNMP      Simple Network Management Protocol
SOAP      Simple Object Access Protocol
SQL       Structured Query Language
SSIS      SQL Server Integration Services
TM        Transformation Manager
UDAP      Uniform Data Access Proxy
UDOB      Uniform Data Output Buffer
UDET      Uniform Data Extraction and Transformer
UNL       “Universidade Nova de Lisboa”
URL       Uniform Resource Locator
URS       User Registration Service
UTL       Universal Transformation Language
VSAM      Virtual Storage Access Method
XADL      XML-based Activity Definition Language
XMI       XML Metadata Interchange
XML       eXtensible Markup Language
XMM       X-Ray Multi-Mission (satellite)
XPath     XML Path Language
XPDL      XML Process Definition Language
XQuery    XML Query Language
XSD       XML Schema Definition
XSL       Extensible Stylesheet Language
XSL FO    XSL Formatting Objects
XSLT      XSL Transformations
WSDL      Web Services Description Language
WWW       World Wide Web

Index

1.1 MOTIVATION – THE CORRECT ETL TOOL
1.2 ETL CONCEPTUAL REPRESENTATION AND FRAMEWORK
1.3 CLASSICAL DATA INTEGRATION ARCHITECTURES
    1.3.1 Hand Coding
    1.3.2 Code Generators
    1.3.3 Database Embedded ETL
    1.3.4 Metadata Driven ETL Engines
1.4 APPROACHES TO DATA PROCESSING
    1.4.1 Data Consolidation
    1.4.2 Data Federation
    1.4.3 Data Propagation
    1.4.4 Hybrid Approach
    1.4.5 Change Data Capture
    1.4.6 Data Integration Technologies
1.5 METADATA FOR DESCRIBING ETL STATEMENTS
1.6 RESEARCH ETL TOOLS
    1.6.1 AJAX
    1.6.2 ARKTOS
    1.6.3 Clio
    1.6.4 DATAMOLD
    1.6.5 IBHIS
    1.6.6 IBIS
    1.6.7 InFuse
    1.6.8 INTELLICLEAN
    1.6.9 NoDoSe
    1.6.10 Potter’s Wheel
1.7 FREEWARE / OPEN SOURCE AND SHAREWARE ETL TOOLS
    1.7.1 Enhydra Octopus
    1.7.2 Jitterbit
    1.7.3 KETL
    1.7.4 Pentaho Data Integration: Kettle Project
    1.7.5 Pequel ETL Engine
    1.7.6 Talend Open Studio
1.8 COMMERCIAL ETL TOOLS
    1.8.1 ETL Market Analysis
    1.8.2 Business Objects Data Integrator
    1.8.3 Cognos DecisionStream
    1.8.4 DataMirror Transformation Server
    1.8.5 DB Software Laboratory’s Visual Importer Pro
    1.8.6 DENODO
    1.8.7 Embarcadero Technologies DT/Studio
    1.8.8 ETI Solution v5
    1.8.9 ETL Solutions Transformation Manager
    1.8.10 Group1 Data Flow
    1.8.11 Hummingbird Genio
    1.8.12 IBM Websphere Datastage
    1.8.13 Informatica PowerCenter
    1.8.14 IWay Data Migrator
    1.8.15 Microsoft SQL Server 2005
    1.8.16 Oracle Warehouse Builder
    1.8.17 Pervasive Data Integrator
    1.8.18 SAS ETL
    1.8.19 Stylus Studio
    1.8.20 Sunopsis Data Conductor