Data Integration Solutions Buyers Guide


2014 Data Integration Solutions Buyers Guide
Includes a Category Overview; the Top 10 Questions to Ask; Plus a Capabilities Reference of the Top 24 Solution Providers in Data Integration Technology
Released by Solutions Review 11/1/2014
INTRODUCTION:
Big Data is the buzz, with everybody looking to jump onto the bandwagon. However, there is substance behind the buzz. When we talk about Big Data, we mean the following: as the sources and amounts of data available for analysis grow by orders of magnitude, new analytical tools can gain important insights into the past, present and future that were not available before.
Many areas of human endeavor benefit from Big Data. Science and research have certainly gained, with examples including the Large Hadron Collider, which used hundreds of millions of data points per second to find the Higgs boson, and DNA sequencing, which now takes less than a week to complete. Many private-sector companies have gained as well. Walmart, Facebook and Amazon, for example, are now better able to discover trends and exploit normally hidden shopper idiosyncrasies to drive revenue and profit. And of course Uncle Sam is getting in on the Big Data action, with the construction in Utah of a massively powerful data center for the NSA that will be able to handle yottabytes of internet data. The result of all this is an industry now valued at $100 billion, growing at 10% a year.
Solutions Review is quite interested in covering this quickly evolving topic. However, we face a conundrum in trying to organize Big Data, which is, well, big. At the very least, it is too big for a single category. So, as with any overly complex problem, the first step is to break it down into its constituent parts. Solutions Review is therefore launching our newest category, Data Integration, perhaps the most vital solution needed to take advantage of Big Data. First, here is a description of what we mean by Data Integration in terms of the specific tools you will need.
To take advantage of Big Data, you have to be able to access all of that data, no matter its physical or virtual location and structure. Data Federation Technology, also called Data Virtualization Technology or Data Federation Services, offers a way to access information about your information, called Metadata, across all parts of your organization, no matter how or where it’s stored. You can also set up a Data Federation solution to enable queries over multiple data sources, ensure data integrity, manage transactions and provide an integrated, real-time view of all data across the enterprise. This is done by mapping all the data you want federated into a virtual database.
Accessing data doesn’t just mean having a unified view of it all, however. For the practical purpose of crunching all that data, it needs to be in one place where your analytics program can reach it. That involves “moving” data (more like copying and pasting, actually) from one place to another, usually from storage systems into a Data Warehouse capable of analyzing it. Methods for doing just that include ETL (Extract, Transform and Load) and Data Replication. Data Replication is often used for tasks like disaster recovery and data migration. In relation to Big Data, however, it offers a high-performance data movement tool that should be able to synchronize large quantities of data quickly.
To describe the ETL and Data Replication processes, people in the Data Integration space usually call the place where the data is stored at the start of the process the source (or sources). The place where you want to move or copy the data is the target (or targets).
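The idea of federation can be made concrete with a minimal Python sketch that uses only the standard-library `sqlite3` module. It is a loose analogy, not how any product in this guide works. SQLite's `ATTACH DATABASE` stands in for connections to two separate source systems, and a temporary view stands in for the mapping into a virtual database. The `crm` and `orders` databases, their tables, and the functions `build_federated_view` and `query_revenue` are all hypothetical. Real federation tools typically reach heterogeneous systems from different vendors and locations, which this single-engine sketch cannot show.

```python
import sqlite3


def build_federated_view() -> sqlite3.Connection:
    """Expose two separate databases through one connection and one view.

    "crm" and "orders" are hypothetical stand-ins for two source systems;
    no rows are copied between them.
    """
    conn = sqlite3.connect(":memory:")  # plays the role of the federation layer
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS orders")

    # Populate the two "sources" (in practice these would already exist elsewhere).
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders.sales (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO orders.sales VALUES (?, ?)",
                     [(1, 250.0), (1, 100.0), (2, 75.0)])
    conn.commit()

    # The mapping into a "virtual database". It must be a TEMP view, because
    # SQLite does not let a view stored in one database reference tables in
    # another attached database; temporary views are exempt from that rule.
    conn.execute("""
        CREATE TEMP VIEW customer_revenue AS
        SELECT c.name AS name, SUM(s.amount) AS revenue
        FROM crm.customers AS c
        JOIN orders.sales AS s ON s.customer_id = c.id
        GROUP BY c.name
    """)
    return conn


def query_revenue(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Query the unified view as if it were a single local table."""
    return conn.execute(
        "SELECT name, revenue FROM customer_revenue ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    print(query_revenue(build_federated_view()))
```

Running the script prints `[('Acme', 350.0), ('Globex', 75.0)]`. That result comes from one query that joins both sources through the view, without first copying the data into a single store.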
ETL tools are your basic data movement tools. They extract the selected data from the source and transform it into a structure readable by the target and by the Business Intelligence applications running on it. They then load the transformed data into the target. ETL tools are good at moving large quantities of data all at once in what is called a batch. They also do a good job when significant transformation of the data is required before loading into the target.
ETL on its own can have trouble handling certain situations, however. If data in the source is changing in real time, you may need to analyze that data quickly, perhaps in real time as well. Because ETL loads everything from the source into the target all at once during a batch transfer, the target can experience hours of downtime while the data is loaded. The more data you need to move, the more downtime. That downtime may not be an issue if two conditions hold: the target doesn’t need to be available for long periods, such as overnight, and you don’t require immediate analysis of new or changed data. However, you could still be wasting time and money if much of the data your ETL tool extracts from the source is already in the target from a previous batch load.
So, for those with high-performance requirements and a need to increase the efficiency of data transfer, a data replication solution will be a necessary add-on. Most data replication solutions contain a Change Data Capture (CDC) module. This module captures changes made in source systems and then replicates those changes into a target system, keeping the databases synchronized. In some cases, the CDC tool can be sold separately from the rest of the data replication package.
Other parts of a data replication solution can include schema and DDL replication and an easy-to-manage user interface. They can also include software and hardware architecture designed to move large amounts of data very quickly, without creating downtime for your sources or targets and without interfering with your enterprise applications. This same capability can also ensure that, in the event of a crash, your company has the most up-to-date data with which to pick up the pieces. In addition, good data replication solutions should be fully automated in order to optimize IT productivity and reduce spending on professional services.
Data replication can have drawbacks, however. An enterprise data replication solution can cost many tens of thousands of dollars, placing the capability out of reach for many smaller companies. Additionally, many data replication solutions are not very good at the transformation task that is often needed when moving data from sources to targets. As a result, you need to piggyback the replication solution on top of an ETL solution, and not all replication solutions work with all ETL solutions. In fact, it’s best to think of data replication not as a replacement for ETL, but as a complementary solution. Both processes will be needed to execute the data integration part of your Big Data strategy.
To recap, we’ve covered the Federation, ETL and Replication tools of the data integration space. To keep the topic of data integration as narrow as possible, we at Solutions Review are going to limit the Data Integration site, solutions directory and buyers guide to just those three solution types. This deliberately leaves out several areas:
- the physical databases and warehouses that store and process data
- the business intelligence platforms and applications needed to get value out of all that data
- a whole host of other technologies that go into the Big Data environment
We will revisit these topics in other sections of our Big Data suite over the coming months.
Matt Adamson
Editor, Solutions Review
[email protected]
(339) 927-9237
Solutions Review | 500 West Cummings Park | Woburn, Massachusetts 01801 | USA
5 Questions You Should Ask Yourself Before Selecting a Data Integration Solution
QUESTION #1
What are the business and technical needs driving my interest in data integration?
In other words, why do you need to integrate your data? Consider the application that needs the data, such as a BI/Analytics program, and what you need that program to do. Together these will determine many of the technical requirements for your data integration solution. Will you require real-time data access and transfer? How much data will you need to move, and how quickly? Can you afford some downtime on source/target systems, or do you need them running at all times? Write down all these data requirements, based on your technical and business needs, so that you can compare them with what prospective solutions offer.
QUESTION #2
What IT resources are available to implement, run and maintain the solution?
Of course, this doesn’t just refer to your IT budget. Your IT staff’s time and skills will also need to align with any data integration solution, as implementation, operation and maintenance time and costs are key considerations. If the data integration solution is sucking up your IT department’s time, problems can go unfixed and ROI on multiple IT projects could suffer.
QUESTION #3
What are my data sources? And where are they located: on-premise or in the Cloud?
The basic elements of data integration revolve around moving data from sources (applications) to targets (data warehouses, etc.). Much of what is powering the Big Data movement is the massive amount of data being collected in the cloud through very large Software as a Service (SaaS) solutions like Salesforce.com.
Some solutions listed in this buyers guide specialize in the integration of cloud application data with on-premise systems to ensure that your users can access complete, current, and accurate data.
