TDWI Research

TDWI Best Practices Report, Second Quarter 2011

Next Generation Data Integration

By Philip Russom

Research Sponsors

DataFlux

IBM

Informatica

SAP

Syncsort

Talend

Table of Contents

Research Methodology and Demographics
Introduction to Next Generation Data Integration
  Ten Rules for Next Generation Data Integration
  Why Care About NGDI Now?
Leading Generational Changes for Data Integration
  Expanding Into More DI Techniques
  Users’ Data Integration Tool Portfolios
  DI Tool and Platform Replacements
  Data Types Being Integrated
  Data Integration Architecture
Organizational Issues for NGDI
  Organizational Structures for DI Teams
  Unified Data Management
  Collaborative Data Integration
Catalog of NGDI Practices, Tools, and Platforms
  Potential Growth versus Commitment for DI Options
  Trends for Next Generation Data Integration Options
  Vendor Products and Platforms for NGDI
Recommendations

© 2011 by TDWI (The Data Warehousing Institute™), a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. E-mail requests or feedback to [email protected]. Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.


About the Author PHILIP RUSSOM is a well-known figure in data warehousing and business intelligence, having published more than 500 research reports, magazine articles, opinion columns, speeches, Webinars, and more. Today, he’s TDWI Research Director for Data Management at The Data Warehousing Institute (TDWI), where he oversees many of TDWI’s research-oriented publications, services, and events. Before joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research, Giga Information Group, and Hurwitz Group. He also ran his own business as an independent industry analyst and BI consultant and was a contributing editor with leading IT magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at [email protected].

About TDWI TDWI, a division of 1105 Media, Inc., is the premier provider of in-depth, high-quality education and research in the business intelligence and data warehousing industry. TDWI is dedicated to educating business and information technology professionals about the best practices, strategies, techniques, and tools required to successfully design, build, maintain, and enhance business intelligence and data warehousing solutions. TDWI also fosters the advancement of business intelligence and data warehousing research and contributes to knowledge transfer and the professional development of its Members. TDWI offers a worldwide Membership program, five major educational conferences, topical educational seminars, role-based training, onsite courses, certification, solution provider partnerships, an awards program for best practices, live Webinars, resourceful publications, an in-depth research program, and a comprehensive Web site: tdwi.org.

About the TDWI Best Practices Reports Series This series is designed to educate technical and business professionals about new business intelligence technologies, concepts, or approaches that address a significant problem or issue. Research for the reports is conducted via interviews with industry experts and leading-edge user companies and is supplemented by surveys of business intelligence professionals. To support the program, TDWI seeks vendors that collectively wish to evangelize a new approach to solving business intelligence problems or an emerging technology discipline. By banding together, sponsors can validate a new market niche and educate organizations about alternative solutions to critical business intelligence issues. Please contact TDWI Research Director Philip Russom ([email protected]) to suggest a topic that meets these requirements.

Acknowledgments TDWI would like to thank many people who contributed to this report. First, we appreciate the many users who responded to our survey, especially those who responded to our requests for phone interviews. Second, we thank our report sponsors, who diligently reviewed outlines, survey questions, and report drafts. Finally, we would like to recognize TDWI’s production team: Jennifer Agee, Rod Gosser, and Denelle Hanlon.

Sponsors DataFlux, IBM, Informatica, SAP, Syncsort, and Talend sponsored the research for this report.


Research Methodology and Demographics

Report Scope. Data integration (DI) has changed so quickly and completely in recent years that it scarcely resembles older definitions. For example, some people still think of DI as merely ETL for data warehousing or data movement utilities for database administration. Those basic tasks and use cases are still prominent in DI practice. Yet, DI practices and tools have broadened into many more techniques and use cases. While it’s good to have options, it’s hard to track them and determine in which situations they are ready for use. The purpose of this report is to accelerate users’ understanding of the many new products and options that have entered DI practices in recent years. It will also help readers map newly available technologies, products, and practices to real-world use cases.

Survey Methodology. In November 2010, TDWI sent an invitation via e-mail to the data management professionals in its database, asking them to complete an Internet-based survey. The invitation was also distributed via Web sites, newsletters, and publications from TDWI and other firms. The survey drew responses from almost 350 respondents. From these, we excluded incomplete responses and respondents who identified themselves as academics or vendor employees. The resulting completed responses of 323 respondents form the core data sample for this report.

Survey Demographics. The wide majority of survey respondents are corporate IT professionals (67%), whereas the remainder consists of consultants (26%) or business sponsors/users (7%). We asked consultants to fill out the survey with a recent client in mind.

The financial services (17%) and consulting (16%) industries dominate the respondent population, followed by insurance (9%), software (8%), telecommunications (6%), and other industries. Most survey respondents reside in the U.S. (51%) or Europe (25%). Respondents are fairly evenly distributed across all sizes of companies and other organizations.

Other Research Methods. In addition to the survey, TDWI Research conducted many telephone interviews with technical users, business sponsors, and recognized data management experts. TDWI also received product briefings from vendors that offer products and services related to the best practices under discussion.

Position
Corporate IT professional 67%
Consultants 26%
Business sponsors/users 7%

Industry
Financial services 17%
Consulting/professional services 16%
Insurance 9%
Software/Internet 8%
Telecommunications 6%
Healthcare 5%
Manufacturing (non-computers) 5%
Retail/wholesale/distribution 4%
Government: federal 4%
Education 3%
Pharmaceuticals 3%
Media/entertainment/publishing 3%
Utilities 3%
Other 14%
(“Other” consists of multiple industries, each represented by 2% or less of respondents.)

Geography
United States 51%
Europe 25%
Asia 8%
Australia 4%
Canada 4%
Africa 2%
Central or South America 2%
Middle East 1%
Other 3%

Company Size by Revenue
Less than $100 million 22%
$100–500 million 14%
$500 million–$1 billion 11%
$1–5 billion 16%
$5–10 billion 9%
More than $10 billion 18%
Don’t know 10%

Based on 323 survey respondents.


Introduction to Next Generation Data Integration

All aspects of DI have improved significantly of late.

Data integration (DI) has undergone an impressive evolution in recent years. Today, DI is a rich set of powerful techniques, including ETL (extract, transform, and load), data federation, replication, synchronization, changed data capture, natural language processing, business-to-business data exchange, and more. Furthermore, vendor products for DI have achieved maturity, users have grown their DI teams to epic proportions, competency centers regularly staff DI work, new best practices continue to arise (such as collaborative DI and agile DI), and DI as a discipline has earned its autonomy from related practices such as data warehousing and database administration.

This report brings the reader up to date on DI’s many changes.

To help user organizations understand and embrace all that next generation data integration (NGDI) now offers, this report catalogs and prioritizes the many new options for DI. This report literally redefines data integration, showing that its newest generation is an amalgam of old and new techniques, best practices, organizational approaches, and home-grown or vendor-built functionality. The report brings readers up to date by discussing relatively recent (and ongoing) evolutions of DI that make it more agile, architected, collaborative, operational, real-time, and scalable. It points to new platforms for DI tools (open source, cloud, SaaS, and unified data management) and DI’s growing coordination with related best practices in data management (especially data quality, metadata and master data management, data integration acceleration, data governance, and stewardship). The report also quantifies trends among DI users who are moving into a new generation, and it provides an overview of representative vendors’ DI tools. The goal is to help users make informed decisions about which combinations of DI options match their business and technology requirements for the next generation. But the report also raises the bar on DI, under the assumption that a truly sophisticated and powerful DI solution will leverage DI’s modern best practices using up-to-date tools.

Ten Rules for Next Generation Data Integration

DI’s 10 rules define desirable traits of its next generation.

Data integration has evolved and grown so fast and furiously in the last 10 years that it has transcended ancient definitions. Getting a grip on a modern definition of DI is difficult, because “data integration” has become an umbrella term and a broad concept that encompasses many things. To help you get that grip, the 10 rules for next generation data integration listed below provide an inventory of techniques, team structures, tool types, methods, mindsets, and other DI solution characteristics that are desirable for a fully modern next generation DI solution. Note that the list is a summary that helps you see the new-found immensity of DI; the rest of the report will drill into the details of these rules. Admittedly, the list of 10 rules is daunting because it’s thorough. Few organizations will need or want to embrace all of them; you should pick and choose according to your organization’s requirements and goals. Even so, the list both defines the new generation of data integration and sets the bar high for those pursuing it.1

1 For a similar list with more details, see the TDWI Checklist Report Top Ten Best Practices for Data Integration, available on tdwi.org.

DI encompasses many techniques that may be hand coded or tool based, either analytic or operational.

1. DI is a family of techniques. Some data management professionals still think of DI as merely ETL tools for data warehousing or data replication utilities for database administration. Those use cases are still prominent, as we’ll see when we discuss TDWI survey data. Yet, DI practices and tools have broadened into a dozen or more techniques and use cases.

2. DI techniques may be hand coded, based on a vendor’s tool, or both. TDWI survey data shows that migrating from hand coding to using a vendor DI tool is one of the strongest trends as organizations move into the next generation. A common best practice is to use a DI tool for most solutions, but augment it with hand coding for functions missing from the tool.

3. DI practices reach across both analytics and operations. DI is not just for data warehousing (DW). Nor is it just for operational database administration (DBA). It now has many use cases spanning many analytic and operational contexts, and expanding beyond DW and DBA work is one of the most prominent generational changes for DI.

4. DI is an autonomous discipline. Nowadays, there’s so much DI work to be done that DI teams with 13 or more specialists are the norm; some teams have more than 100! The diversity of DI work has broadened, too. Due to this growth, a prominent generational decision is whether to staff and fund DI as is, or to set up an independent team or competency center for DI.

Don’t do DI in a vacuum. It needs coordination with many technical and business teams.

5. DI is absorbing other data management disciplines. The obvious example is DI and data quality (DQ), which many users staff with one team and implement on one unified vendor platform. A generational decision is whether the same team and platform should also support master data management, replication, data sync, event processing, and data federation.

6. DI has become broadly collaborative. The larger number of DI specialists requires local collaboration among DI team members, as well as global collaboration with other data management disciplines, including those mentioned in the previous rule, plus teams for message/service buses, database administration, and operational applications.

7. DI needs diverse development methodologies. A number of pressures are driving generational changes in DI development strategies, including increased team size, operational versus analytic DI projects, greater interoperability with other data management technologies, and the need to produce solutions in a more lean and agile manner.

8. DI requires a wide range of interfaces. That’s because DI can access a wide range of source and target IT systems in a variety of information delivery speeds and frequencies. This includes traditional interfaces (native database connectors, ODBC, JDBC, FTP, APIs, bulk loaders) and newer ones (Web services, SOA, and data services). The new ones are critical to next generation requirements for real time and services. Furthermore, as many organizations extend their DI infrastructure, DI interfaces need to access data on-premises, in public and private clouds, and at partner and customer sites.

Like any enterprise application, DI deserves architecture, which affects whether it can support next generation requirements.

9. DI must scale. Architectures designed by users and servers built by vendors need to scale up and scale out to both burgeoning data volumes and increasingly complex processing, while still providing high performance at scale. With volume and complexity exploding, scalability is a critical success factor for future generations. Make it a top priority in your plans.

10. DI requires architecture. It’s true that some DI tools impose an architecture (usually hub and spoke), but DI developers still need to take control and design the details. DI architecture is important because it strongly enables or inhibits other next generation requirements for scalability, real time, high availability, server interoperability, and data services.


Why Care About NGDI Now?

The recession has changed business, so DI needs to realign with new business goals for data.

Businesses face change more often than ever before. Recent history has seen businesses repeatedly adjusting to boom-and-bust economies, a recession, financial crises, shifts in global dynamics or competitive pressures, and a slow economic recovery. DI supports real-world applications and business goals, which are affected by economic issues. Periodically, you need to adjust DI solutions to align with technical and business goals for data.

The next generation is an opportunity to fix the failings of prior generations. For example, most DI solutions lack a recognizable architecture, whereas achieving next generation requirements—especially real time, data services, and high availability—requires a modern architecture. Older ETL solutions, in particular, are designed for serial processing, whereas they need to be redesigned for parallel processing to meet next generation performance requirements for massive data volumes (see the sketch at the end of this section).

Most DI solutions are out-of-date or feature-poor, in some respect.

Some DI solutions are in serious need of improvement or replacement. For example, most DI solutions for business-to-business (B2B) data exchange are legacies, based on low-end techniques such as hand coding, flat files, and file transfer protocol (FTP). These demand a serious makeover—or rip and replace—if they’re to bring modern DI techniques into B2B data exchange. Similar makeovers are needed with older data warehouses, customer data hubs, and data sync solutions.

Even mature DI solutions have room to grow. Successful DI solutions mature through multiple lifecycle stages. In many cases, NGDI focuses on the next phase of a carefully planned evolution. For many, the next generation is about tapping more functions of DI tools they already have. For example, most DI platforms have supported data federation for a few years now, yet only 30% of users have tapped this capability. Also to be tapped are newer capabilities for real time, micro-batch processing, changed data capture (CDC), messaging, and complex event processing (CEP).

Unstructured data is still an unexplored frontier for most DI solutions. Many vendor DI platforms now support text analytics, text mining, and other forms of natural language processing. Handling non-structured and complex data types is a desirable generational milestone in text-laden industries such as insurance, healthcare, and federal government.

Plan to evolve DI into shared enterprise infrastructure.

DI is on its way to becoming IT infrastructure. For most organizations, this is a few generations away. But you need to think ahead to the day when data integration infrastructure is open and accessible to most of the enterprise the way that local area networks are today. Evolving DI into a shared infrastructure fosters business integration via shared data.

Many DI teams need a next generation reorganization.

DI is a growing and evolving practice. More organizations are doing more DI, yet staffing hasn’t kept pace with the growth. And DI is becoming more autonomous every day. You may need to rethink the headcount, skill sets, funding, management, ownership, and structure of DI teams.
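The serial-to-parallel ETL redesign mentioned above is easy to picture in miniature. Below is a minimal sketch (not from this report, with hypothetical function and field names) of the idea: instead of transforming an extract end to end in a single process, the extract is partitioned and the partitions are transformed by parallel worker processes.

```python
# A minimal sketch of parallelizing the transform step of ETL.
# All names (transform, amount, fx_rate) are hypothetical.
from multiprocessing import Pool

def transform(row):
    """Row-level transformation: here, a simple currency conversion."""
    return {**row, "amount_usd": round(row["amount"] * row["fx_rate"], 2)}

def transform_partition(rows):
    """Transform one partition of the extracted rows."""
    return [transform(r) for r in rows]

def parallel_etl(extracted_rows, workers=4):
    # Split the extract into one partition per worker.
    size = max(1, len(extracted_rows) // workers)
    partitions = [extracted_rows[i:i + size]
                  for i in range(0, len(extracted_rows), size)]
    with Pool(workers) as pool:
        transformed = pool.map(transform_partition, partitions)
    # Flatten the partitions back into one list for the load step.
    return [row for part in transformed for row in part]

if __name__ == "__main__":
    extract = [{"amount": 10.0, "fx_rate": 1.3}] * 100_000
    print(len(parallel_etl(extract)))
```

The partition-and-parallelize idea sketched here is roughly what parallel ETL engines apply at much larger scale, across threads, processes, and nodes.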


Leading Generational Changes for Data Integration

Expanding Into More DI Techniques

As pointed out earlier, DI consists of multiple, related data management techniques. The number of techniques applied in a DI solution or used by a DI team can be an indicator of DI maturity. For example, many DI solutions begin with a focus on one technique, then add others as the solution moves through project phases or generations. Increasing the number of techniques is often paralleled by increases in team head count and DI tools in the software portfolio. Many teams are driven to adopt more techniques because they begin supporting a new user constituency, which demands new approaches to data integration (as when new performance management requirements demand data federation). Hence, the number of DI techniques and the priority each receives are milestones on the road to the next generation of a DI solution.

To quantify this situation, a survey question asked respondents which DI techniques they’re using today, and in what priority order (see Figure 1). Respondents selected techniques from a short list of the most common ones. (Later, we’ll see responses from a much longer list.)

ETL is by far the top DI priority, seconded by its variant, ELT.

ETL is the most common first priority. Extract, transform, and load (ETL) is without doubt the preferred DI technique for business intelligence (BI) and data warehousing (DW). Given that a large portion of respondents are BI/DW professionals, it’s no surprise that ETL is in use at 95% of surveyed organizations. In fact, 75% identified it as their first priority among DI techniques.

ELT is the leading secondary priority. As a variant of ETL, ELT also scored well with the survey audience (see the sketch below). TDWI sees ELT as a gainer, its use being driven up by the increased processing power of recent DBMS releases, the arrival of new analytic DBMSs, increased use of in-database processing, lingering hand-coded traditions for SQL, and increased use of secondary ETL tools (especially open source tools, which support in-database transforms).

Although not popular in DW, replication and sync are big elsewhere.

Replication and data synchronization are a significant, though tertiary, priority. At 45% total, these fared well in the survey. For moving data with little or no transformation (for which ETL may be preferred), these kinds of tools are a good choice because of their low cost (relative to ETL), simplicity, changed data capture functions, minimal invasiveness, and their ability to run in real time or be event driven. Given their strengths, it seems odd that replication and synchronization aren’t used more in BI and DW contexts.

Application integration technologies often transport data for integration. Judging by Figure 1, almost 40% of organizations surveyed are doing this today. This form of technology uses some type of bus to support messages, events, and services. Although not designed for DI, a bus can carry data in its messages and processing instructions via services. For organizations with a hefty bus implementation in place, this infrastructure is often open to and effective for some DI functions, especially those that must reach operational applications or operate in real time.

Federation has a new presence, and event processing has just arrived.

Data federation is finally ensconced as a DI technique. Federation has been around for years in low-end forms such as distributed queries and materialized views. Modern tools, however, provide superior design and maintenance functions for federation (plus higher performance) that make it far more compelling as a feature you’d depend on. Federation is also more compelling as it becomes ever more virtual. These advances help explain why federation has recently become an ensconced DI technique (30% in Figure 1).
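To make the ETL-versus-ELT distinction concrete, here is a minimal sketch, not taken from the report, of the ELT pattern: raw rows are extracted and loaded into a staging table first, and the transform then runs inside the target DBMS as SQL. SQLite stands in for an analytic DBMS, and all table and column names are hypothetical.

```python
# A minimal ELT sketch: load raw data, then transform in-database with SQL.
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for an analytic DBMS

# Extract + Load: land the raw rows, untransformed, in a staging table.
conn.execute("CREATE TABLE stg_orders (order_id INT, amount REAL, fx_rate REAL)")
conn.executemany(
    "INSERT INTO stg_orders VALUES (?, ?, ?)",
    [(1, 10.0, 1.3), (2, 25.5, 1.3), (3, 7.0, 0.9)],
)

# Transform: the "T" runs last, pushed down to the database engine.
conn.execute("""
    CREATE TABLE fact_orders AS
    SELECT order_id,
           ROUND(amount * fx_rate, 2) AS amount_usd
    FROM stg_orders
""")

print(conn.execute("SELECT * FROM fact_orders").fetchall())
```

Because the transform is expressed as SQL against the loaded data, it runs on the database engine itself, which is exactly the in-database processing trend the survey commentary describes.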


Event processing is a recent addition to the DI arsenal. More than 20% of survey respondents have incorporated some form of event processing into their DI solutions, which is significant given the newness of this practice.
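As a toy illustration (not any vendor’s implementation) of what event processing looks like inside a DI solution, the sketch below consumes change events from a queue and applies each one to a target store as it arrives, rather than waiting for a nightly batch. The event shapes and names are invented for the example.

```python
# A toy event-driven data movement sketch: change events arrive on a queue
# and are applied to a target store one by one, in arrival order.
import queue

events = queue.Queue()
target = {}  # stands in for the target database

def apply_change(event):
    """Apply one captured change (insert/update/delete) to the target."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        target[key] = event["value"]
    elif op == "delete":
        target.pop(key, None)

# Simulate a stream of captured changes from a source system.
events.put({"op": "insert", "key": 101, "value": {"status": "new"}})
events.put({"op": "update", "key": 101, "value": {"status": "shipped"}})
events.put({"op": "delete", "key": 101})

while not events.empty():
    apply_change(events.get())

print(target)  # empty again after the final delete
```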

Which of the following DI techniques are you using in your DI solutions today? Click one or more of the following answers, in priority order from most used to least used.

[Stacked bar chart: for each DI technique—extract, transform, and load (ETL); data federation or virtualization; extract, load, and transform (ELT); messaging or application integration; replication or data synchronization; event processing—the percentage of respondents using it, broken down by priority order (first through sixth).]

Figure 1. Based on 323 responses. Sorted by first priority.

USER STORY Generational change can entail a deeper dive into a vendor’s tool. “To enhance our ability to track data lineage, standardize load scripts, validate domains, and cleanse our customer data, we purchased a vendor’s data integration platform. We have now replaced our old hand-coded scripts with this platform,” said Rick Ellis, the enterprise data architect at Frost Bank. “Today, the platform is up and running. We now need to enhance our knowledge of the integration platform’s functionality to perform data analysis and integrate new data stores, as well as address the business’ next generation of requirements.

“For example, we’ve made our first pass with a data quality solution, and this will continue to be a high priority. Our grass-roots data stewardship program made a meaningful contribution to the quality solution, and we have morphed stewardship into a broader data governance board to assist with other data management disciplines. Upcoming priorities are to get beyond matching, de-duping, and name- and-address cleansing and go into other quality functions. Before any changes are made, impact analysis is essential.

“In the longer term, our ETL team will assist with database migrations, consolidations, and upgrades to help to keep the data clean. Plus, they will probably inherit business-to-business data exchange with partnering financial services companies. The vendor platform we acquired has functions for these, which should help as we grow beyond data warehousing into operational data integration.”


Users’ Data Integration Tool Portfolios

There are different ways to characterize a user’s software portfolio. For DI tools, it’s interesting to assess portfolios by the number of tools and the number of tool providers. This is what the survey question in Figure 2 quantifies. A few generational trends are suggested by comparing results for “today” and “would prefer”:

Many users desire more tools, but from fewer vendors.

Users would prefer to simplify their portfolios. If user preferences pan out, fewer will acquire DI tools from multiple vendors. According to the survey data, the number of user organizations using multiple DI tools from multiple vendors will drop from 44% to 25%. Part of this is the “one throat to choke” issue concerning support and maintenance. Related reasons may include the ongoing trends toward tool standardization and focusing on preferred suppliers for the sake of bulk discounts and other preferential treatment.

Users want to reduce the amount of hand coding. Only 18% of respondents report depending mostly on hand coding for DI. This seems low compared to other surveys TDWI has run. With this survey population, hand coding will drop down to a minuscule 1%. Migrating from hand coding to tool use as the primary development medium is, indeed, a prominent generational change for DI.

Users are very interested in integrated suites of tools. Only 9% report using one today, yet 42% of respondents would prefer one. Integrated suites are available today from a few software vendors. This kind of suite typically has a strong DI and/or DQ tool at its heart, with additional tools for master data management, metadata management, stewardship, governance, changed data capture, replication, event processing, data services development, data profiling, data monitoring, and so on. As you can see, the list can be quite long, amounting to an impressive arsenal of related data management tools and tool features. As more user organizations coordinate diverse data management teams and their solutions, it makes sense for the consolidated team to use a single platform for easier collaboration. Coordinated teams of this sort generally want to share meta and master data, profiles, development templates, and other development artifacts. Thus, one of the noticeable generational trends in DI is the movement toward the use of integrated suites.

Which of the following best describes your organization’s portfolio of DI tools today? For your organization’s next generation DI implementation, how would you prefer that the DI portfolio be?

Using multiple DI tools from multiple vendors: today 44%, would prefer 25%
Using just one DI tool: today 22%, would prefer 24%
Mostly hand coded without much use of vendor DI tools: today 18%, would prefer 1%
Using a DI tool that’s part of an integrated suite of data management tools from one vendor: today 9%, would prefer 42%
Using multiple DI tools from one vendor: today 3%, would prefer 6%
Other: today 4%, would prefer 2%

Figure 2. Based on 323 respondents. Sorted by “today.”

Approximately 60% of DI functions are untouched today.

DI tools and platforms from vendors tend to be feature-rich, especially when a single product supports multiple DI techniques. DI tools are like all enterprise software: Users employ the functionality they need and ignore the rest—at least for the time being. Eventually, business and technology requirements or resources change, and the DI team starts to employ functions they’ve previously ignored. For example, many users stick to core ETL functions for years before expanding their usage into functions that are tangential to ETL, such as changed data capture, services, and interoperability with buses. With the integrated data management suites discussed earlier in this report, users typically start with a particular tool type—usually for data integration or data quality—and later start using other tools built into the suite.

TDWI suspects that users have tapped a relatively small percentage of their DI tools’ functions. To test this, a survey question asked: “What approximate percentage of your primary DI tool’s functions are you using?” The question demanded responses for today and for three years from now. See Figure 3. Survey responses show that, indeed, the percentage is rather low today, but will increase substantially in three years. For example, on the area graph, you can see that the largest concentration of users is employing between 30% and 50% of their DI tool’s functions today. In other words, the average DI shop is only using roughly 40% of tool functions, leaving the other 60% untouched. However, in three years, the largest concentration will be employing 50% to 80% of functions, for an average of approximately 65%.

What approximate percentage of your primary DI tool’s functions are you using?

[Area graph: distribution of respondents by the percentage of their primary DI tool’s functions used (0%–100%), plotted for today and for three years from now.]

Figure 3. Based on 323 respondents.

DI Tool and Platform Replacements

One of the most extreme generational changes you can make for your DI solution is to rip out and replace its underlying tool or platform. As discussed later, the top reason for a replacement is the need for a unified platform that supports multiple tool types, including business-oriented functions (stewardship, exception processing). Other leading reasons are to get a DI platform that supports scalability and/or real-time functionality better than the current one does.

Most organizations are content with their current DI tool or platform.

Those sound reasonable. But how many users really need to replace their DI platforms now? According to the survey, the answer is that relatively few user organizations are considering such an extreme change. (See Figure 4.) One-third of respondents are planning a platform replacement in 2011 (19%) or 2012 (14%). Yet, a whopping 62% report they have no plans to replace their DI platform. The conclusion is that most DI users are content with their current DI platform and tool portfolio.

When do you plan to replace your current primary data integration platform?

No plans to replace DI platform 62%
2011 19%
2012 14%
2013 2%
2014 1%
2015 1%
2016 0%
2017 or later 1%

Figure 4. Based on 323 respondents.

It’s apparent that most users are satisfied with their current DI platform and see no need to replace it. Even so, it’s interesting to hear what kinds of problems would be so onerous as to drive a user to rip and replace. The question expressed in Figure 5 speaks to the heart of this matter by asking: “What problems will eventually drive you to replace your current primary data integration platform?” Responses reveal a few generational trends:

User fascination with single-vendor, integrated DI platforms recurs in survey questions.

Again, users are interested in integrated DI suites. At the top of the survey results in Figure 5, the multiple-choice answer most often selected is: “We need a unified platform that supports DI, plus data quality, governance, MDM, etc.” (40%). We also noted this interest in Figure 2. Here, respondents are going a step further to say that the demand for an integrated suite or platform would be so strong as to drive them to a platform replacement. Note that this is a dramatic generational shift, given that multi-vendor, best-of-breed approaches to data management software portfolios have been the norm for many years.

There’s a growing need for DI tool functions that business people can use. Note that 19% of respondents selected: “We need a platform with tools for some business users.” The growing inclusion of business people in the DI user community is a noteworthy trend. Some vendors are responding to this demand by supplying new, easy-to-use functionality for business-oriented tasks, such as stewardship, exceptions processing, and collaboration with a multi-functional team. This is yet another generational decision that planners of DI must consider.

Scalability and real time are DI’s most pressing requirements.

Scalability is naturally a concern for DI. Scalability problems can manifest themselves in different ways, such as the cost of scaling up (37%) and inadequate data processing speed (35%). With any IT system, frustrations over scalability can lead to a change of platform, and DI platforms are especially susceptible, due to increases in data volumes and processing complexity.

Real-time and related capabilities are enabled or inhibited by a DI platform. A substantial 33% of respondents fear their DI platform may be poorly suited to real-time or on-demand workloads. They’re also concerned that the platform may suffer inadequate support for Web services and SOA (30%) or inadequate high availability (20%). These are all related, because users need services for real-time interfaces, and the interfaces aren’t real time if the DI platform isn’t highly available. For many organizations, accelerating DI functions into real time is just as pressing a generational goal as scaling to massive data volumes.

Legacy platforms or platform components can be a problem. A DI tool, like any IT system, can reach the end of its useful lifecycle. Apparently, a few survey respondents are at that stage, because they report that their “current platform is a legacy we must phase out” (18%). Legacy and related upgrade issues are also seen in responses to survey answers such as “current platform is 32-bit, and we need 64-bit” (12%) and “current platform is SMP, and we need MPP” (5%). Note that these upgrades are on the critical path to achieving generational goals in DI platform performance.

What problems will eventually drive you to replace your current primary data integration platform? (Select nine or fewer.)

We need a unified platform that supports DI, plus data quality, governance, MDM, etc. 40%
Cost of scaling up is too expensive 37%
Inadequate data processing speed 35%
Poorly suited to real-time or on-demand workloads 33%
Inadequate support for Web services and SOA 30%
Inadequate high availability 20%
We need a platform with tools for some business users 19%
Current platform is a legacy we must phase out 18%
Can’t secure the data properly 18%
We need a platform better suited to cloud or virtualization 15%
Inadequate support for in-memory processing 14%
Current platform is 32-bit, and we need 64-bit 12%
Current vendor has questionable practices or viability 8%
Current platform is SMP, and we need MPP 5%
Other 5%

Figure 5. Based on 1,100 responses from 323 respondents (3.4 responses per respondent on average).

USER STORY A private cloud is a likely next generation platform for DI. “Our data integration server runs in a shared server environment, which uses a popular operating system for server resource virtualization,” said the lead data integration specialist at an insurance company. “My team was concerned when we moved to the private cloud provided by IT, because we’re used to owning the servers, plus having one each for data integration, reporting and analysis, and the data warehouse. Not all software servers cohabitate and perform well under virtualized services, you know. But the data integration server I’m using does really well. IT recently upgraded the server farm controlled by virtualization, as part of our migration from legacy UNIX systems to Linux. With greater server bandwidth, I’m now able to set up larger virtual machines for ETL jobs and other routines. Data warehouse loads that used to take 20 hours now complete in about two.”


Data Types Being Integrated

Structured data is still the bread and butter of DI, but other data types are catching up.

The majority of data handled via DI tools and platforms today falls under the rubric structured data. This is primarily about the tables and other data structures of relational databases. But other sources yield predictable structures, such as the record formats of most applications and the character-delimited rows of many flat files. In our survey, a whopping 99% of respondents report handling structured data today, and 78% will continue to do so in three years. See Figure 6.

The hegemony of structured data types has been the norm in DI for decades, and that’s old news. The latest news is that DI solutions have begun handling a wider range of data types. In particular, 84% of respondents report handling some form of complex data today (hierarchical or legacy sources) with their DI tools. Almost as many respondents anticipate handling complex data in three years. Similarly, 62% report handling semi-structured data today (XML and similar standards), and this should grow to 87% in three years.2

Event, spatial, and textual data types are experiencing greater demand.

Three data types are poised for explosive growth, namely event data (messages, usually in real time), spatial data (longitude and latitude coordinates, GPS output), and unstructured data (mostly text expressing human language). All three will go from limited use today to over 90% use in three years. These and other non-structured data types are driven up by increased use of industry standards (SWIFT, ACORD, HL7), smart devices (smart meters, RFID), digital content (images, video), social media (Twitter, Facebook), and many types of Web applications.

Once again, the survey data of this report shows that more people than anticipated are handling events and their data through DI platforms. While that’s surprising, it’s not surprising that spatial data is on the rise. For years, TDWI has noted its Members adding tables and other structures to their data warehouses to fulfill new requirements for location data in support of asset management, actuarial assessments, delivery address augmentation, and consumer demographics. In fact, it’s a bit surprising that the handling of unstructured text is so low at present; TDWI has interviewed many of its Members who apply text mining or text analytic capabilities (whether built into their DI tool or supplied via a separate tool) to convert facts (discovered in textual documents) into structured data (typically a record or table row per discovered fact). For example, insurance companies regularly extract facts from text gathered in the claims process, then use that data to extend their analytic data sets for risk management and fraud detection.
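The text-to-data conversion just described can be pictured with a deliberately simplified sketch. It is not a vendor text-analytics tool (real products use far richer natural language processing), but it shows the shape of the pattern: scan free text for facts and emit one structured record per discovered fact. The regex patterns and the claim text are invented for the example.

```python
# A simplified text-to-data sketch: extract facts from free text and emit
# one structured row per discovered fact.
import re

CLAIM_TEXT = """
Claimant reports rear-end collision on 2010-11-04. Estimated damage $4,200.
Vehicle towed; claimant treated for whiplash. Estimated damage $850 to trailer.
"""

# One pattern per fact type; real text analytics goes far beyond regexes.
FACT_PATTERNS = {
    "incident_date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "damage_amount": re.compile(r"\$([\d,]+)"),
}

def extract_facts(doc_id, text):
    """Return one structured row per fact found in the document."""
    rows = []
    for fact_type, pattern in FACT_PATTERNS.items():
        for match in pattern.finditer(text):
            rows.append({"doc_id": doc_id,
                         "fact_type": fact_type,
                         "value": match.group(1)})
    return rows

for row in extract_facts("claim-001", CLAIM_TEXT):
    print(row)
```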

For the types of data on the following list, which are you integrating today through your primary data integration implementation? Which do you anticipate using in three years or so?

Structured data (tables, records): using now 99%, using in three years 78%
Complex data (hierarchical or legacy sources): using now 84%, using in three years 79%
Semi-structured data (XML and similar standards): using now 62%, using in three years 87%
Event data (messages, usually in real time): using now 43%, using in three years 93%
Spatial data (long/lat coordinates, GPS output): using now 29%, using in three years 95%
Unstructured data (human language, audio, video): using now 21%, using in three years 95%

Figure 6. Based on varying numbers of responses from 323 respondents. Sorted by “using now.”

2 For more information about how various data types are handled via data integration, see the TDWI Monograph Complex Data: A New Challenge for Data Integration.

USER STORY Addressing complex data on its own terms can be generational. “Traditionally, our enterprise data warehouse—or EDW—housed mostly source data for highly detailed reports. In terms of ETL, that means a lot of E and L, but little T,” said an enterprise data architect at a manufacturing company. “As I work on our next generation of data integration, I’m focused on integrating core data, not just collecting it, as in the past. I’m developing numerous transformations that will yield aggregated, enterprisewide views of data, instead of the concatenated data marts we have now. The data product of my work goes into an enterprise data model our group has recently designed, in close collaboration with a wide range of other technical and business people. It’s still in review, but we feel confident that the logical model is an accurate view of how the business needs to be represented.

“To ensure that I populate the enterprise data model from the most appropriate data sources, master data has become a priority. Our primary data domain is products, and there are many definitions of product here. We believe we can reduce all these to a single, master definition. But it will be complex and hierarchical, so we’re investigating an XML-based representation of product data. The catch is that few data modeling tools support complex data types, like XML. Plus, we’ll have to move XML hierarchies into and out of our EDW, which is cast in third normal form. These challenges are worth overcoming, because we really need to handle complex data like XML, if we’re to design a master hierarchy that accurately represents the relations among products, parts, subassemblies, and bills of material.”
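A minimal sketch of the XML-to-relational step this architect describes appears below. It is not his actual design; it shows one common approach, in which each node in the hierarchy becomes a row that carries its parent’s key, so the hierarchy can be stored in a third-normal-form schema and rebuilt with self-joins. The XML shape is hypothetical.

```python
# A minimal sketch of flattening an XML product hierarchy into relational
# rows: one row per node, each carrying its parent's key.
import xml.etree.ElementTree as ET

PRODUCT_XML = """
<product id="P1" name="Tractor">
  <part id="A10" name="Engine">
    <part id="A10-1" name="Fuel pump"/>
  </part>
  <part id="B20" name="Transmission"/>
</product>
"""

def flatten(elem, parent_id=None, rows=None):
    """Walk the hierarchy depth-first, emitting one row per node."""
    if rows is None:
        rows = []
    node_id = elem.get("id")
    rows.append({"id": node_id, "parent_id": parent_id,
                 "type": elem.tag, "name": elem.get("name")})
    for child in elem:
        flatten(child, node_id, rows)
    return rows

for row in flatten(ET.fromstring(PRODUCT_XML)):
    print(row)
```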

Data Integration Architecture

DI demands architecture, as any application type would.

To many people, the term data integration architecture sounds like an oxymoron. That’s because they don’t think that data integration has its own architecture. For example, a few data warehouse professionals still cling to the practices of the 1990s, when data integration was subsumed into the larger data warehouse architecture. Today, many data integration specialists still build one independent interface at a time—a poor practice that is inherently anti-architectural. A common misconception is that using a vendor product for data integration automatically assures architecture. Here’s the problem: If you don’t fully embrace the existence of data integration architecture, you can’t address how architecture affects data integration’s scalability, high availability, staffing, cost, and ability to support real-time operation, master data management, SOA, and interoperability with related integration and quality tools. All of these are worth addressing.3

To get a sense of generational trends in DI architecture, the survey asked which architectural types respondents are using today, in priority order. The survey also asked what they’d prefer. See Figure 7.

No consistent architecture. This is risky for any DI solution and the businesses that count on it. Without an architecture, there are few or no data standards, preferred interfaces, coding guidelines, or any other form of consistency. In turn, their absence works against reuse and performance tuning. Though 27% of respondents today lack a DI architecture, only 3% anticipate still being in this undesirable position in the future.

3 For a detailed discussion of DI architectures, see TDWI’s What Works in Data Integration (Volume 25) feature article, “Data Integration Architecture: What It Does, Where It’s Going, and Why You Should Care.”

Hand-coded spaghetti is not an architecture.

Collections of point-to-point interfaces. Most point-to-point (P2P) interfaces are designed and built in a vacuum, without reference to standards. Most are hand coded. The colloquial name for this is “spaghetti coding.” Of course, you realize that P2P is not really an “architecture”—spaghetti is the antithesis of architecture! This is the last thing you want to inherit from other developers, because it’s nearly impossible to see relations among interfaces, much less the big picture of the DI solution (see the sketch below). Lamentably, at 53%, P2P is the most common DI architecture today, and it’s the approach with the (current) highest first priority. Luckily, users surveyed anticipate cutting their dependence on P2P in half in the near future.

Hub-and-spoke architecture. This has become the preferred architecture for most integration technologies today, including the form of data integration known as extract, transform, and load (ETL). (Variations of ETL—such as TEL and ELT—may or may not have a recognizable hub.) However, this is not true of ETL alone; for example, hubs are common in deployments of data federation. Replication usually entails direct interfaces between databases, without a hub, but high-end replication tools support a control server or other device that acts as a hub. Data staging areas and operational data stores (ODSs) often serve as hubs, which are then critical for customer data integration and MDM. Enterprise application integration (EAI) tools and their buses depend on message queue management, and the queue is usually supported by a central integration server (i.e., a hub) through which messages are managed. Hub-and-spoke rated well in our survey, and users surveyed anticipate applying this architecture in the future.

Hub-and-spoke is popular, but that’s no reason to be doctrinaire about its application. At some point, most architectures evolve into some form of hybrid. Many successful DI implementations are mostly hub-and-spoke, but with a little bit of spaghetti thrown in. A common best practice in DI is to replace a spoke with a point-to-point interface when the spoke doesn’t scale or perform. Sometimes performance and scalability take precedence over architecture.

Services and buses will reinvigorate DI architecture.

Data service architecture. Data integration architecture is heading out on the leading edge by incorporating service-oriented architecture (SOA). Note that SOA won’t replace current hub-based architectures for data integration. Hubs will remain, but be extended by services. The goal is to provide the ultimate spoke, namely the data integration service or simply data service. According to the survey, this type of DI architecture is set to grow the most, nearly doubling from 41% of respondents using it today to 73% in users’ next generation DI solutions.

Buses for messages, events, and services. Similar to services, the use of buses with DI solutions is set to grow significantly (from 23% to 56%). Note that services and buses are related. Most services (regardless of type) are transported over a bus, as are responses to services. In addition, recall that survey questions discussed earlier in this report show that event processing is a new but growing technique for DI. It, too, may depend on an enterprise bus for event delivery and reaction. As the need for data services and event processing grows for next generation data integration solutions, so will the need for DI tools and platforms to access enterprise buses.
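Some quick arithmetic (an illustration, not from the report) shows why point-to-point collections degenerate into spaghetti: connecting n systems pairwise needs on the order of n(n-1)/2 interfaces, while a hub-and-spoke design needs only one spoke per system.

```python
# Interface counts for n systems: point-to-point versus hub-and-spoke.
def p2p_interfaces(n):
    return n * (n - 1) // 2   # one interface per pair of systems

def hub_interfaces(n):
    return n                  # one spoke per system, all meeting at the hub

for n in (5, 10, 20, 40):
    print(f"{n:>3} systems: {p2p_interfaces(n):>4} point-to-point "
          f"vs {hub_interfaces(n):>3} hub-and-spoke interfaces")
```

At 40 systems, that is 780 interfaces versus 40, which also suggests why the hybrid approach above reserves point-to-point links for the few spokes that cannot meet performance needs.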


Which of the following approaches to DI architecture are you using in your data integration infrastructure today? (Click one or more of the following answers, in priority order from used most to used least.) For your next generation data integration infrastructure, which DI architectures would you prefer to be using, in priority order?

[Stacked bar chart comparing “today” versus “next gen” usage, by priority order, for five approaches to DI architecture: collection of point-to-point interfaces; hub-and-spoke architecture; no consistent architecture; data service architecture; bus for messages, events, and services.]

Figure 7. Based on 323 respondents. Sorted by “today” and first priority. Note that values for fifth and sixth priorities (all 1% or 0%) are omitted to simplify the chart.


USER STORY On the leading edge: Data integration as infrastructure. “When I came into my current position, I immediately saw a need for data integration as a shared enterprise infrastructure. It would be analogous to a local area network that’s accessible to just about anyone, with ample bandwidth for everyone,” said the lead data integration specialist at a pharmaceutical company. “A generous site license from a leading data integration vendor was key to making this feasible. Today, use of the platform is free to any group, without much review of their purposes. Due to the large size of the company and the honest need for data integration, we’ve spawned over 400 implementations, supported by over 400 data integration developers worldwide.

“The site license isn’t cheap, but the business feels it’s worth the expense. Pharma companies tend to suffer dozens of siloed business units, each focused on a different pharma product. Data integration as a shared enterprise infrastructure has greatly accelerated the sharing of data across these units, which results in desirable knowledge transfers and more accurate reporting across the entire enterprise.”

Organizational Issues for NGDI

Organizational Structures for DI Teams

Corporations and other user organizations have hired more in-house data integration specialists in response to an increase in the amount of data warehousing work and operational data integration work outside of warehousing. In the “old days,” an organization had one or maybe two data integration specialists in house (if any), whereas a dozen or more are common today.

The average number of DI specialists per organization is in the range of 13.1 to 16.4.

To quantify the size of DI teams today, the report survey asked: “How many full-time data integration specialists work in your organization?” See Figure 8. The survey required respondents to type an integer between zero and 99. A simple average of the entries tallies to 16.4 DI specialists per organization. Admittedly, this number is a bit skewed, because a few respondents reported having zero (7%) or 99 (5%). Treating these as outliers and omitting them brings the average down to 13.1 (see the sketch below). Either way, these numbers indicate rather sizable DI teams.

To give this growth a context, let’s compare surveys. In a TDWI Technology Survey from May 2007, one-quarter of surveyed organizations reported having five or more DI specialists. In this report’s survey, roughly half of respondents fit that bill. By that standard, the number of DI specialists has doubled in the last four years. As another data point, the number of DI specialists filling out the TDWI Salary Survey has almost doubled in the same time frame—and their salaries have increased substantially!4

4 The total annual compensation of the average DI specialist finally broke $100,000 in 2010. For details, see the 2011 TDWI Salary, Roles, and Responsibilities Report (available to TDWI Members on tdwi.org).
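As a toy version of the outlier trimming described above (with fabricated values, not TDWI’s survey data), the calculation amounts to recomputing the mean after dropping the boundary responses of zero and 99:

```python
# Toy illustration of trimming boundary responses before averaging.
responses = [0, 0, 3, 5, 8, 12, 15, 22, 40, 99]  # fabricated, for illustration

simple_mean = sum(responses) / len(responses)
trimmed = [r for r in responses if r not in (0, 99)]
trimmed_mean = sum(trimmed) / len(trimmed)

print(f"simple mean:  {simple_mean:.1f}")   # 20.4, pulled around by 0s and 99
print(f"trimmed mean: {trimmed_mean:.1f}")  # 15.0
```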

How many full-time data integration specialists work in your organization?

Zero 7%
1 to 5 39%
6 to 10 19%
11 to 15 8%
16 to 25 9%
26 to 50 10%
51 to 98 2%
99+ 5%

Figure 8. Based on 323 responses.

Most DI specialists work in a BI/DW team. The rest are strewn across other enterprise teams.

As the number of DI specialists grows and the breadth of their work expands over analytic and operational tasks, organizations are driven to reevaluate how and where they manage DI specialists and their work. Today, a number of team structures organize the work of data integration specialists, as seen in Figure 9:

BI/DW team. In many organizations, the bulk of DI still centers on data warehousing (DW) and business intelligence (BI), so it makes sense to manage DI work through the BI/DW team (59%).

Data architecture and administration. DI doesn’t just originate from BI/DW teams. Another common starting point is the database administration group (DBA, 15%). A common reorganization nowadays is for the DBA group to be subsumed into an enterprise data architecture group (EDA, 24%). This makes sense, because a lot of information lifecycle management work that EDA groups initiate involves operational DI to migrate, consolidate, sync, and upgrade operational databases.

DI managed by IT. One of the newer trends in DI is to treat DI platforms and solutions like shared infrastructure, akin to how networks and storage are managed centrally and openly made accessible to many enterprise organizations and their IT systems. In these cases, central IT management (25%) or the CIO’s office (12%) manages DI specialists and their work.

Recent years have seen the birth of the DI-specific team, often in a competency center.

DI-specific teams. For many firms, a next generation priority is to find an appropriate home for DI specialists, as well as their tools, platforms, and solutions. A conclusion more and more organizations are coming to is that there should be an independent data integration team as a standalone unit (23%). The standalone unit often takes the shape of a data integration competency center (17%), although DI may also be folded into other forms of competency centers that are not exclusive to DI (12%). Among TDWI Members, the BI competency center is a common example. In all these cases, the competency center (sometimes called a center of excellence) provides shared human resources (namely, DI specialists) who can be allocated by the center’s manager to DI work as it arises, whether it’s analytic, operational, or both.


Data stewardship and governance. Stewardship and governance are, themselves, evolving into a new generation. Both originated as advisory boards, where committee members identify data quality or data compliance problems and opportunities, then recommend that data experts in other groups take action to correct or leverage them. In the next generation, expect to see more data management professionals—especially DI and DQ specialists—reporting to a data governance board (5%) or data stewardship program (5%), so they can do the technical work that the board identifies as a priority.

Where you work, what kind of organizational structure coordinates the work of most data integration specialists? Select all that apply.

Data warehouse or business intelligence team 59%
Central IT management 25%
Enterprise data architecture group 24%
Data integration team—as a standalone unit 23%
Data integration competency center 17%
Database administration group 15%
CIO’s office 12%
Competency center—not exclusive to DI 12%
Data governance board 5%
Data stewardship program 5%
Other 5%

Figure 9. Based on 652 responses from 323 respondents (2 responses per respondent on average).

USER STORY Competency centers and other central teams offer advantages. “My employer is a large, multi-billion-dollar company that has grown mostly through mergers and acquisitions,” said Ron Woodyard, the primary integration manager at Cardinal Health. “This helps explain why we have so many data integration specialists and so much work to do. To handle it, we’ve brought close to 250 employees into our Integration Services Center (ISC). Around 140 members of the ISC constitute the pure integration team, while the other folks work on MDM, content management, EDI services, and so on. I run the ISC like a business, based on shared human resources and technology services.

“Centralizing data integration and similar work in the ISC has its advantages. Having most of the eggs in one basket makes it easier to align our work with the firm’s information agenda. With all data integration processing flowing through one center, planning capacity is more accurate, as opposed to quantifying many tools in many business units on many platforms. Having development standards and enforcing them through a code review process is a lot smoother. We can now source data once, then distribute it multiple times. And the ISC has saved money by replacing hand coding with vendor tools as the primary development method.”


Unified Data Management

In most organizations today, data and other information are managed in isolated silos by independent teams using various data management tools for data quality, data integration, data governance and stewardship, metadata and master data management, B2B data exchange, database administration and architecture, information lifecycle management, and so on. In response to this situation, some organizations are adopting what TDWI calls unified data management (UDM), a practice that holistically coordinates teams and integrates tools.

TDWI Research defines unified data management (UDM) as a best practice for coordinating diverse data management disciplines, so that data is managed according to enterprisewide goals that promote technical efficiencies and support strategic, data-oriented business goals.

The “big picture” that results from bringing diverse data disciplines together through UDM yields several benefits, such as cross-system data standards, cross-tool architectures, cross-team design and development synergies, leveraging data as an organizational asset, and assuring data’s integrity and lineage as it travels across multiple organizations and technology platforms. However, the ultimate goal of UDM is to achieve strategic, data-driven business objectives, such as fully informed operational excellence and business intelligence, plus related goals in governance, compliance, business transformation, and business integration.5

Data integration is but one of the many data management disciplines that may be coordinated via UDM and similar organizational practices. Yet the need for UDM affects DI, in that DI specialists and their managers must revisit when and how certain DI work should be coordinated with related work by other data management teams. The priority and importance of such collaboration by DI specialists varies from one data management team to the next. These priorities are sorted in Figure 10.

BI and DW. DI specialists have their priorities straight. Coordinating with BI/DW teams is both the greatest first priority and the greatest second priority. As pointed out earlier, TDWI’s survey populations tend to have a strong representation of DW and BI professionals. Even if we pare back the survey results to compensate for the survey population, the DI specialist’s commitment to BI/DW coordination is still clear.

Application integration and SOA. As we’ve seen in other data points of this report, DI specialists are continuing the trend of integrating some data (usually time sensitive) over application buses. In another trend, they’re embracing data services and the concept of data virtualization. Both trends require more coordination between the DI specialist and application integration teams. These trends have progressed to the point that this coordination is now a high priority.

Data architecture and modeling. There’s a long-standing tradition in which DI specialists get a lot (if not all) of the requirements they need to design and build a solution from a data architect or modeler. This is the case for most DI specialists working in a traditional DW team. More and more DI specialists work on database architecture and administration teams, where they get much of their direction from an enterprise data architect or similar team leader. (In some organizations, the data architect is called a data analyst.) As the next generation takes DI specialists off to independent teams, this coordination will most likely continue, but without the DI person reporting directly to an architect.

5 For a detailed discussion of unified data management (UDM), see the TDWI Best Practices Report Unified Data Management: A Collaboration of Data Disciplines and Business Strategies.

Governance, stewardship, and quality. In one trend, DI specialists are getting involved as committee members for data governance and stewardship. In another trend, DI specialists coordinate their efforts ever deeper with DQ specialists. Put these together, and you can expect increased coordination between DI specialists and teams or boards for governance, stewardship, and quality. Besides risk and compliance, good data governance also provides a medium for coordinating data management work.

Meta and master data. According to TDWI surveys, most implementations of master data management (MDM) are home-grown, built atop a data integration tool (usually in the ETL style). All data management professionals have to do a fair amount of metadata management in the due course of their work.

Secondary, supporting data management disciplines. Data integration is a primary data management discipline in that it generates a deliverable, similar to other primary disciplines such as data quality and MDM. DI and the other primary disciplines demand a fair amount of coordination with secondary, supporting disciplines such as metadata management and data profiling.

Data archiving. The use of DI tools in data archiving has come out of nowhere in recent years to become a sizable presence. That’s because enterprises are struggling to manage the giant volumes of data they’ve amassed. To reduce the burden of less valuable or older data on primary storage systems, they’re aggressively moving data into archives. Doing that with efficiency and sophistication requires DI tools and techniques. Suddenly, data archiving is part of the DI workload, and it will increase in the next generation.

With which other data management practices or teams do you coordinate DI work? (Respondents ranked their answers in priority order, first through sixth.)

[Figure 10 charts the percentage of respondents ranking each practice, covering: business intelligence and data warehousing; application integration and SOA; enterprise data architecture; data modeling; data governance; data quality; master data management; data archiving; metadata management; data stewardship; content management, including text analytics; inter-enterprise (or B2B) data exchange; and data profiling.]

Figure 10. Based on 323 respondents. Sorted by first priority. Note that seventh through thirteenth priorities (all 2% or less) are omitted to simplify the chart.


USER STORY Selecting a platform is a key generational decision. “I spearhead my firm’s data management initiative, which involves the coordination of teams and solutions for data integration, data quality, MDM, warehouse design, and business intelligence,” said James Brousseau, the enterprise data architect at SonoSite, Inc. “Early on, we decided that coordinating this many tool types and disciplines would be easier and yield more sustainable results if we standardized on a platform that supports as many of these disciplines as possible. We also knew that data quality and data integration would be immediate needs. So we acquired a vendor platform that excels in both quality and integration, plus has other tools. To be sure we’d get the enterprisewide coordination we need, we made the platform an enterprise resource, owned and maintained by central IT, but accessible to various teams on an as-needed basis.

“With this foundation successfully deployed, we can now focus on next generation goals. Most of these revolve around transforming our data warehouse. Today, it’s mostly an operational data store for ERP reporting. We’ll keep that, plus evolve the warehouse into an enterprise-scope view of corporate performance that’s more appropriate to business intelligence. After that, the next priority will be to develop a gold copy of addresses and other customer data.”

Collaborative Data Integration

The need for collaboration around data integration has increased recently. On the technology side, data integration specialists are growing in number, data integration work is increasingly dispersed geographically, and data integration is more tightly coordinated with other data management practices (especially data quality and MDM). On the business side, business people have long taken an interest in data integration related to business intelligence and mergers, but they now need direct involvement due to new requirements for compliance and governance. Collaboration reaches within newly expanded DI teams, plus across to related teams and business management.

TDWI Research defines collaborative data integration as a collection of user best practices and tool functions that foster collaboration among the increasing number of technical and business people who are involved in data integration projects and initiatives.6

The leading business benefits of collaborative data integration are that it supports governance and gives business people self-service visibility into the details and progress of data integration projects. Technology benefits include more efficient and effective collaboration between the business and IT, the reuse of development objects, and more options for IT management to manage geographically dispersed teams. Despite its benefits, there are barriers to collaborative data integration:

DI in terms business people can understand. Business and technical people speak different languages, according to 60% of respondents in Figure 11. The problem is exacerbated because most DI implementations today lack a business-friendly view of data (52%). To alleviate this problem, some organizations create a semantic layer or data virtualization layer with a DI tool, using its metadata management, data services, and related capabilities.

DI tools for business people. According to survey respondents, their current tools lack functions for business people to use (41%). As explained earlier, a number of vendor tools now include business-oriented functions for data governance, stewardship, exception processing, business views of data, requirements and specifications, and annotations for metadata and data profiles. The point of these new DI tool functions is to let business users collaborate over DI.

Collaborative tool features. A number of respondents complained that their current tools lack adequate version control (20%). DI tools need the kind of collaborative functions that have been common in application development tools for years.

6 For in-depth discussions of collaborative DI, see the two TDWI Monographs Collaborative Data Integration: Coordinating Efforts within Teams and Beyond and Second-Generation Collaborative Data Integration.

For example, check-in/check-out and versioning for routines, data flows, and other DI development artifacts are absolute requirements. Optional features include project management, project progress reports, object annotations, and discussion threads. Most collaborative functions should be accessible via a browser, so a wide range of people (regardless of location) can collaborate.
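To make that baseline requirement concrete, here is a minimal sketch in Python. It is purely illustrative—it models no particular vendor tool, and the artifact and user names are hypothetical—but it captures the check-in/check-out and versioning behavior described above.

```python
# Purely illustrative sketch -- not any vendor's API. A minimal model of
# check-in/check-out and versioning for DI development artifacts such as
# routines and data flows. Artifact and user names are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DIArtifact:
    name: str
    versions: List[str] = field(default_factory=list)  # saved definitions
    checked_out_by: Optional[str] = None               # current lock holder

    def check_out(self, user: str) -> None:
        """Lock the artifact so only one developer edits it at a time."""
        if self.checked_out_by is not None:
            raise RuntimeError(f"{self.name} is locked by {self.checked_out_by}")
        self.checked_out_by = user

    def check_in(self, user: str, definition: str) -> int:
        """Save a new version and release the lock; returns the version number."""
        if self.checked_out_by != user:
            raise RuntimeError("check the artifact out before checking it in")
        self.versions.append(definition)
        self.checked_out_by = None
        return len(self.versions)

flow = DIArtifact("customer_load_flow")
flow.check_out("di_developer")
print(flow.check_in("di_developer", "<data flow definition, v1>"))  # 1
```

A real DI tool would, of course, persist versions in a shared repository and expose these functions through a browser, per the paragraph above.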

What are some barriers to collaboration for DI in your organization? Select all that apply.

Business and technical people speak different languages 60%
Lack of a business-friendly view of data 52%
Our current tools lack functions for business people to use 41%
Our current tools lack adequate version control 20%
Collaboration is not an issue for us (if you check this, do not check other answers) 17%
Other 5%

Figure 11. Based on 632 responses from 323 respondents (2 responses per respondent on average).

Catalog of NGDI Practices, Tools, and Platforms

At this point in the report, we’ve defined the terms and concepts of next generation data integration (NGDI), listed the drivers that push organizations into a new generation, and discussed common generational changes. As you have likely noticed, the next generation of data integration involves many different options, which include tool features and tool types, user-oriented techniques and methods, and team or organizational structures. Now it’s time to draw the big picture so we can answer questions about these options, such as:

• What are the many options that users need to incorporate into the next generation of their data integration solutions?

• Which ones are users adopting and growing the most?

• Which are in decline?

• At what rate is generational change occurring?

To help quantify these and other questions, TDWI presented survey respondents with a long list of options for data integration. (See the left side of Figure 12, page 25.) These options include a mix of vendor-oriented product features and product types, as well as user-oriented techniques and organizational structures. The list includes options that have arrived fairly recently (real-time functions, complex event processing), have been around for a few years but are just now experiencing broad adoption (changed data capture, high availability for DI servers, services), or have been around for years and are firmly established (ETL, hand coding, batch processing). The list is a catalog of available options for DI, and survey responses enable us to sort and interpret the list in a variety of ways.

Concerning the list of DI options presented in the survey, TDWI asked: “For the techniques, features, and practices on the following list, which are you using today in or around your primary data integration implementation?” To get a sense of how this will change over time, TDWI also asked: “Which do you anticipate using in three years or so?” Survey responses for these two questions are charted as pairs of bars on the left side of Figure 12.


The “potential growth” chart in the middle of Figure 12 simply shows the per-row delta between responses for “using now” and “using in 3 years,” to provide an indication of how much the usage of a DI option will increase or decrease.

The survey question told the respondents: “Checking nothing on a row means you have no plans for using that technique now or in the future.” This enables us to quantify the approximate percentage of user organizations surveyed that are using a particular DI option, whether now, in the future, or both. The cumulative usage measured here is a sign of how committed users are, on average, to a particular DI option. These percentages are charted in the “commitment” column of Figure 12.

Potential Growth versus Commitment for DI Options

Figure 12 is fairly complex, so let’s explain how to read it. First off, Figure 12 is sorted by the “potential growth” column in descending order. “Master data management (MDM)” appears at the top of the chart because—with a delta of 45%—this option has the greatest potential growth. However, not all organizations plan to use this option. In the commitment column, we see that 72% of survey respondents have committed to implement MDM at some point; apparently, the other 28% have no plans to implement MDM. By scanning the commitment column in Figure 12, you can see that 72% is a very high level of commitment for a DI option. Coupled with the very high potential growth, it’s obvious that, in the wide majority of organizations, the next generation of DI will include some form of MDM.

From this, we see that there are two forces at work in Figure 12, as well as in the planning processes of user organizations. Commitment and potential growth are two different metrics for the future of DI options. A sketch of how the two metrics are derived follows this list.

• Potential growth. The potential growth chart subtracts “using now” from “using in 3 years,” and the delta provides a rough indicator of the growth or decline in use of DI options over the next three years. The charted numbers are positive or negative. A negative number indicates that the use of an option may decline or remain flat instead of grow; a positive number indicates growth, whether good or strong.

• Commitment. Collected during the survey process, the numbers in the commitment column represent the number of survey respondents who selected “using now” and/or “using in 3 years.” That number is expressed as a percentage of 323, the total number of respondents who answered the questions in Figure 12. Note that the measure of commitment is cumulative, in that the commitment may be realized today, sometime in the near future, or both.

• Balance of commitment and potential growth. To get a complete picture, it’s important to look at the metrics for both growth and commitment. For example, some features or techniques have significant growth rates, but within a weakly committed segment of the user community (clouds, open source DI, SaaS). Others have low growth rates, but are strongly committed through common use today (ETL, batch processing). Options seeing the greatest activity in the near future will most likely be those with strong ratings for both growth and commitment (MDM, data governance, data quality).

To help you visualize the balance of growth and commitment, Figure 13 plots the potential growth and commitment numbers from Figure 12 as opposing axes of a single chart. DI options are plotted in terms of growing or declining usage (x-axis) and narrow or broad commitment (y-axis).
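As a concrete illustration of the two metrics, here is a minimal sketch in Python. The per-respondent answer sets are hypothetical stand-ins (the raw survey data is not published); the point is the arithmetic described above.

```python
# Hypothetical respondent IDs for one DI option; illustration only.
using_now = {"r01", "r02", "r03"}                  # checked "using now"
using_in_3_years = {"r02", "r03", "r04", "r05"}    # checked "using in 3 years"
total_respondents = 10                             # answered the question

# Potential growth: per-row delta between anticipated and current usage.
# Positive means growth; negative means flat or declining use.
growth = (len(using_in_3_years) - len(using_now)) / total_respondents * 100

# Commitment is cumulative: a respondent who checked either box (or both)
# counts once, so it is a set union, not a sum of the two percentages.
commitment = len(using_now | using_in_3_years) / total_respondents * 100

print(growth)      # 10.0 -> modest potential growth
print(commitment)  # 50.0, not 70.0: r02 and r03 appear in both sets
```

Because commitment is a set union, it cannot be computed from the two charted percentages alone; it was collected as its own measure during the survey.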


For the techniques, features, and practices on the following list, which are you using today in or around your primary data integration implementation? Which do you anticipate using in three years or so? (Answer these two questions for each row in the following table. Checking nothing on a row means you have no plans for using that technique now or in the future.)

OPTION | USING IN 3 YEARS | USING NOW | POTENTIAL GROWTH | COMMITMENT
Master data management (MDM) | 69% | 24% | 45% | 72%
Real-time data quality | 50% | 7% | 42% | 52%
Real-time data integration | 56% | 16% | 40% | 60%
Data governance and stewardship | 69% | 29% | 40% | 74%
Complex event processing (CEP) | 46% | 12% | 34% | 49%
Tool functions for business people | 41% | 9% | 32% | 42%
Metadata management | 67% | 37% | 29% | 72%
Text analytics or text mining | 36% | 7% | 29% | 38%
Real-time alerts | 44% | 15% | 29% | 46%
In-memory processing without landing data to disk | 41% | 14% | 27% | 44%
Data quality functions | 73% | 47% | 26% | 84%
Data profiling | 68% | 42% | 26% | 76%
Data federation and virtualization | 47% | 22% | 25% | 52%
Web services | 51% | 28% | 24% | 56%
Service-oriented architecture (SOA) | 47% | 25% | 23% | 52%
Interoperability with message bus or service bus | 35% | 14% | 20% | 37%
Single integrated platform for DI, DQ, MDM, etc. | 31% | 10% | 20% | 34%
Changed data capture (CDC) | 67% | 47% | 20% | 76%
High availability (HA) for DI server | 35% | 16% | 19% | 38%
Private cloud as a DI platform | 23% | 4% | 19% | 24%
Cross-team collaborative functions | 40% | 21% | 19% | 46%
Trickle or streaming data loads | 25% | 7% | 18% | 27%
Metadata repository used for non-metadata | 28% | 11% | 17% | 31%
Micro batches during business day | 32% | 17% | 15% | 36%
XML as source data or message type | 53% | 40% | 13% | 59%
DI tool licensed via open source | 20% | 8% | 12% | 23%
Hadoop-based data processing | 15% | 3% | 12% | 16%
DI tool licensed via software-as-a-service (SaaS) | 17% | 7% | 10% | 20%
Data synchronization | 55% | 45% | 10% | 65%
Public cloud as a DI platform | 11% | 2% | 10% | 12%
Inter-enterprise or B2B data exchange | 22% | 14% | 9% | 25%
Java messaging service (JMS) | 20% | 14% | 6% | 25%
Secondary DI tool to clear specific bottlenecks | 11% | 6% | 5% | 13%
Sort tool, to augment main DI tool | 8% | 5% | 3% | 10%
Replication | 31% | 29% | 2% | 39%
Extract, load, and transform (ELT) | 51% | 49% | 2% | 61%
Extract, transform, and load (ETL) | 68% | 80% | -11% | 85%
Batch processing | 67% | 91% | -24% | 92%
Hand-coded DI routines | 22% | 49% | -27% | 51%

Figure 12. Based on 323 respondents. The above charts are sorted by “potential growth.”


Next Generation DI Options Plotted for Growth and Commitment

[Figure 13 is a scatter plot of the DI options from Figure 12, with potential growth on the x-axis (from declining at -50%, through flat and good, to strong at +50%) and commitment on the y-axis (from weak at 0%, through moderate and good, to strong at 100%). Five circled groups are labeled: (1) strong-to-moderate commitment, strong potential growth; (2) good commitment, good potential growth; (3) moderate commitment, good potential growth; (4) weak commitment, good potential growth; and (5) strong commitment, declining growth.]
Figure 13. Plots are approximate, based on values from Figure 12.

Trends for Next Generation Data Integration Options

Figures 12 and 13 show that most DI options will experience some level of growth in the near future. The figures also indicate which options will grow the most, and they reveal a number of trends concerning how users plan to apply various options to their next generation data integration solutions. In particular, five groups of options stand out based on combinations of growth and commitment. (See the groups circled, numbered, and labeled in Figure 13.)

1. Strong-to-moderate commitment, strong potential growth. The options most likely to live up to our great expectations and sustain growth over the long haul are those that have solid survey results for both commitment and potential growth. Group 1 in Figure 13 has those numbers, and it includes some of the most hotly pursued features and techniques of recent years. In many ways, group 1 is the epitome of next generation data integration because of its mix of leading-edge options supported by real-world organizational commitment.

Group 1 is a mix of growing real-time techniques, data management disciplines, and organizational practices. The real-time techniques include real-time data integration, real-time data quality, complex event processing (CEP), and real-time alerts. Real-time techniques appear prominently in other groups in Figure 13, reminding us that the gradual migration of DI solutions toward real-time operation is possibly the strongest trend in DI today. Among these, CEP is a relatively new addition to the DI inventory of options; it has come on strong, and TDWI expects CEP to become common in DI contexts in upcoming years.


The data management disciplines of group 1 include MDM, data quality, data profiling, metadata management, and text analytics. Organizational practices include data governance and business people’s new hands-on involvement using DI tools.

2. Good commitment, good growth. As with group 1, features and techniques seen in group 2 have real-time data movement in common, ranging from changed data capture (CDC) to Web services and SOA to data federation and data sync. Group 2 also includes ELT (which has replaced ETL in many user solutions and vendor tools) and XML (which is quickly becoming a common data type for DI thanks to its use in B2B data exchange and other operational DI practices).

3. Moderate commitment, good potential growth. This group is an eclectic collection of DI options. Again, there are options that can move data in real time or close to it, as with trickle feeds, intraday microbatches, and message/service buses. High availability has become a priority because real-time DI isn’t real time if it’s not highly available. Group 3 also includes collaborative DI, an organizational practice that has skyrocketed in recent years to coordinate work among burgeoning numbers of DI specialists. In a related practice, users often enable collaboration via shared project documents and development artifacts managed in a metadata repository; such repositories now manage much more than metadata, handling master and reference data, browser views of data, discussion threads, object annotations, and a wide range of productivity documents.

4. Weak commitment, good growth. It’s interesting that this category includes some of the newest options for data integration, including software as a service (SaaS), public and private clouds, and open source software for DI and related data management disciplines. The appearance of Hadoop in this group (plus text analytics in group 1) reminds us that DI solutions are progressively embracing the integration of unstructured data, especially in the form of natural language text. These options are so new to data integration that they have only minimal commitment so far, but they should see good growth soon. Group 4 also includes the use of sort tools and secondary DI tools to augment primary ones. TDWI has seen organizations clear performance bottlenecks with such tools. In a distributed DI architecture, these extra tools help offload processing workloads from overtaxed DI servers at the hub of the architecture.

5. Strong commitment, declining growth. This group includes three of the great pillars of traditional data integration: extract, transform, and load (ETL); batch processing; and hand-coded routines. In fact, these are some of the most common components found in data integration solutions deployed today. If these are so popular, then why does the survey show them in decline?

Think of the many new real-time capabilities that users are employing in DI, plus the strong trend toward data services. Batch processing will never go away, because it’s still very useful. Yet it’s being used less, replaced in a growing number of use cases by other speeds and frequencies for processing and information delivery. Likewise, hand coding is being progressively supplanted by solutions built primarily atop a vendor DI tool, as described earlier. Hand coding won’t go away, either, because it’s indispensable for custom work that complements vendor tool capabilities. Long story short, batch processing and hand coding are becoming a smaller percentage of the options applied to DI, as more of the newer options become prominent. Older DI options won’t disappear, but they will be a lesser percentage of DI functions as they’re joined by new ones.


ETL is a similar case. A common knee-jerk reaction to ETL is that it’s only for overnight batch processing, with heavy transformational processing in support of data warehousing. That might have been true in the early 1990s, but today’s ETL tools support most of the options listed in Figures 12 and 13. Ironically, as users progressively tap into more of these new functions, they usually don’t think of them as ETL, even when the functionality is available directly from an ETL tool or a DI platform with an ETL lineage. Similar to batch processing and hand coding, ETL is not going away. It’s just contracting as a percentage of DI capabilities as new options join it.

USER STORY An alternative view of data integration. “First, I’m not a fan of ETL, so I’m looking for a solution that will replace it,” said a data architect and solution architect at a large bank in the United States. “It’s ironic that ETL specialists are hardened technology guys, yet they’re supposed to satisfy business requirements. I need a solution that gives business users control over metadata, instead of the ETLers. That way, sales can view data one way this week, another way next week. Second, if I can’t replace ETL, then I’ll at least improve it by moving from a time-consuming waterfall development method to an agile one. Third, data integration should just expose data to mathematicians and statisticians for analytic purposes. The deliverable is mostly transactional data, with little or no transformation. Hence, there’s no real need for ETL in my department.”

Vendor Products and Platforms for NGDI

Since the firms that sponsored this report are all good examples of software vendors that offer tools, platforms, and services conducive to the next generation of DI, let’s take a brief look at the product portfolio of each, with a focus on next generation trends and requirements. The sponsors form a representative sample of the vendor community, yet their DI offerings illustrate different approaches to DI tools and platforms.7

DataFlux

From a vendor’s viewpoint, one of the most challenging next generation requirements to satisfy is the demand for data management tools that are appropriate for business people. For years, DataFlux has offered a mature DQ suite, and more recently it has built out the suite’s stewardship functions to evolve them toward data governance, exception processing, management dashboards for quality metrics, business-friendly views of data, and other needs specific to business users. DataFlux is a subsidiary of SAS, and a few years ago the two executed a reorganization that moved SAS’s DI products to DataFlux. This has helped them deepen the integration between DQ and DI tools. These tools, of course, also integrate tightly with SAS’s DW, BI, and analytic tools. All of these together comprise a broad and deep portfolio of data management tools.

IBM

For many user organizations, DI’s next generation is about tapping more functions outside basic DI ones, which often requires acquiring more tool types. In response to this demand, the IBM Software Group provides a comprehensive portfolio of integrated products and capabilities for a variety of use cases. The IBM InfoSphere Information Server platform has common metadata services and integrated user-centric tooling designed to promote enterprisewide collaboration between lines of business and IT. The platform also supports automated integration of best practices, reference architectures, and control for reducing risk on future projects. Integrated capabilities include DI, DQ, CDC, replication, data federation, and many other data management disciplines. Multiple approaches to MDM are supported through the IBM Master Data Management Server. Data modeling and process tools are available through IBM’s Rational Software product line. IBM has also taken a leadership position in the big data and analytics domain with the introduction of InfoSphere Streams and the Hadoop-based InfoSphere BigInsights.

7 The vendors and products mentioned here are representative, and the list is not intended to be comprehensive.

Informatica

TDWI survey data reveals that most users would prefer to acquire as many DI and related tools as possible from a single vendor—but only if the tools are fully integrated. Toward this end, Informatica has built up a broad portfolio encompassing DI, DQ, MDM, profiling, stewardship, data services, changed data capture, unstructured data processing, B2B data exchange, cloud data integration, information lifecycle management, CEP, and messaging. But Informatica has gone the extra mile by assuring a deep level of integration across development environments, expanded data analyst and steward capabilities, and interoperability among deployed servers. In recent years, Informatica has shown thought leadership on a number of next generation DI issues by helping define and make practical DI competency centers, data services, cloud-based DI, business self-service, and lean DI development methods.

SAP

Coordinating DI with other data management disciplines is a priority for next generation DI. SAP enables this goal by providing a comprehensive suite of integrated enterprise information management (EIM) tools. Furthermore, SAP has extended this priority by providing tight ties among its multiple portfolios of applications for data management, operational applications, and business intelligence. The EIM portfolio includes a mature, integrated solution for DI, DQ, text analytics, data profiling, and metadata management. There are also tools for several next generation hot spots such as CEP, text analytics, CDC, and MDM. The recent acquisition of Sybase adds Sybase IQ (a columnar analytic database) and Sybase Replication Server (for high-end replication and synchronization). To serve the business user who needs to actively support data management work, the new SAP BusinessObjects Information Steward pulls together a business user interface for profiling, metadata, data definitions, and DQ rules.

Syncsort

Scalability and speed are near the top of the priority list for next generation data integration solutions, and Syncsort has long served organizations that have a pressing need to accelerate their data integration environments. Well known for its high-speed mainframe sorting product (Syncsort MFX), Syncsort Incorporated offers a sophisticated portfolio of high-performance data integration solutions for open systems running on commodity hardware (Syncsort DMExpress) and data protection (Syncsort BEX). These can be deployed as standalone implementations, but DMExpress is often deployed to extend the data performance capabilities of existing DI environments—or independent software vendor (ISV) applications—to clear their performance and scalability bottlenecks. DMExpress is known for its efficiency, easy learning curve, flexible deployment options, and ability to integrate with other DI and data management tools to deliver extremely high performance at scale.

Talend

The Talend Unified Platform is in tune with a number of generational trends in data integration. Many users surveyed are interested in a unified platform, and Talend’s platform includes tools for DI, DQ, MDM, and data profiling. All four tools are built atop a shared platform with a unified metadata repository, only one metadata and administration server to deploy, and a common development GUI integrated into Eclipse. Talend has recently acquired application integration vendor Sopera, whose tool will soon be integrated into the platform. Another generational trend is to use a single tool or platform for analytic DI, operational DI, and other use cases; Talend has a reputation for serving multiple DI user constituencies. Finally, some users are looking for cost-effective data management tools, and Talend’s open source tools are available at a modest price.


Recommendations

Modernize your definition of data integration. DI has evolved so much in recent years that even data integration specialists find it hard to keep up with the changes. Avoid outmoded mindsets that banish data integration to a dark corner of data warehousing or database administration. You’ll never grasp the next generation of data integration if you can’t see its newly acquired diversity. Redefine DI for yourself and your peers.

Help your colleagues understand that DI is a family of techniques. It’s not just ETL or a DBA utility. The list of techniques is already long, and it will get longer.

Note that DI practices reach across analytic and operational boundaries. This affects everything, from staffing and funding to tool selection and solution designs to development standards and architecture. Plan the next generation accordingly.

Get out more often. DI has a new requirement for collaboration. You’re not doing the job fully unless you’re involved in stewardship and governance. Assume you should coordinate your work with that of other data management disciplines, especially data quality and master data management. Collaborate and coordinate to truly know and satisfy DI solution requirements.

Think of stewardship and governance as data management disciplines. They aren’t per se, but they might as well be, because these collaboration and control groups have tremendous influence on next generation data management.

Create a home for wayward DI specialists. As the number of DI specialists and the diversity of DI work increase, expect to reorganize the DI team. Most organizations continue to be successful with DI sourced from teams for data warehousing and database administration, but there’s a trend toward independent DI teams, sometimes organized as a competency center.

Admit that DI needs an architecture. If you don’t have one, get one. Architecture can enable or inhibit critical next generation functions such as real time, scalability, and services. Tools assume certain architectures, but you still have to design your own. Besides, no rule says you must have only one DI tool; many DI architectures have room for specialized tools that assist with scalability and speed.

Dig deeper into the DI tool you already have. Modern tools are amazingly feature-rich, and survey data shows that organizations are using only about 40% of tool functionality. More and deeper tool use is inevitable for upcoming generations.

Use a tool. Hand coding is feature-poor and unproductive by nature, and there’s no way you can hand code most leading-edge requirements for the next generation, such as event processing, text analytics, and advanced DQ functions (e.g., identity resolution).

Anticipate integrating new data types. Complex data (as in hierarchies and XML) and text (human language) are the most likely new data types for the average DI implementation.

Look into the newest DI functions—whether you need them or not. Stay educated so you can map available DI options to new requirements as they arrive.

Be open to new platform choices. It’s just a matter of time before DI tools are commonly running on private or public clouds and being licensed as open source or software-as-a-service.

Keep an eye on the DI techniques poised for the greatest growth. These are the options plotted toward the right side of Figure 13. You may not need all of them now, but you will someday.

Don’t forget the meat and potatoes. ETL has lost its sex appeal for some people, but it’s still the heart and soul of most DI solutions. Likewise, protect and grow the DI disciplines that have the strongest demand from your user base, such as data quality, metadata management, CDC, data sync, and MDM. All future generations will be a mix of old and new, legacy and leading edge.


Consider DI as infrastructure. If your organization truly needs to share lots of data broadly across business units, making DI a centrally owned resource that’s openly shared is more likely to achieve enterprise goals than a plague of departmentally owned DI solutions.

Expect DI to keep evolving. It’s just now exploring new frontiers such as extended collaboration and coordination, complex data, clouds, open source, services, and DI as infrastructure.

Assume there is a new generation of DI in your future. Either business changes will force you into one, or your current generation will age to the point that you need to bring it up to date. Most DI solutions are out of date or feature-poor in some respect, anyway. Leverage one generation after the next to fix the failings of prior ones or to reposition for tomorrow’s computing needs. The rampant changes in DI aren’t over. Revel in what’s to come!

Research Sponsors

DataFlux
www.dataflux.com

DataFlux is a software and services company that enables business agility and IT efficiency. A wholly owned subsidiary of SAS (sas.com), DataFlux provides data management technology that helps organizations reduce costs, optimize revenue, and mitigate risks as well as manage critical aspects of data. By providing solutions that meet the needs of business and IT users, DataFlux offers complete enterprise solutions, including enterprise data quality, data integration, data migration, data consolidation, master data management (MDM), and data governance. It also provides a full range of training and consulting services.

SAP
www.sap.com

As market leader in enterprise application software, SAP (NYSE: SAP) helps companies of all sizes and industries run better. From back office to boardroom, warehouse to storefront, desktop to mobile device—SAP empowers people and organizations to work together more efficiently and use business insight more effectively to stay ahead of the competition. SAP applications and services enable more than 109,000 customers to operate profitably, adapt continuously, and grow sustainably.

IBM
www.ibm.com/software/data/integration

IBM InfoSphere Information Server is a data integration platform that helps enterprises understand, cleanse, transform, and deliver trusted information to critical business initiatives. The platform provides everything needed to integrate heterogeneous information from across disparate systems, including capabilities to support information governance, data quality, data transformation, and data synchronization so that information is consistently defined, accurately represented, reliably transformed, and updated on an ongoing basis. Business and IT professionals use these capabilities to design, deploy, and monitor the core business rules, data integration, and data quality processes they need to deliver effective business analytics and to optimize their information architecture.

Syncsort
www.syncsort.com

Syncsort is a global software company that helps the world’s most successful organizations rethink the economics of data. Syncsort provides extreme data performance and rapid time to value through easy-to-use data integration and data protection solutions. With over 12,000 deployments, Syncsort has transformed decision making and delivered more profitable results to thousands of customers worldwide.

Informatica
www.informatica.com

Informatica is the world’s number one independent leader in data integration software. With Informatica, thousands of organizations around the world gain a competitive advantage in today’s global information economy with timely, relevant, and trustworthy data for their top business imperatives. With Informatica, enterprises gain a competitive advantage from all their information assets to grow revenues, increase profitability, further regulatory compliance, and foster customer loyalty. The Informatica Platform provides corporations with a comprehensive, unified, open, and economical approach to lower IT costs and gain competitive advantage from their information assets held in the traditional enterprise and in the Internet cloud.

Talend
www.talend.com

Talend is the recognized market leader in open source data management and application integration. Talend revolutionized the world of data integration when it released the first version of Talend Open Studio in 2006. Talend’s data management solution portfolio now includes operational data integration, ETL, data quality, and master data management. Through the acquisition of Sopera in 2010, Talend also became a key player in application integration. Unlike proprietary, closed solutions, which can only be afforded by the largest and wealthiest organizations, Talend makes middleware solutions available to organizations of all sizes, for all integration needs.

TDWI Research

TDWI Research provides research and advice for business intelligence and data warehousing professionals worldwide. TDWI Research focuses exclusively on BI/DW issues and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of the business and technical challenges surrounding the deployment and use of business intelligence and data warehousing solutions. TDWI Research offers in-depth research reports, commentary, and inquiry services as well as custom research, topical conferences, and strategic planning services to user and vendor organizations.

1201 Monster Road SW, Suite 250
Renton, WA 98057-2996
T 425.277.9126
F 425.687.2842
E [email protected]
tdwi.org