Drive Business Agility with Data Integration
Leverage Integration for Business Innovation

Summary

Most medium-to-large organizations use a wide array of applications, each with its own data stores. Whether these applications are hosted on-premises or in the cloud, sharing data between them is critical to their usefulness. Developing processes that consolidate data from multiple applications into a unified view of data assets is known as data integration, and it can be done using different techniques.

This eBook delves into the different approaches to data integration and guides readers in selecting the right tools and techniques. It also introduces Astera Centerprise and its range of features that facilitate the data integration process in a code-free environment to speed up the data-to-insight journey.

Table of Contents

GETTING STARTED
What is Data Integration?

THE IMPORTANCE OF DATA INTEGRATION
Data Integration Use Cases
How Do Businesses Benefit from Integration Tools?
How to Optimize Business Capabilities with Data Integration

DATA INTEGRATION APPROACHES
Common Integration Strategies

DATA INTEGRATION TOOLS
Classifying Integration Tools
How to Evaluate Data Integration Tools?
Astera Centerprise: A Code-Free Data Integration Tool

ASTERA CENTERPRISE: CASE STUDY
VinSolutions – An Automotive Company

CONCLUSION

Getting Started

What is Data Integration?

Data integration is the process of combining, cleaning, and presenting data in a unified format. This includes consolidating data from a wide variety of source systems with disparate formats, removing duplicates, cleaning data based on business rules, and transforming it into the required format. Integration underpins various processes, including application integration. Code-free tools help business users access their reserves of enterprise data in near real-time and comb through their data repositories to derive insights faster.

The Importance of Data Integration

Data Integration Use Cases

The use cases for data integration are broad and vary with business needs, data volume, and complexity. For instance,

• A health center may need integration software to consolidate and manage real-time data from multiple sources about its patients and employees,

• An online car buying-and-selling business may need it to update millions of records daily and cut customer onboarding time from months to hours by mapping client data into the company's systems, and

• An office of investments may need it to map the institution’s endowment data from disparate source systems (including both internal systems and external money managers) into a tracking software program for risk analysis.

For each use case, a process can be constructed to automate and streamline manual tasks. While the specific needs may vary, at its core, data integration covers the processes of merging, cleansing and mapping data from source(s) to destination(s), all of which can be done using different approaches.
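The merge, cleanse, and map cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular tool's implementation: the two source names (`crm`, `billing`), their field names, and the choice of email address as the matching key are all hypothetical.

```python
# Hypothetical per-source mappings: each source names the same attributes
# differently, so each one gets its own source-field -> target-field map.
FIELD_MAPS = {
    "crm": {"CustName": "name", "EmailAddr": "email"},
    "billing": {"customer_name": "name", "email": "email", "balance": "balance"},
}


def map_record(source, record):
    """Map: rename a source record's fields to the unified target schema."""
    mapping = FIELD_MAPS[source]
    return {target: record[field] for field, target in mapping.items() if field in record}


def merge(records):
    """Cleanse the matching key, then merge records that refer to the same customer."""
    unified = {}
    for rec in records:
        key = rec["email"].strip().lower()  # cleanse: normalize the email key
        # Later records fill in or overwrite fields from earlier ones.
        unified.setdefault(key, {}).update(rec, email=key)
    return unified


crm_row = map_record("crm", {"CustName": "Ann", "EmailAddr": "ann@example.com"})
bill_row = map_record("billing", {"customer_name": "Ann", "email": " ANN@example.com", "balance": 10})
print(merge([crm_row, bill_row]))
```

Keeping the mappings in a per-source table, rather than in code, is what lets a new source be added by writing a new mapping instead of a new program.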

How Do Businesses Benefit from Integration Tools?

With a massive influx of information from multiple source systems, businesses need to proactively handle the five Vs of data: value, variety, velocity, veracity, and volume. With a robust integration tool, an enterprise can extract the most value from its data, standardize a wide variety of information, keep up with data velocity, improve veracity, and easily process large volumes of data. Here are some of the ways these tools help businesses.

Faster Time-to-Value

Businesses use accessible tools to create a single source of truth for their data and speed up internal processes, reaching valuable insights faster. For instance, Randolph-Brooks Federal Credit Union wanted to migrate its legacy data, clean it, and convert it into various formats. With an integration tool, a job that would have taken a week took only half a day.

Smarter, Informed Business Decisions

A powerful integration solution allows businesses to better manage, measure, and monetize their data and to make targeted decisions based on quality data. Business users can directly access the data they need without constantly requesting it from IT, get a complete view of customer behavior, and use strategic insights from clean data to gain an edge over the competition.

Maintain Quality Data and Improve Revenues

Data quality directly determines whether data has a positive or negative impact on business decisions. When data is up-to-date, clean, and insightful, businesses can improve their revenues by up to 66%. With accurate and trustworthy data at their disposal, businesses can shape decisions to meet their goals without being hindered by bad-quality data.

How to Optimize Business Capabilities with Data Integration

If done right, integration makes businesses more agile by simplifying data exchange with partners, customers, and vendors/suppliers. This helps enterprises make smarter decisions and build stronger partner relationships. Following are some of the ways in which data integration can enhance your business capabilities.

By Extracting Data From Structured and Unstructured Sources

Incoming data can be structured, semi-structured, poly-structured, or unstructured. For instance, many organizations exchange information through text-based PDF files, PDF forms, and scanned PDF images. But the data in PDF files is unstructured, which complicates extraction. An integration tool can automate the extraction process and integrate the data with internal systems for further processing and analysis.

By Integrating Data From Hierarchical Files

Integrating data from flat files is comparatively easy, but business users face challenges when they try to extract, parse, and integrate information from hierarchical data files, such as XML, JSON, EDI, and COBOL. To perform hierarchical data integration, business users rely on IT, which increases IT's workload. A data integration tool can effectively bridge this gap between business executives and IT.

By Making Data Readily Available To Business Users

A data integration tool with a user-friendly interface and a comprehensive library of built-in functions can help limit the reliance on IT. It readily makes the data available to business users who can then work with the available information and get business insights without delay. Additionally, data integration tools can automate the ETL process, which eliminates the need for manual integration and significantly reduces the chances of errors.

Business performance improves when executives focus on making critical business decisions rather than on collecting and integrating data.

By Checking for Data Quality

A data integration tool cleanses and validates incoming data and ensures its trustworthiness. Poor-quality data can distort business insights, and decisions based on them can prove expensive.

Overall, a data integration tool that simplifies the ETL process is an investment organizations should make to stay relevant in today's data-driven business environment.

It can benefit businesses in more than one way. By bridging the gap between IT and business executives, it helps divide the workload efficiently. It empowers business users to derive insights from data by giving them prompt access to it.

Simultaneously, when executives delegate the task of data integration and extraction to software, they can focus on more critical aspects of the business. The result is faster and more accurate business decisions, with minimized costs and increased revenue.

Data Integration Approaches

Common Integration Strategies

Data integration techniques have evolved over the years from manual to automated. Depending on business needs, the process can be implemented using any of the following approaches.

Manual

This approach involves a user manually collecting data from disparate source systems, applying quality rules to clean it, and uploading it to the target system(s). It also requires hand-coding the dataset mappings for every new use case.

Middleware

In a middleware solution, a virtual “pipeline” is created between multiple systems that allows bi-directional communication. This connectivity streamlines integration tasks.

Physical Data Integration

This technique involves moving data from the source system to a data warehouse or another destination such as a data lake. Businesses prefer this process because it makes it easy and flexible to store, view, and manage all their data in a centralized location.

There are two approaches to this method: ETL (extract, transform, load) and ELT (extract, load, transform). Both techniques involve the same three processes of extracting, transforming, and loading data into a destination. The main difference is where the transformation takes place: in a separate staging area before loading (ETL) or inside the destination after loading (ELT).

ETL (Extract, Transform, Load)

Figure 1: Illustrating the ETL process (source system(s) → ETL infrastructure: extract, transform & load → database destination)

In this approach, data is extracted, the transformation logic is applied in the ETL layer, and the resulting data is loaded into the target database or data lake. Because ETL is supported by a wide range of frameworks and tools, this approach suits businesses that need to integrate and process large volumes of data, though processing time grows with volume.
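The ETL flow can be sketched as follows, as a minimal illustration only: the `extract` function returns made-up rows in place of a real source system, plain Python code plays the role of the separate ETL layer, and an in-memory SQLite database stands in for the destination.

```python
import sqlite3


def extract():
    """Extract: hypothetical raw rows pulled from a source system."""
    return [("  alice ", "2023-01-05", "120.50"), ("BOB", "2023-01-06", "80")]


def transform(rows):
    """Transform outside the destination: tidy names and cast amounts to numbers."""
    return [(name.strip().title(), day, float(amount)) for name, day, amount in rows]


def load(conn, rows):
    """Load: only already-transformed rows reach the destination."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, day TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


dest = sqlite3.connect(":memory:")
load(dest, transform(extract()))
print(dest.execute("SELECT * FROM sales ORDER BY customer").fetchall())
```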

ELT (Extract, Load, Transform)

Figure 2: Illustrating the ELT process (database source(s) → extract & load → database destination, where the transformation runs)

In this technique, the extracted data is first loaded into the target destination, and the transformation logic is applied within the database or data warehouse. Because the separate ETL infrastructure is removed from the equation and the transformation runs directly inside the database, overall system overhead and data latency can be significantly reduced.
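For contrast, here is a minimal ELT sketch using the same kind of hypothetical rows. The raw data is loaded untouched into a staging table, and the transformation is expressed in SQL so that the destination database (again an in-memory SQLite stand-in) does the work. The table names are assumptions for illustration.

```python
import sqlite3

dest = sqlite3.connect(":memory:")

# Extract & load: raw source rows land in a staging table unchanged.
dest.execute("CREATE TABLE raw_sales (customer TEXT, day TEXT, amount TEXT)")
dest.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [("  alice ", "2023-01-05", "120.50"), ("BOB", "2023-01-06", "80")],
)

# Transform: the destination database does the work with its own SQL engine.
dest.execute(
    """
    CREATE TABLE sales AS
    SELECT UPPER(TRIM(customer)) AS customer,
           day,
           CAST(amount AS REAL) AS amount
    FROM raw_sales
    """
)
print(dest.execute("SELECT * FROM sales ORDER BY customer").fetchall())
```

The raw staging table is kept, so the transformation can be rerun or changed later without extracting the data again.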

Data Virtualization

Data virtualization takes a completely different approach from physically moving data to and from databases. In this process, data is not moved across systems. Instead, an abstraction layer provides a unified view of the disparate systems, leaving the data exactly where it physically resides. Data analysts can then request information through the virtual layer, which handles access to the underlying sources.

This process gives businesses real-time access to their data without exposing the technical details of the source systems. It also lets them make enterprise-wide changes quickly on the virtual layer instead of first consolidating the data in one place or implementing changes at each source separately.

This integration approach does not support bulk data movement, although it can run alongside ETL or ELT.
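The idea of a virtual layer can be illustrated with a small Python class. This is a simplified sketch, not a production virtualization engine: the two sources (a SQLite table standing in for a CRM and a dictionary standing in for an orders service) and the `VirtualCustomerView` class are hypothetical. The class keeps no copy of the data; every query reads the sources at request time.

```python
import sqlite3

# Two hypothetical source systems that stay where they are.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

orders_service = {1: [120.5], 2: [80.0, 15.0]}  # e.g. data behind an API


class VirtualCustomerView:
    """Abstraction layer: answers queries by reading the sources on demand,
    without copying their data into a central store."""

    def __init__(self, crm_conn, orders):
        self._crm = crm_conn
        self._orders = orders

    def customer_totals(self):
        """Unified view: customer names from the CRM joined with order totals."""
        for cid, name in self._crm.execute("SELECT id, name FROM customers ORDER BY id"):
            yield {"name": name, "total": sum(self._orders.get(cid, []))}


view = VirtualCustomerView(crm, orders_service)
print(list(view.customer_totals()))
```

Because nothing is copied, a customer inserted into the CRM table shows up in the next call to `customer_totals()` without any reload step.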

Data Integration Tools

Classifying Integration Tools

There are two types of data integration tools:

On-Premise

On-premise integration tools run locally on an enterprise server and are typically used by businesses that process legacy data and/or higher volumes of data. They suit businesses that require full control over the tool and have data architects to set up workflows as the need arises.

Cloud-Based

Cloud-based integration tools are hosted on a third-party server in the cloud, and in most cases these solutions are web-based. Businesses with a simple use case, where data is routed through a workflow and the transformed data is loaded into the preferred destination(s), are likely to choose cloud-based data integration tools.

How to Evaluate Data Integration Tools?

When evaluating enterprise-grade data integration tools, it is imperative to ensure that the solution offers a host of features that will make your data journey easier. Here are some features, based on common use cases, that you should look for in a data integration solution.

Bi- and Multi-Directional Data Synchronization

In many use cases, data not only needs to be transferred to one destination but also needs to be updated in other enterprise systems to maintain consistency and ensure the authenticity of the data throughout the business network. An integration tool should be able to offer accurate and timely synchronization between the connected systems.
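A minimal sketch of two-way synchronization between two systems, assuming each record carries an `updated` timestamp and that the newer version should win (a last-write-wins policy). Real tools may use other conflict-resolution rules; the record layout and the `sync` function here are hypothetical.

```python
def sync(a, b):
    """Two-way sync of two {id: record} stores, where each record has an
    'updated' timestamp. A record missing on one side is copied over; when
    both sides have it, the newer version overwrites the older one."""
    for key in set(a) | set(b):
        ra, rb = a.get(key), b.get(key)
        if ra is None:
            a[key] = dict(rb)
        elif rb is None:
            b[key] = dict(ra)
        elif ra["updated"] > rb["updated"]:
            b[key] = dict(ra)
        elif rb["updated"] > ra["updated"]:
            a[key] = dict(rb)


crm = {1: {"email": "new@x.com", "updated": 2}, 2: {"email": "b@x.com", "updated": 1}}
erp = {1: {"email": "old@x.com", "updated": 1}, 3: {"email": "c@x.com", "updated": 1}}
sync(crm, erp)
print(crm == erp, crm[1]["email"])
```

After one pass both stores hold the same three records, and the newer email for record 1 has propagated to the second system.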

Workflow Automation

Data integration is generally not a one-time job. Incoming datasets need to be cleaned, transformed, synced, and made available to the intended users repeatedly. Trigger-based workflows allow data scientists to automate repetitive tasks and simplify the integration process. Users can easily schedule a workflow to run at a specific time or trigger it once a specific event criterion is met.

Fast Data Processing

When faster solutions cut the time that integration tasks usually take, businesses can devote more time and resources to scaling the enterprise and to other revenue-focused decisions. A robust integration tool should be able to process large volumes of data quickly and efficiently, without any part of the process becoming a bottleneck. Features like pushdown optimization are essential in this regard.

In industries such as finance and healthcare, where processing and analyzing large volumes of data is critical and directly affects clients, fast processing can simplify integration tasks and keep data latency to a manageable level.

Support for Multiple Source Systems and Formats

Enterprises work with multiple data formats and sources, including legacy and modern formats and structured, unstructured, and semi-structured sources. An integration tool should support all of these to simplify data access.

Data Quality, Cleansing, and Profiling

Missing fields, duplicates, and invalid data are major data quality issues that undermine otherwise smart business strategies and instead result in negative customer experiences and missed opportunities.

• Data quality is a component of the integration process that identifies and weeds out the bad data based on custom business rules.

• Data cleansing removes inconsistencies, duplicates, and errors from the source data.

• Once the data is clean and updated, business analysts need data profiling to extract valuable statistics, insights, and summaries from the database, which they can use to make informed business decisions.

All these features are must-haves in an integration tool. They ensure that business analysts have the most up-to-date information to derive insights and shape strategies.
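The three steps above (quality rules, cleansing, and profiling) can be sketched in plain Python. The rules, field names, and sample records are hypothetical, chosen only to show rejected records, deduplication, and summary statistics side by side.

```python
from collections import Counter

# Hypothetical business rules: rule name -> predicate a valid record must pass.
RULES = {
    "email present": lambda r: bool(r.get("email", "").strip()),
    "age in range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}


def validate(rows):
    """Data quality: separate valid rows from rows that break any rule."""
    valid, rejected = [], []
    for row in rows:
        failed = [name for name, rule in RULES.items() if not rule(row)]
        if failed:
            rejected.append((row, failed))
        else:
            valid.append(row)
    return valid, rejected


def cleanse(rows):
    """Data cleansing: normalize emails and drop duplicates (first one wins)."""
    seen, out = set(), []
    for row in rows:
        email = row["email"].strip().lower()
        if email not in seen:
            seen.add(email)
            out.append({**row, "email": email})
    return out


def profile(rows):
    """Data profiling: summary statistics over the cleansed data."""
    ages = [r["age"] for r in rows]
    return {
        "rows": len(rows),
        "min_age": min(ages),
        "max_age": max(ages),
        "domains": dict(Counter(r["email"].split("@")[1] for r in rows)),
    }


records = [
    {"email": "A@x.com", "age": 30},
    {"email": "a@x.com ", "age": 30},
    {"email": "", "age": 40},
    {"email": "b@y.org", "age": 150},
    {"email": "c@x.com", "age": 25},
]
valid, rejected = validate(records)
clean = cleanse(valid)
print(profile(clean))
```

Rejected rows are kept together with the names of the rules they failed, so they can be reported back to the data owner instead of silently disappearing.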

Real-Time Data Preview

When creating complex integration flows, it is important to be able to preview the input or output data at any node in the flow before execution. Data previews provide flexibility and visibility into the mappings and let users catch issues at each step and correct them before running the entire flow.

Astera Centerprise: A Code-Free Data Integration Tool

Astera Centerprise is an industry-grade, high-performance solution that helps businesses make the most of their existing and incoming data with easy mappings, transformations, pre-built connectors, and more. With a powerful parallel-processing ETL engine that handles large volumes of data and support for a wide range of source systems and formats, the tool simplifies enterprise integration.

Astera Centerprise’s key features include the following.

Drag-and-Drop, Code-Free Mapping Environment

Astera Centerprise features a visual, drag-and-drop interface that provides advanced functionality for development, debugging, and testing in a code-free environment.

The data integration platform offers developers and business users the same level of usability through a range of user-friendly features.

Figure 3: Demonstrating the code-free drag-and-drop interface of Astera Centerprise

Workflow Automation and Job Scheduling

With a built-in job scheduler, Astera Centerprise allows you to schedule anything from a simple data transformation job to a complex workflow comprising several subflows.

Using the process orchestration capabilities of the data integration software, you can sequence integration and transformation jobs, which can be executed serially or in parallel on multiple servers. Other built-in workflow features include SQL execution, outside program execution, FTP uploads and downloads, and email.
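Sequencing jobs with dependencies, and running independent ones side by side, can be sketched with Python's standard `graphlib` (Python 3.9+) and `concurrent.futures` modules. This is a generic illustration, not Astera Centerprise's scheduler or API: the job names in `WORKFLOW` and the `run_job` placeholder are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "extract_crm": set(),
    "extract_erp": set(),
    "merge": {"extract_crm", "extract_erp"},
    "load_warehouse": {"merge"},
    "email_report": {"load_warehouse"},
}


def run_job(name, log):
    """Placeholder for real work such as SQL execution or an FTP upload."""
    log.append(name)
    return name


def run_workflow(graph, max_workers=4):
    """Run jobs in dependency order; jobs that become ready together run in
    parallel. With max_workers=1 the same workflow runs serially."""
    log = []
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # jobs whose dependencies have all finished
            for finished in pool.map(lambda job: run_job(job, log), ready):
                sorter.done(finished)
    return log


print(run_workflow(WORKFLOW))
```

The two extract jobs have no dependencies, so they start together; `merge` only becomes ready after both are marked done.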

Figure 4: Illustrating Centerprise's workflow automation and job scheduling capabilities

Industrial-Strength, Parallel Processing Engine

Featuring a cluster-based architecture and a parallel processing ETL engine, Astera Centerprise allows data transformation jobs to be run in parallel. This way, the whole dataflow is processed in parallel on multiple nodes. As a result, you experience unparalleled performance even when processing large datasets.
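The partition-and-parallelize idea can be illustrated on a single machine. In this hypothetical sketch, a thread pool stands in for cluster nodes and a simple `qty * price` calculation stands in for a real transformation. It is not Centerprise's engine, and a CPU-heavy Python workload would more likely use separate processes or a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Split rows into n partitions (round-robin), one per worker."""
    return [rows[i::n] for i in range(n)]


def transform_partition(part):
    """The same transformation runs independently on each partition."""
    return [{"id": r["id"], "total": r["qty"] * r["price"]} for r in part]


def parallel_transform(rows, workers=4):
    """Transform all partitions concurrently, then recombine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transform_partition, partition(rows, workers)))
    # Recombine and sort by id so the output order is deterministic.
    return sorted((r for part in results for r in part), key=lambda r: r["id"])


orders = [{"id": i, "qty": i, "price": 2.0} for i in range(10)]
print(parallel_transform(orders)[:3])
```

Because each partition is transformed independently, the result is the same whatever the number of workers; only the elapsed time changes.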

A Vast Selection of Connectors

Astera Centerprise features a vast collection of built-in connectors for both modern and traditional data sources. From simple CSV, Excel, or fixed-length files, relational databases, hierarchical EDI and XML files, and legacy formats to enterprise applications, cloud solutions, and REST APIs, it’s all there, ready to be used.

Business users and developers can connect data from heterogeneous sources, including data warehouses, cloud applications, and more, in a drag-and-drop GUI (graphical user interface).

Instant Data Preview

With Instant Data Preview, Astera Centerprise gives you insight into the validity of the data mappings you have created. Using this feature, you can inspect a sample of the data being processed at each step of the transformation process. This, in turn, allows you to promptly identify and fix any mapping errors before the job is executed.

Figure 5: Accessing the Instant Data Preview feature

Extensive Library of Pre-Built Transformations

Astera Centerprise dramatically simplifies the process of transforming complex hierarchical data with its visual, drag-and-drop environment and broad selection of built-in transformations. These transformations can be strung together to create a complete dataflow and automated using the built-in job scheduling and automation features.

Data validation is an essential requirement for various data processes, where the end goal is to help ensure the accuracy of the results. Using the built-in data profiling features of Astera Centerprise, you can easily examine your source data and get detailed information about its structure, quality, and integrity. Custom data quality rules can also be defined to validate incoming data and identify missing or invalid records. Furthermore, data cleansing helps with error rectification and deduplication to ensure the validity and relevance of the data.

Figure 6: Applying Data Quality Rules in a dataflow

Pushdown Optimization for Maximum Performance

With Astera Centerprise, a data transformation job can be pushed down into a relational database, where appropriate, to make optimal use of database resources and improve performance. This helps businesses better manage processing needs, save more time, and boost developer productivity.

Figure 7: Demonstrating the pushdown optimization mode

Astera Centerprise currently supports pushdown optimization for MSSQL, PostgreSQL, Oracle, Db2, and MySQL database systems. All queries are generated automatically: the existing transformation logic is converted into native SQL code for the target database platform in use.
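The benefit of pushdown can be illustrated by running the same filter-and-aggregate logic two ways against an in-memory SQLite table: once by pulling every row into the tool, and once by translating the logic into a single SQL query that the database executes. The table, the `FILTER`/`AGGREGATE` representation, and the `to_sql` translator are hypothetical simplifications of what a real query generator does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 5.0), (3, "EU", 2.5)],
)

# Hypothetical transformation logic as defined in an integration tool:
# keep EU orders, then total their amounts.
FILTER = ("region", "=", "EU")
AGGREGATE = ("amount", "SUM")


def run_in_tool(conn):
    """Without pushdown: move every row out of the database, then compute.
    (Simplified: assumes the filter is an equality test on `region`.)"""
    rows = conn.execute("SELECT region, amount FROM orders").fetchall()
    return sum(amount for region, amount in rows if region == FILTER[2])


def to_sql(table, flt, agg):
    """With pushdown: translate the same logic into one native SQL query.
    Table/column names come from trusted tool metadata; the value is bound."""
    column, op, value = flt
    agg_column, func = agg
    return f"SELECT {func}({agg_column}) FROM {table} WHERE {column} {op} ?", (value,)


def run_pushed_down(conn):
    sql, params = to_sql("orders", FILTER, AGGREGATE)
    return conn.execute(sql, params).fetchone()[0]


print(run_in_tool(conn), run_pushed_down(conn))
```

Both paths return the same total, but the pushed-down version moves a single number out of the database instead of every row, which is where the savings come from on large tables.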

These queries are executed in parallel with the loading process, which means that throughput can be maximized while wait times between data transformation and delivery to the end user are reduced considerably. With ELT, data size becomes a less pressing concern. While larger datasets will produce longer-running queries, query processing times are still improved considerably.

This integration process is ideally suited to modern reporting systems that supply increasing volumes of semi-structured and unstructured data to the enterprise.

REST Server Architecture

REST APIs work in a client-server environment where a client sends HTTP(S) requests to the server, and the server retrieves the requested information and returns a response to the client.

The REST-based Centerprise server leverages REST API technology to provide a leaner, lighter Centerprise client application and to let Astera REST web service APIs perform a range of enterprise operations.

Security and Access Control

The solution also includes authorization and authentication features to secure any action that authenticated users perform against the run-time and design-time components of the solution. Security is built around three key areas:

• User authentication via bearer-token authentication
• Secure domain communication between the client and server over TCP/IP and HTTP protocols
• Role-based access control via an intuitive user management and access control dashboard

These protocols help administrators prevent unauthorized access to enterprise data assets and enforce access policies for both internal and remote users.
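Bearer-token authentication followed by a role check can be sketched with Python's standard library. Everything here is hypothetical and not Astera's API: the endpoint URL, the in-memory `TOKENS`, `ROLES`, and `PERMISSIONS` tables, and the function names. In practice the token would travel over an encrypted connection and be validated by the server's security layer.

```python
import secrets
import urllib.request

# Hypothetical server-side state (not Astera's actual API or storage).
TOKENS = {}  # issued bearer token -> user name
ROLES = {"alice": "admin", "bob": "viewer"}
PERMISSIONS = {"admin": {"read", "run_job"}, "viewer": {"read"}}


def issue_token(user):
    """Issue an opaque random token after the user has logged in."""
    token = secrets.token_urlsafe(32)
    TOKENS[token] = user
    return token


def build_request(url, token):
    """Client side: send the token in the Authorization header (no network call here)."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def authorize(request, action):
    """Server side: authenticate the bearer token, then apply role-based access."""
    scheme, _, token = (request.get_header("Authorization") or "").partition(" ")
    user = TOKENS.get(token) if scheme == "Bearer" else None
    if user is None:
        return False  # authentication failed
    return action in PERMISSIONS[ROLES[user]]


req = build_request("https://example.com/api/jobs", issue_token("bob"))
print(authorize(req, "read"), authorize(req, "run_job"))
```

Authentication answers "who is calling", and the role table then answers "what may they do"; keeping the two steps separate is what lets administrators change permissions without reissuing tokens.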

Job Optimizer

Performance is one of the key concerns of data professionals managing integration jobs. The Job Optimizer is another significant product feature, designed to improve performance and reduce job execution time. It modifies the flow at run time, for example by removing unnecessary sort operations on pre-sorted data or by adding an ORDER BY clause to database sources so that sorting runs in the database, which improves dataflow performance.

Figure 8: Job monitoring window shows the suggestions for dataflow optimization in a different color

Additionally, users can optimize the flows by following the recommendations generated by the Job Optimizer in the job trace. The feature improves performance by efficiently managing the RAM, CPU, network, and disk utilization when executing jobs.
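The two run-time rewrites described above (dropping a sort on already-sorted data and pushing a sort into the source query as `ORDER BY`) can be sketched as a small plan optimizer. The plan format, a list of `(step, argument)` tuples, and the set of order-preserving steps are hypothetical simplifications, not the Job Optimizer's actual internals.

```python
# Hypothetical plan format: a list of (step, argument) tuples.
# Steps assumed to keep the incoming row order unchanged:
ORDER_PRESERVING = {"filter", "map"}


def optimize(plan):
    """Rewrite a dataflow plan before execution.

    Rule 1: drop a sort on a key the data is already sorted by.
    Rule 2: if a sort directly follows the initial database query, push it
            into the query as an ORDER BY clause so the database sorts.
    """
    out, order = [], None  # `order` tracks the key the data is sorted on
    for step, arg in plan:
        if step == "sort":
            if arg == order:
                continue  # rule 1: redundant sort on pre-sorted data
            if len(out) == 1 and out[0][0] == "db_query":
                out[0] = ("db_query", f"{out[0][1]} ORDER BY {arg}")
                order = arg
                continue  # rule 2: the sort now runs in the database
            order = arg
        elif step not in ORDER_PRESERVING:
            order = None  # other steps may reorder rows
        out.append((step, arg))
    return out


plan = [
    ("db_query", "SELECT * FROM vehicles"),
    ("sort", "vin"),
    ("filter", "price > 0"),
    ("sort", "vin"),
]
print(optimize(plan))
```

Here both sort steps disappear from the dataflow: the first becomes part of the SQL query, and the second is redundant because a filter does not change row order.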

Figure 9: The features of Astera Centerprise summed up in an image

Astera Centerprise: A Case Study

VinSolutions – An Automotive Company

VinSolutions is a company that has experienced the benefits of Astera Centerprise’s extensive range of features. The company is a subsidiary of AutoTrader and provides customer relationship management (CRM) functionality to auto dealers. One important function it offers is the ability for dealers to transfer inventory information from their dealer management system (DMS) onto AutoTrader.com and KBB.com, two car information and sales sites owned by its parent company.

The Challenge

VinSolutions’ data integration started out with a homegrown system, but as the company grew, it quickly became obvious that the system could not scale with the growth in dealer partners and data. The company onboards 70-100 dealers every month and imports data on 1.2 million vehicles per day.

Because each dealer has a DMS with its own configuration, different data mappings are required for each dealer, creating a very complex data integration challenge.

VinSolutions discovered that it needed a solution that could not only scale to the complexity of its data integration requirements but also provide a user-friendly GUI that would enable non-IT users to shoulder much of the day-to-day workload without having to write code.

Before Astera Centerprise: onboarding time of two months; 368 developer hours to onboard 1 vendor
After Astera Centerprise: onboarding time of one hour; 2 developer hours to onboard 1 vendor

Figure 10: A comparison of the usage of resources by VinSolutions before and after using Astera Centerprise

The Solution

VinSolutions chose Astera Centerprise for its intuitive, drag-and-drop environment. The solution empowered its business personnel to quickly and easily parse and construct hierarchical structures and to manage complex integration jobs faster and more affordably.

Astera Centerprise is an approachable data integration solution that significantly lowers the need for IT resources and empowers those who use the data to have a say in how it is processed.

According to VinSolutions’ director of integration Doug Maskill, the software has dramatically reduced the need to custom-code the integration into each dealer’s DMS: “Every time we brought in a new dealer, there was a lot of development effort that had to be taken into account, whereas with Astera Centerprise, it’s a lot faster process, and it doesn’t require a developer to be involved every step of the way.”

According to VinSolutions senior programmer Mike Ethetton, it could take up to 2 months for a developer to do the custom-coding to ensure the accurate capture of data from a new client’s DMS into the VinSolutions network. With Astera, that has been cut to about an hour’s worth of work.

With 70 to 100 car dealers going online with VinSolutions every month, the company has eliminated a lot of custom work on the part of its development team.

For example, VinSolutions is able to quickly and easily integrate data from the websites of Nissan, Kia, Lexus, Toyota, and other manufacturers, including inventory availability.

Using Astera Centerprise, any change made to car data at the DMS level will automatically trigger an update into the VinSolutions systems.

Conclusion

In today’s data-intensive environment, it is more important than ever to be able to fully leverage information assets for accurate and timely decision-making. Data integration tools simplify this process, enabling organizations to extract value from enterprise assets and shorten the data-to-insight journey.

Astera Centerprise offers all the features you need to kickstart your data integration project, consolidate disparate data sources, and create a unified view of your enterprise data.

To experience Astera Centerprise’s range of features that facilitate data integration, download the trial version.
