
How to Establish an Efficient ETL Process in a Fintech Product

Contents

Why your business needs an ETL process
What is ETL processing?
Types of data processing
Tools for data processing
Data types and formats
ETL in Fintech
The bottom line
How to make the ETL process efficient?
Testing Framework for ETL processing
ETL Security
The result
Checklist

Data is the blood of the modern world. Everything we create and share within the information technology space is data. I know this seems obvious, but the question is, do we understand how to work with data properly?

Fintech is a big and convoluted ecosystem interconnected by data streams and integrations. Imagine the industry as a living being: if it has problems with its blood, it will be sick and weak. Efficient data processing is all about the health of specific companies and of the industry as a whole. Let me show you the problem at scale.

Why your business needs an ETL process

When it comes to integrating data, there are three main steps involved: extract, transform, and load. Each has its own pitfalls. Even though the steps seem obvious, companies spend large sums of money solving these problems while looking for the most cost-efficient solutions.

Here are three reasons:

1. The data sources can contain mistakes. If you don’t own the data source, you can’t be sure it contains accurate data. The same is true when you download big data files from a source database: there’s no guarantee that the download will succeed and that no part of the data will be lost. The process is also very resource-demanding because data source files are huge and constantly updated.

2. Even when you already have the files from your data sources, you still can’t insert them into your target database as-is, because the data format differs from one source to another. The data structures must be transformed in accordance with business rules. However, modern businesses are versatile and need to change quickly, so a clear view of business processes and rules is very difficult to obtain, which makes it hard to transform data structures efficiently.

3. The primary question is how the data gets into the target database and system once it is transformed. An uncontrolled insert can produce errors and requires deep testing and traceability.
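To make reasons 1 and 3 concrete, here is a minimal Java sketch of a validation gate that checks each incoming row before anything is inserted into the target database and records why each rejected row was rejected. The comma-separated layout (accountId, symbol, quantity, price) and the class name RowValidator are hypothetical, chosen only for illustration; real source files have their own layouts and rules.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Validation gate run before loading. The row layout
// "accountId,symbol,quantity,price" is a hypothetical example.
final class RowValidator {

    static final class Result {
        final List<String> accepted = new ArrayList<>();
        final List<String> rejected = new ArrayList<>(); // "line N: reason"
    }

    static Result validate(List<String> lines) {
        Result result = new Result();
        for (int i = 0; i < lines.size(); i++) {
            String error = check(lines.get(i));
            if (error == null) {
                result.accepted.add(lines.get(i));
            } else {
                result.rejected.add("line " + (i + 1) + ": " + error);
            }
        }
        return result;
    }

    // Returns null for a valid row, otherwise a human-readable reason.
    private static String check(String line) {
        String[] f = line.split(",", -1); // -1 keeps trailing empty fields
        if (f.length != 4) return "expected 4 fields, got " + f.length;
        if (f[0].isBlank()) return "missing accountId";
        if (f[1].isBlank()) return "missing symbol";
        try {
            Long.parseLong(f[2].trim());
        } catch (NumberFormatException e) {
            return "quantity is not an integer";
        }
        try {
            if (new BigDecimal(f[3].trim()).signum() <= 0) return "price must be positive";
        } catch (NumberFormatException e) {
            return "price is not a number";
        }
        return null;
    }
}
```

Only the accepted rows would move on to the load step; the rejected list, with line numbers and reasons, gives the traceability that an uncontrolled insert lacks.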

Businesses, meanwhile, demand unprecedented speed and utilization from their systems.

Quick deployment and testing help them check assumptions and choose the most appropriate way to develop the product.

Keeping data processing both solid and quick is a complex problem.

What is ETL processing?

ETL is a process made up of three distinct but interrelated steps (extract, transform, and load) that is used to repeatedly synthesize data from multiple sources to build a data warehouse or data lake.

[Figure: Data Source 1 and Data Source 2 flowing through Extract, Transform, and Load]
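The three steps can be sketched as three small Java functions. This is an illustration under simplifying assumptions, not a production pipeline: the sources are in-memory lists of hypothetical "accountId,balance" lines, and a sorted map stands in for the data warehouse.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Minimal ETL sketch. Sources are in-memory lists of hypothetical
// "accountId,balance" lines; a TreeMap stands in for the warehouse.
final class MiniEtl {

    // Extract: pull the raw lines from every source.
    static List<String> extract(List<List<String>> sources) {
        return sources.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    // Transform: drop blank lines, trim values, and upper-case account ids.
    static List<String[]> transform(List<String> raw) {
        return raw.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .map(line -> line.split(","))
                .map(f -> new String[] {f[0].trim().toUpperCase(), f[1].trim()})
                .collect(Collectors.toList());
    }

    // Load: write the normalized rows into the target store.
    static void load(List<String[]> rows, Map<String, String> warehouse) {
        for (String[] row : rows) {
            warehouse.put(row[0], row[1]);
        }
    }

    static Map<String, String> run(List<List<String>> sources) {
        Map<String, String> warehouse = new TreeMap<>();
        load(transform(extract(sources)), warehouse);
        return warehouse;
    }
}
```

Keeping the three steps as separate functions makes each one testable on its own, which matters once a testing framework is built around the process.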

Although the ELT approach is a good choice for many products, and your project may benefit from it the most, this guide focuses on ETL.

ELT

ELT is often described as an advanced version of ETL. In ETL, transformation is done on an intermediate server before the data is loaded into the target server. ELT, by contrast, allows raw data to be loaded directly into the target server and transformed there.

In the ETL approach, a tool obtains data from one or more sources, and the extracted data is stored in a staging server or database. Checks and business rules are applied in this staging area, and then the data is loaded into the data warehouse. In the ELT approach, all data transformations are done in the target system after the data is loaded. Both processing methods are used to build Fintech applications.

[Figure: ELT: Extract & Load all data into the target (PureData System for Analytics), then Transform for Analytics. ETL: Extract into a Staging Area, then Transform & Load presumably important data into the DWH for Analytics.]

So which features make ETL worth implementing? Let’s start with an overview of data types to see where ETL works best.

Types of data processing

Financial technology systems work with the following data types most often:

Real-time. Trade processing based on events. For example, a financial advisor generates trades, a request is made to the custodian’s servers, and the custodian returns a confirmation that the trade has been executed. Real-time data processing requires special techniques and technologies.

Historical data. Data that represents changes over time, for example, in stock prices. It is often used in analytics tools or for building charts.

End-of-day. Huge files with large amounts of data are processed once per day when the system is idle (e.g., late at night). For example, such a file can contain all the trades made through a custodian during the day.
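End-of-day files are often too large to load into memory at once. One common approach, sketched below in Java, is to stream the file line by line and aggregate as you go. The file layout (a header row followed by accountId,symbol,quantity,price) and the per-account notional total are hypothetical examples, not a custodian specification.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.util.Map;
import java.util.TreeMap;

// Streams a hypothetical end-of-day trade file (header row, then
// "accountId,symbol,quantity,price") and totals notional per account
// while holding only one line of the file in memory at a time.
final class EndOfDayBatch {

    static Map<String, BigDecimal> notionalByAccount(Reader source) {
        Map<String, BigDecimal> totals = new TreeMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            in.readLine(); // skip the header row
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isBlank()) continue;
                String[] f = line.split(",");
                BigDecimal quantity = new BigDecimal(f[2].trim());
                BigDecimal price = new BigDecimal(f[3].trim());
                totals.merge(f[0].trim(), quantity.multiply(price), BigDecimal::add);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return totals;
    }
}
```

With a real file, the Reader would come from java.nio.file.Files.newBufferedReader(path); memory use then grows with the number of accounts, not with the size of the file.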

Tools for data processing

If you’re going to implement the ETL process, you will most likely need to deal with one (or several) of the following tools.

Hadoop. The Apache Hadoop (https://hadoop.apache.org/) software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so it delivers a highly available service on top of a cluster of computers, each of which may be prone to failures.

Redash. Redash (https://redash.io/) lets you connect to and query data sources, build dashboards to visualize data, and share them within the company. Its dashboards keep the big, easy-to-digest picture in front of you for a deeper understanding of your data processes and better decision-making.

Spark. Spark (https://spark.apache.org/) is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R and an optimized engine that supports general computation graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing. Apache Spark is a popular choice for those who build machine learning and big data analytics into their product because it’s a lightning-fast unified analytics engine.

Data types and formats

Before data can be stored and used in a target database, it may arrive in various formats. Grouped by origin, the sources include databases, flat files, web services, and other sources such as RSS feeds.

The following data formats are used as sources: ASCII, CSI, CSV, CSIM, MetaStock, Excel, JSON, SFTP batch files, XML, EzyCharts, Quicken, and so on.

Meanwhile, there is another way to integrate data from one system into another. An application programming interface (API) is a computing interface that defines interactions between multiple software intermediaries. It defines the kinds of calls or requests that can be made, how to make them, the data formats that should be used, the conventions to follow, and so on.
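Because every source formats its data differently, one way to handle the variety is to write a small adapter per source that maps its layout into one common record. The sketch below assumes two hypothetical layouts (a CSV file and a pipe-delimited file with a different column order) and hypothetical source names; it is an illustration of the adapter idea, not a parser for any of the formats listed above.

```java
import java.util.Map;

// Common record every source is mapped into.
final class Position {
    final String account;
    final String symbol;
    final long quantity;

    Position(String account, String symbol, long quantity) {
        this.account = account;
        this.symbol = symbol;
        this.quantity = quantity;
    }

    @Override
    public String toString() {
        return account + ":" + symbol + ":" + quantity;
    }
}

// One adapter per source layout.
interface SourceAdapter {
    Position parse(String line);
}

final class Adapters {
    // Hypothetical layout A: CSV "account,symbol,quantity".
    static final SourceAdapter CSV = line -> {
        String[] f = line.split(",");
        return new Position(f[0].trim(), f[1].trim(), Long.parseLong(f[2].trim()));
    };

    // Hypothetical layout B: pipe-delimited "symbol|quantity|account", lowercase symbols.
    static final SourceAdapter PIPE = line -> {
        String[] f = line.split("\\|");
        return new Position(f[2].trim(), f[0].trim().toUpperCase(), Long.parseLong(f[1].trim()));
    };

    static final Map<String, SourceAdapter> BY_SOURCE = Map.of("source-a", CSV, "source-b", PIPE);

    static Position toCommon(String source, String line) {
        SourceAdapter adapter = BY_SOURCE.get(source);
        if (adapter == null) {
            throw new IllegalArgumentException("no adapter for source " + source);
        }
        return adapter.parse(line);
    }
}
```

Adding a new source then means adding one adapter, while the downstream transform and load steps keep working with Position only.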

ETL in Fintech

Every project that relies on data stored in another system should have an ETL process. End-to-end integrations are the mainstay of Fintechs, which makes an efficient ETL process a substantial advantage for such companies.

Industries that benefit from the ETL process include investment, payments, personal finance, equity financing, consumer banking, international money transfer, and insurance.

[Figure: Retail Investment, Institutional Investment, Payments, Personal Finance, Lending, Consumer Banking, Equity Financing]

The next chart shows the types of companies that integrate the most within the wealth management industry. We took the portfolio management category as an example because it is one of the most data-integration-rich Fintech segments.

[Figure: Portfolio Management Integrations Champions: Breakdown by Category*. Bar chart on a 0 to 100 scale over: Custodian, Financial Planning, CRM, Trading & Rebalancing, Investment Data & Analytics, Client/Business Relationship Mgmt, Risk Tolerance, Document\Content Mgmt, Portfolio Mgmt, Data Aggregation, Portfolio Analytics, Custodial Platforms, Advisor Tools, Back Office, Compliance, Research and Analytics, Other, Marketing, Estate Planning.]

Looking at the breakdown by category, custodians, financial planning tools, and CRM are the leaders in volume. CircleBlack has the largest number of integrations with custodians. However, FinFolio comes close to CircleBlack with 19 integrations with custodians, leaving the other leaders far behind.

*See more insights in our Financial Services Integrations Report: Portfolio Management.

The bottom line

Summing up, data collection and transformation become much easier once you know which features ETL suits best and understand the problems you may encounter along the way. Businesses try to beat the market by implementing efficient ETL processes that save effort for their development teams. They also need access to the newest data before their competitors get it.

How can one combine all the benefits of an efficient ETL process under one umbrella? The next section unveils this challenge and illustrates the solutions with the case of creating a testing framework.

How to make the ETL process efficient?

There are three basic steps to make your ETL process better:

1. Divide source data into packages and process the packages in parallel.
2. Use test automation, stick to a systemic approach while testing the transformation-load results, and implement checks upon loading data.
3. Trace the changes in the target database and have the ability to revert the latest changes.

Here is a case of building a testing framework that illustrates how to put those three principles into practice.
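Step 1 from the list above can be sketched with the JDK’s ExecutorService: split the rows into fixed-size packages, submit each package to a thread pool, and collect the results in package order. This is a minimal sketch; the package size, the thread count, and the processPackage function are illustrative parameters, not tuned values or part of any specific product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class PackageProcessor {

    // Splits rows into packages of at most packageSize elements.
    static <T> List<List<T>> toPackages(List<T> rows, int packageSize) {
        if (packageSize <= 0) {
            throw new IllegalArgumentException("packageSize must be positive");
        }
        List<List<T>> packages = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += packageSize) {
            packages.add(rows.subList(from, Math.min(from + packageSize, rows.size())));
        }
        return packages;
    }

    // Processes packages on a fixed thread pool; results are returned in package order.
    static <T, R> List<R> processInParallel(List<T> rows, int packageSize, int threads,
                                            Function<List<T>, R> processPackage) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (List<T> pkg : toPackages(rows, packageSize)) {
                futures.add(pool.submit(() -> processPackage.apply(pkg)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // a failed package surfaces here as ExecutionException
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for packages", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a package failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each package is independent, a failure is reported per package instead of silently corrupting a single huge load; the input list must not be modified while the packages (which are subList views) are being processed.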

Testing Framework for ETL processing

The problem. A portfolio management platform integrates data flows from all the biggest custodians, such as Schwab, Pershing, and Fidelity. The files with firm data from these sources must be processed, formatted as the platform requires, and then gradually loaded into the platform’s database. Each custodian provides files in its own format and on its own schedule, which adds complexity to the processing. Solving integration problems costs the data team time spent communicating about and fixing issues such as unexpected data format changes, overdue data, and so on. Also, some custodians have specific extra data and logic that must be collected as well.

What could solve the problem?

• Validate that data goes into the database in the proper format.
• Validate that data is processed according to business rules.
• Map automated test cases to decision diagrams for incoming data.
• Understand how processing works for different custodians when the system changes.

The solution: a test automation framework that can validate files from different custodians against our processing rules.
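The business-rule validation from the list above can be sketched as a set of named rules: common rules apply to every custodian, and custodian-specific rules cover the extra data some custodians send. The rule names, the Txn fields (including the repCode field), and the custodian identifiers below are hypothetical; the point is the split between shared and custodian-specific rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical normalized transaction; repCode stands for custodian-specific extra data.
final class Txn {
    final String account;
    final long quantity;
    final String repCode;

    Txn(String account, long quantity, String repCode) {
        this.account = account;
        this.quantity = quantity;
        this.repCode = repCode;
    }
}

final class BusinessRules {
    // Rules that apply to every custodian (insertion order = report order).
    private static final Map<String, Predicate<Txn>> COMMON = new LinkedHashMap<>();
    // Extra rules keyed by custodian id.
    private static final Map<String, Map<String, Predicate<Txn>>> EXTRA = new LinkedHashMap<>();

    static {
        COMMON.put("account is present", t -> t.account != null && !t.account.isBlank());
        COMMON.put("quantity is non-zero", t -> t.quantity != 0);

        Map<String, Predicate<Txn>> custodian2 = new LinkedHashMap<>();
        custodian2.put("rep code is present", t -> t.repCode != null && !t.repCode.isBlank());
        EXTRA.put("custodian-2", custodian2);
    }

    // Returns the names of all rules the transaction violates (empty = valid).
    static List<String> violations(String custodian, Txn txn) {
        List<String> failed = new ArrayList<>();
        COMMON.forEach((name, rule) -> {
            if (!rule.test(txn)) failed.add(name);
        });
        EXTRA.getOrDefault(custodian, Map.of()).forEach((name, rule) -> {
            if (!rule.test(txn)) failed.add(name);
        });
        return failed;
    }
}
```

Naming each rule makes it straightforward to map a failure back to a test case or a node in a decision diagram.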

To solve the issues described above, one needs to build a testing framework whose tests can be mapped to test cases in a test management system and to decision-making diagrams for data extraction. The test case, expected result, and preconditions for common tests are the same across different custodians, whereas those that differ should be kept in a custodian-specific file.

[Figure: Framework structure. Test-specific files for custodian 1, custodian 2, and custodian 3; prepare database state; processing job (process & load data into the database); run a test that compares specific columns in the DB with the expected result.]

The framework includes the following processes to ensure smooth integration testing:

1. Prepare the database state.
2. Process and load data.
3. Run a test comparing specific database columns with the expected result.

As for the testing framework, we used Java and TestNG as the test runner. We also integrated the test runner with the test management system (in our case, TestRail), so we can push test results there and get clear, structured coverage for the automated tests. For DB operations, we used the DbUnit library.

Overall, the toolset can easily be replaced with an equivalent toolset for any other language.
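The three processes (prepare the database state, process and load data, compare specific columns with the expected result) can be sketched without TestNG or DbUnit, using an in-memory map as a stand-in for the database so the example stays self-contained. It also shows one way to keep common expectations shared while a custodian-specific file overrides or adds columns. The class and method names are hypothetical; this is not the framework’s actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-memory sketch of the test flow. In the framework described above, DbUnit
// handles the database side and TestNG runs the tests; here a map of
// rowId -> (column -> value) stands in for the target database.
final class IntegrationCheck {
    final Map<String, Map<String, String>> db = new HashMap<>();

    // Step 1: prepare the database state (mutable copies of the seed rows).
    void prepare(Map<String, Map<String, String>> seedRows) {
        db.clear();
        seedRows.forEach((id, columns) -> db.put(id, new HashMap<>(columns)));
    }

    // Step 3: compare only the columns the test case cares about.
    List<String> compare(String rowId, Map<String, String> expectedColumns) {
        List<String> mismatches = new ArrayList<>();
        Map<String, String> actual = db.getOrDefault(rowId, Map.of());
        expectedColumns.forEach((column, expected) -> {
            String value = actual.get(column);
            if (!expected.equals(value)) {
                mismatches.add(rowId + "." + column + ": expected " + expected + ", got " + value);
            }
        });
        return mismatches;
    }

    // Common expectations are shared; custodian-specific ones override or add columns.
    static Map<String, String> expectedFor(Map<String, String> common,
                                           Map<String, String> custodianSpecific) {
        Map<String, String> merged = new LinkedHashMap<>(common);
        merged.putAll(custodianSpecific);
        return merged;
    }

    // Full flow: prepare state, run the processing job (step 2), compare columns.
    List<String> run(Map<String, Map<String, String>> seedRows,
                     Consumer<Map<String, Map<String, String>>> processingJob,
                     String rowId, Map<String, String> expectedColumns) {
        prepare(seedRows);
        processingJob.accept(db);
        return compare(rowId, expectedColumns);
    }
}
```

An empty mismatch list means the test passes; in a TestNG-based setup, the same list could be turned into an assertion message and pushed to the test management system.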

ETL Security

Another important part of making the ETL process better is ensuring its security. As part of the framework, parsers and mappings trace specific objects, which allows us to catch mistakes in a timely manner.

Custodians report errors of three different severity levels:

Critical. This error fails the data processing and should be fixed as soon as possible.

Major. This error indicates a problem with a specific record that does not affect any other records.

Minor. This error should be fixed in the order of arrival.
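One way a pipeline might react to the three severity levels is sketched below: a critical error stops the run, a major error drops only the affected record, and a minor error is queued so that minor errors are fixed in order of arrival. The class names and the choice of IllegalStateException are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

enum Severity { CRITICAL, MAJOR, MINOR }

final class ProcessingError {
    final Severity severity;
    final String recordId;
    final String message;

    ProcessingError(Severity severity, String recordId, String message) {
        this.severity = severity;
        this.recordId = recordId;
        this.message = message;
    }
}

final class ErrorTriage {
    final List<String> skippedRecords = new ArrayList<>();               // MAJOR: only this record is dropped
    private final Deque<ProcessingError> minorQueue = new ArrayDeque<>(); // MINOR: FIFO backlog

    void handle(ProcessingError error) {
        switch (error.severity) {
            case CRITICAL:
                // Fails the whole run so the problem is fixed as soon as possible.
                throw new IllegalStateException(
                        "critical error in " + error.recordId + ": " + error.message);
            case MAJOR:
                skippedRecords.add(error.recordId); // other records keep flowing
                break;
            case MINOR:
                minorQueue.addLast(error);
                break;
        }
    }

    // Next minor error to fix, in order of arrival (null when the backlog is empty).
    ProcessingError nextMinorToFix() {
        return minorQueue.pollFirst();
    }
}
```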

In addition, the data transformation processes are monitored via a table that shows whether a specific process failed or succeeded. Redash panels are used to gather information about certain firms, such as the number of accounts opened today and yesterday; such information helps prevent [...] or duplication. Finally, automated tests cover all the business logic of the integration to ensure data cohesiveness and security.
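A status table like this pairs naturally with the third principle from earlier in this guide: tracing changes in the target database and being able to revert the latest ones. The sketch below keeps, for every load job, the previous value of each key it touched; a failed job is rolled back immediately and marked FAILED, and the latest successful job can be reverted on demand. The in-memory maps stand in for real database tables, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-ins: `target` for the target table, `statusByJob` for the
// monitoring table that shows whether each load job succeeded or failed.
final class LoadJournal {
    enum Status { SUCCEEDED, FAILED, REVERTED }

    final Map<String, String> target = new TreeMap<>();
    final Map<String, Status> statusByJob = new LinkedHashMap<>();

    // One entry per successful job: job id -> previous value of every key it
    // touched (null means the key did not exist before the job).
    private final Deque<Map.Entry<String, Map<String, String>>> undoLog = new ArrayDeque<>();

    void apply(String jobId, Map<String, String> rows) {
        Map<String, String> before = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, String> row : rows.entrySet()) {
                if (row.getValue() == null || row.getValue().isBlank()) {
                    throw new IllegalArgumentException("empty value for " + row.getKey());
                }
                before.put(row.getKey(), target.get(row.getKey()));
                target.put(row.getKey(), row.getValue());
            }
            undoLog.push(Map.entry(jobId, before));
            statusByJob.put(jobId, Status.SUCCEEDED);
        } catch (IllegalArgumentException e) {
            restore(before); // undo the partial load so the target is unchanged
            statusByJob.put(jobId, Status.FAILED);
        }
    }

    // Reverts the most recent successful job; throws NoSuchElementException if there is none.
    void revertLatest() {
        Map.Entry<String, Map<String, String>> latest = undoLog.pop();
        restore(latest.getValue());
        statusByJob.put(latest.getKey(), Status.REVERTED);
    }

    private void restore(Map<String, String> before) {
        before.forEach((key, previous) -> {
            if (previous == null) {
                target.remove(key);
            } else {
                target.put(key, previous);
            }
        });
    }
}
```

In a real database, the same idea is usually built on transactions plus an audit or history table; the in-memory version only shows the bookkeeping.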

The result

By adding a testing framework to the development process, project scaling became possible. Before that, it was difficult to keep the test cases relevant and to trace the changes, and each new integration made the situation even worse. Now, a large part of the processes is automated and controlled, which spares the QA team from adding more people to do manual work.

What you get when your ETL is optimized:

• Speed
• Predictable completion time
• Safety
• Scaling
• Monitoring
• Visualization
• Flexibility

ETL processing brings the biggest value to products that have, or are going to have, high data volumes, aggregate data from numerous sources, or need data transformed into numerous formats or structures.

We know that building a testing framework for ETL from scratch can be challenging. Unfortunately, there is no step-by-step formula for building one in your specific case. However, you can get a consultation with our professionals, who have rich experience in this field. They will answer all your questions and recommend the next steps.

How we can help

• General consultation. We explain what ETL is and how it impacts your business task.
• Simplified Tech Audit. We review your current system, infrastructure, code, etc.
• Simplified solution proposal. After gathering information about the system and your business needs, we form a solution proposal. You can share this proposal with another company for a tech audit or accept the proposal’s terms and proceed to staffing.
• Candidates staffing. If there are no engineers available in your company to help with your business goals, we staff the best specialists to fit your needs.
• Technological solution development.
• Dedicated team. To establish your efficient ETL process, we can compose and launch a dedicated engineering team to maintain and enhance other elements of your system.

Data types ETL allows you to process:

• Historical data
• Clients data (accounts, portfolios)
• System data (logs, monitoring)
• Billing data
• Geolocation data
• Media data (audio streams, video)

Request a consultation

Checklist

What are the main points to get from this guide?

• ETL is a data transformation process used to repeatedly synthesize data from multiple sources into a database with a unified format.
• Businesses require the ETL process to be efficient, accurate, secure, and quick.
• Industries that benefit from the ETL process include investment, payments, personal finance, equity financing, consumer banking, and insurance.
• Building a testing framework for ETL processing allows us to maintain control over the data transformation process.
• Only a systemic and automated process can be scaled efficiently; keep that in mind when planning your roadmap.

Get in touch

[email protected] www.insart.com +1 917 475 0008