WHITE PAPER

PETABYTE-SCALE DATA MANAGEMENT FOR CLOUD BANKING

Migrating banking data from legacy systems to the cloud is easier said than done, especially if data volumes are in petabyte scale. A change in people, processes and perception is required before banks can begin their journey to the cloud. Navigating these is crucial to a bank’s transformation and survival in this customer-centric age. Banks view Challenge of cloud cautiously cloud banking has fueled rapid automation, and analytics. According growth and adoption of innovative 1. Legacy technology to Gartner estimates, banks need to digital business models. From Uber to According to an Infosys Knowledge triple their digital business innovation Spotify and Netflix to Slack, thousands Institute survey, legacy systems budgets to modernize legacy of digital leaders have built their currently rank as the third most applications through 2020.4 businesses on cloud . This is commonly cited barrier to digital because it enables flexible, scalable, transformation (named by 42% Experts say that banks spend 80% of cost-efficient and continuously of respondents), but financial their IT budgets on legacy technology upgradable capabilities. But for banks, services executives expect that they maintenance, and a Tier 1 bank could adopting cloud technologies can be could become the most serious spend up to $300 million a year to a real challenge — not least because barrier in 2019.2 update existing software to meet mainframes continue to be at the core regulatory requirements.5 But cloud Millions are spent every year to of their business, given their reliability, can act as a workaround in reducing maintain these archaic systems,3 which almost impregnable nature and ability costs.6 For example, the Federal are often incompatible with modern to manage large volumes of complex Home Loan Bank of Chicago reduced technology. These systems act as a tasks. In fact, over 90% of the world’s infrastructure costs by 30% by moving barrier to better catering to the needs top banks still rely on mainframes.1 all of its internal production workloads of the digital customer. The reliance This is despite the fact that mainframes to the cloud.7 on legacy technology also stops banks are archaic in nature and present a from taking advantage of Agile and host of issues. DevOps ways of working, increased

External Document © 2021 Infosys Limited 2. Data management copies created in the data lake and 6. Agile and DevOps transported to different systems either With data growing by 2,500 ways of working in the cloud or internally. petabytes a day,8 data management is Legacy systems are expensive to increasingly becoming a concern for Banks’ manual and siloed approach maintain and delay a product’s time banks,9 not least because of regulatory of ingesting and migrating data to market.17 They are also not ideally and security requirements. Regulators is time-consuming. It is also suited to Agile programming methods, require banks to provide detailed counterproductive, as in this digital instead relying on waterfall methods reports, additional information on age, customers’ demand for instant, that can slow down the production stress testing and information at real-time, and personalized products of software and result in less timely a granular level.10 Banks continue and services has become the feature releases. to capture data in multiple forms new normal.13 Banks that have made the shift to (customer personal information, Agile and DevOps ways of working transaction history, journey maps, 4. Lower number of have benefited. In 2012, J.P. Morgan market data), and this huge volume of legacy coders followed a quarterly software-release data is now stored in data warehouses Legacy systems are built on outdated cycle, and coordination between and lakes. languages such as COBOL. For development and operations was Banks store data in various locations, instance, the U.S. financial sector minimal. These quarterly releases and it’s often accessed by different has over 200 billion lines of COBOL heightened risks, were cumbersome users, creating a siloed approach and code currently active, and COBOL and time-consuming, and increased multiple layers of data duplication.11 powers over 90% of ATMs.14 Despite delivery costs. The financial institution Singular data warehouses and COBOL’s widespread use, today’s decided to embrace Agile and DevOps lakes are formed, giving rise to coders prefer to use new languages practices. The software-releases further issues of storage, meeting that are compatible with artificial cycle changed from quarterly to 100 regulatory requirements, authenticity, intelligence, machine learning or cloud releases in 2015, 200 in 2016 and over incompatible formats and higher time computing. Few people want to learn 400 in 2017.18 to compute. These have an indirect a language that can talk only to legacy Capital One’s move from waterfall bearing on run-the-bank costs. technologies, and coders familiar to Agile software development The data collected by banks must with COBOL can be well into their 50s helped cut the time to build new 15 also be analyzed and deployed to or 60s, presenting significant skills application infrastructure by 99%. facilitate better decisions.12 However, shortage challenges in maintaining DevOps’ automation and continuous 16 due to legacy technology, banks are older technologies. Although integration of new code helped speed unable to leverage access to this data upgrades are available, they are still the bank’s development cycles, and for analytics and insights and improve not fit enough to compete or converge releases occurred with increased customer experiences. with systems of this digital era. frequency and higher reliability.19

3. Manual data migration 5. Security concerns 7. Cultural shift Currently, when a bank decides to Customer data in any form is sensitive Banks need to undergo a perception move data to a new system — whether for banks. The major cloud vendors, change. They must be ready to build it’s in the cloud or on-premises — a such as Google and Microsoft, have a culture that is cost‑conscious, new project team is set up and works outstanding security expertise, and customer-conscious and in a silo. Data is migrated from the data all are certified compliant with federal efficiency‑conscious.20 This is easier lake into the new system, or a new data governance standards. However, said than done, as many systems, connection is created between them. for years banks have delayed and tried processes and people have grown Because of the silos in the bank, other to avoid the issue of modernizing their with the bank. A Boston Consulting teams using the data may be unaware infrastructure. While they agree that Group assessment revealed that of this migration or new connection. cloud infrastructure offers them the among companies that underwent a ability to better cater to the needs of digital transformation, the number of In many instances, multiple teams the digital customer, they are reluctant want the same data and follow a profitable enterprises was five times to adopt it until they are convinced of higher among those that focused on a similar process without getting the safety and security of their data. connected to each other. This leads cultural shift compared with those that They do not have a clear strategy that did not.21 to duplication of data, with multiple will help them quickly adopt cloud.

External Document © 2021 Infosys Limited The bank generates and moves (AWS) and Microsoft Azure. How can banks benefit terabytes of data each day. Each Second, it focuses on the data process of data ingestion took eight from cloud? management issue. The platform to 12 weeks to deliver and was Moving data and applications to the enables automated ingestion of data cumbersome, time-consuming and cloud can save banks money. Some versus the manual and siloed approach counterproductive. say it could cut IT costs by as much as previously followed by banks. It also 75%.22 A large global bank that Infosys provides a platform and interface to partnered with to solve this problem Ushering in the Infosys ingest data centrally. estimated that it could save 50% open source data The platform can replicate data of costs overall by adopting cloud. from on-premises to a multi-cloud Some areas in the bank expect to save management platform environment, while supporting batch 90% of their costs in the transition. To solve this problem, Infosys and near-real-time movement. It was To achieve this, the bank is working built an open source, petabyte‑scale built with the purpose of enabling with Infosys to build a multi-cloud data multi‑cloud data ingestion and guaranteed data delivery at scale. management system that interfaces management platform, part of The data management platform with Google, Amazon and Microsoft Infosys Cobalt. It is a metadata- carries out various functions, removing clouds to migrate its data.23 driven data management ecosystem the need to use multiple types of Cloud is pivotal in architecting the that has been designed to meet an software for each function and saving bank of the future. Banks can benefit organization’s current and future data licensing costs. from the move to the cloud not only by delivery requirements. The platform The platform acquires, ingests reducing costs, but also by capitalizing allows businesses or functions within and transforms data in the on cloud’s computing capabilities, banks to move data from a source to following stages: its ability to scale IT solutions and a destination in a defined format at an its reliability. In fact, moving to the agreed frequency. 1. Acquire cloud can make banks more like the The platform was built to solve a host fintech upstarts with which they Banking transactional data is of banking issues, beginning with data increasingly compete. Founded in stored on multiple databases and management. It provides a central 2015, fintech company Monzo has an interchange formats built on legacy way to monitor all banking data and infrastructure that is capable of serving mainframe technology. Structured and enable it to be ingested in the cloud 1.7 million customers supported by unstructured data is stored in without duplication. The platform has only 10 people on its infrastructure Hadoop platforms, Oracle databases, also enabled the implementation of a and reliability team. The bank has 400 DB2, etc. Trusted Source Framework which helps core banking microservices on the with data lineage. This allows users to The platform interfaces with the cloud, which help it deliver value to track data usage, understand who has databases to acquire the stored data. customers in the form of offerings such made the last change, how the data as instant balance inquiry and real- has been tagged and better manage 2. Ingest time statements.24 the single view of the available data. Once the raw data is pulled in from on-premises, it needs to be stored Issues faced by an The architecture behind in a format that is durable and can Infosys banking client be easily accessed. The platform’s Infosys’ open source architecture ingests this data in The bank that Infosys partnered various phases: with to build the open source data data management management platform started off with platform • Data Extraction the classic challenges faced by many - Structured, unstructured or of its peers. Project teams worked The data management platform first semi-structured data is extracted in silos and used various tools and targets the ingestion problem — or replicated from various software. For moving data, Ab Initio taking data and moving it to cloud databases and source systems. was used; for tagging, Collibra; and or on-premises, ingesting it on cloud The data management platform’s for scheduling, Control-M. There were systems such as Hadoop, Google Cloud easy-to‑use interface supports nearly 20 moving parts, with licenses Platform (GCP), Amazon Web Services interaction with over 25 source attached to each.

External Document © 2021 Infosys Limited systems including relational build trust with users that the data - The platform’s job scheduler uses database management systems access path is secure. an event-driven architecture to (e.g., Oracle, Teradata, MS SQL schedule jobs. • Data Lineage Server, Hadoop, HDFS) and • Data Encryption multiple clouds (GCP, Azure, AWS). - Data lineage shows the life cycle Data is extracted or replicated of data, i.e., its origin, where it - Restricted and sensitive customer in near-real time using the Kafka has moved over time (within data, including PII, is ciphered at processing engine, while NiFi is on‑premises locations or cloud), multiple levels while at rest and in used to process batches. what actions have been performed transit to ensure no data breach on it and its final destination. It can occur. In fact, the Gramm- - Specific datasets can be extracted helps trace data back to its original Leach-Bliley Act (GLBA) in the U.S. with the user-friendly interface source (whether on-premises or also requires institutions to protect that enables developers and in the cloud), reconciles the data, customers’ Nonpublic Personal business users to create ingestion reduces duplication and traces Information (NPI). pipelines and track their errors back to their sources quickly. movement in real time. - The KMS complies with all banking - Another important facet is standards. It uses a 4096-bit RSA - The interface allows seamless “explainability.” As there are Key Vault, ensuring tamper-proof movement of data at petabyte multiple dependencies on protection and encryption of data. scale at an accelerated pace from specific data, data lineage helps This is higher than the industry different on-premises systems to banks explain why certain standard of 256-bit encryption. on-premises Hadoop data lakes or decisions were made. It also helps The platform’s KMS has separate to multiple cloud platforms (GCP, banks comply with regulatory modules for data masking and AWS, Azure). requirements of maintaining data encryption. and managing customer data, - Unlike monolithic applications - Data in transit to the cloud is e.g., the General Data Protection built as a single unit, the Infosys encrypted using Transport Layer Regulation (GDPR). platform’s microservices-driven Security (TLS) and authenticated architecture helps reduce - Infosys’ open source data using certificates to ensure development time. Its suite of management platform’s data communication between the services, each independently lineage is built using Java, Python client and the server is trusted. run and deployable, reduces the and D3.js library. Cloud certificates are stored using dependency on skilled developers a double encryption key. to deliver data movement. • Job Scheduling - This helps automatically trigger • Real-Time Streaming • Data Masking data transfers on a recurring - Infosys’ open source data - Data masking helps financial or an ad hoc basis — daily, management platform is built institutions protect restricted weekly, monthly or based on the to move data continuously and and sensitive customer data — occurrence of any event. in near-real time from source to including Personally Identifiable destination. To accelerate data - Banks need to process large Information (PII) — prevent movement from on-premises volumes of transactions quickly, unauthorized data access and to cloud, the platform uses without any errors or any avoid unwarranted data exposure. Infosys-developed open source downtime. Automating the job As a result, banking fraud components, NiFi, Kafka and cloud. is reduced. scheduling ensures that necessary data is transferred efficiently, to - Changes can often represent only - Data is masked before it is the right place at the right time. a small portion of the total data ingested into on-premises volume. The data management - If a server fails, the disaster locations or the cloud. The platform uses Infosys-developed recovery procedure is triggered platform’s Key Management open source components that and switches the job loads to the Service (KMS) masks data in real read logs, and then replicate and disaster recovery servers. time, and data can be unmasked mirror changes to the cloud. only for authorized users. - This helps banks comply with cybersecurity and regulatory requirements and helps them

External Document © 2021 Infosys Limited - Cloud provides a staging location uses Java Spring Boot, Active platform, a large global bank for data on its journey toward Directory, Cloud Identity & Access reduced its data migration cycle processing, storage and analysis. Management (IAM) and KMS for by over 75% and decreased its this functionality. daily liquidity reporting time by nearly 80%. 3. Publish, Transform • Metadata and Manage - This function stores business • The platform improved data Stored data is transformed into and technical metadata for management at the bank, actionable information, and the results data ingested, processed and becoming a one-stop shop for data are converted into a format that is easy scheduled by the platform. migration. This resulted in reduced to draw insights from. cost of storage and operations, and - It helps trace the data lineage improved data lineage. and log and track data, and • Data Publishing provides dashboards to track the • It also helped Infosys’ client save - Data ingested and stored is now feed status. $8.5 million on license renewal cleansed and organized in cloud costs for the Ab Initio software that databases or data warehouses. - Infosys’ open source data addressed real-time data processing Also, targeted tables are created management platform uses and application integration. in targeted databases based on Postgres or MySQL databases to • With its petabyte-scale transfer metadata information provided. store metadata. ability, Infosys’ open source The cloud data manager uses data management platform has Google BigQuery to interact with • Data Lineage delivered significant benefits to the and analyze huge volumes of - Data is unmasked after it is bank and helped move hundreds of stored data. ingested into on-premises locations or the cloud. The KMS applications and petabytes of data - This function is built using Java, allows only authorized users to to the cloud. Python, Cloud SDK, Cloud native unmask the data. tools and Google BigQuery. - This helps banks comply with Future of banking • Data Masking cybersecurity and regulatory with Infosys’ - Data is again masked after it requirements and helps build trust is ingested into on-premises with users that the data access open source data locations or the cloud. The KMS path is secure. management platform masks data in real time, and data can be unmasked only for • Business Glossary With Infosys’ data management authorized users. - The platform captures the business platform, banks can now migrate glossary and moves along with volumes of data back and forth • Data Profiling the data to the destinations on- quickly at scale and deploy artificial intelligence and machine learning to - Data profiling helps assess the premises and in cloud. It tags the analyze data, provide better insights quality and relationship of the business glossary for the data and and make better decisions. As a available data. Data profiling attributes ingested, and is also result, banks can begin to act truly as jobs are undertaken in the helpful for audit purposes. digital-first companies, much like the platform’s Scheduler. - Infosys’ open source data fintech competitors they increasingly management platform uses Java face in the market. As an open source • User Management and an Open Source Business solution, Infosys’ data management - The user management function Glossary Model to save the platform will benefit from the authenticates and onboards users business glossary. contributions of the open source based on an organization’s Active Directory. It also assigns roles or provides secure access to various Benefits of Infosys’ features of the platform that users open source data are authorized to work on. management platform - The data management platform • With the data management

External Document © 2021 Infosys Limited community and other banks that choose to test and use it. We hope that in the future, it will become a standard platform that enables traditional large banks to engage with the cloud at petabyte scale. References 1. “Two-platform IT: Why the mainframe still has its place in the modern enterprise,” Information Age, April 12, 2018, https://www.information-age.com/main- frame-modern-enterprise-123471427/ 2. “Infosys Digital Radar 2019: Barriers and Accelerators for Digital Transformation in the Financial Services Industry,” June 2019, https://www.infosys.com/ about/knowledge-institute/insights/Pages/financial-services-industry.aspx 3. “Banks face spiraling costs from 50-year-old IT,” Financial News, October 2, 2017, https://www.fnlondon.com/articles/banks-face-spiraling-costs-from-archa- ic-it-20170912 4. “Infosys Digital Radar 2019: Barriers and Accelerators for Digital Transformation in the Financial Services Industry,” June 2019, https://www.infosys.com/ about/knowledge-institute/insights/Pages/financial-services-industry.aspx 5. “Banks face spiraling costs from 50-year-old IT,” Financial News, October 2, 2017, https://www.fnlondon.com/articles/banks-face-spiraling-costs-from-archa- ic-it-20170912 6. “Consumers Want An Experience That Legacy Banking Systems Can’t Deliver,” The Financial Brand, April 2, 2018, https://thefinancialbrand.com/71829/legacy- systems-banking-customer-experience-impact/ 7. “AWS Case Study: Federal Home Loan Bank of Chicago,” Amazon Web Services, https://aws.amazon.com/solutions/case-studies/FHLBC/ 8. “Endless possibilities with data: Navigate from now to your next,” Infosys Knowledge Institute, November 2018, https://www.infosys.com/data-analytics/ insights/Documents/endless-possibilities-with-data.pdf 9. “Banks Kept On Their Toes By Dizzying Data Management Regulations,” PYMNTS.com, December 5, 2017, https://www.pymnts.com/news/b2b-pay- ments/2017/wolters-kluwer-bank-data-management-regulation/ 10. “Regulatory Data Management: Data Quality and Integrity Concerns for Asian Banks,” Moody’s Analytics, April 2019, https://www.moodysanalytics.com/ articles/2019/regulatory-data-management 11. “Data Management Challenges For Financial Services,” Digitalist Magazine, January 15, 2019, https://www.digitalistmag.com/customer-experi- ence/2019/01/15/data-management-challenges-for-financial-services-06195118 12. “Why aren’t banks making the most of data?,” Raconteur, June 18, 2019, https://www.raconteur.net/finance/data-management-banking 13. “Digital Banking Transformation: Redefine the Banking Core,” Infosys Knowledge Institute, April 2019,https://www.infosys.com/about/knowledge-institute/ insights/pages/digital-banking-transformation.aspx 14. “COBOL blues,” Reuters, http://fingfx.thomsonreuters.com/gfx/rngs/USA-BANKS-COBOL/010040KH18J/index.html 15. “Do You Know Cobol? If So, There Might Be a Job for You.” The Wall Street Journal, September 21, 2018, https://www.wsj.com/articles/do-you-know-cobol-if- so-there-might-be-a-job-for-you-1537550913 16. “Legacy systems are a pain in the bank,” Finextra, October 26, 2018, https://www.finextra.com/blogposting/16205/legacy-systems-are-a-pain-in-the-bank 17. “Infosys Digital Radar 2019: Barriers and Accelerators for Digital Transformation in the Financial Services Industry,” June 2019, https://www.infosys.com/ about/knowledge-institute/insights/Pages/financial-services-industry.aspx 18. “How J.P. Morgan Asset Management went from quarterly to daily releases,” TechBeacon, 2018, https://techbeacon.com/devops/how-jp-morgan-asset-man- agement-went-quarterly-daily-releases 19. “On-Demand Infrastructure on AWS Helps Capital One DevOps Teams Move Faster Than Ever,” Amazon Web Services, https://aws.amazon.com/solutions/ case-studies/capital-one-devops/ 20. “Digital Banking Transformation: Redefine the Banking Core,” Infosys Knowledge Institute, April 2019,https://www.infosys.com/about/knowledge-institute/ insights/pages/digital-banking-transformation.aspx 21. “It’s Not a Digital Transformation Without a Digital Culture,” Boston Consulting Group, April 13, 2018, https://www.bcg.com/publications/2018/not-digital- transformation-without-digital-culture.aspx 22. “Banks face spiraling costs from 50-year-old IT,” Financial News, October 2, 2017, https://www.fnlondon.com/articles/banks-face-spiraling-costs-from-archa- ic-it-20170912 23. “Banks face spiraling costs from 50-year-old IT,” Financial News, October 2, 2017, https://www.fnlondon.com/articles/banks-face-spiraling-costs-from-archa- ic-it-20170912 24. “How Monzo built a digital bank on AWS for over 500,000 customers,” Amazon Web Services, https://aws.amazon.com/solutions/case-studies/monzo/

Authors Ajay Vij Mohammad Faizan SVP & Industry Head – Financial Services Senior Manager, Client Services – Financial Services [email protected] [email protected]

Jitendra Raisinghani Sharan Bathija Principal Technology Architect – Financial Services Senior Consultant – Infosys Knowledge Institute [email protected] [email protected]

External Document © 2021 Infosys Limited About Infosys Knowledge Institute The Infosys Knowledge Institute helps industry leaders develop a deeper understanding of business and technology trends through compelling thought leadership. Our researchers and subject matter experts provide a fact base that aids decision making on critical business and technology issues. To view our research, visit Infosys Knowledge Institute at infosys.com/IKI

Infosys Cobalt is a set of services, solutions and platforms for enterprises to accelerate their cloud journey. It offers over 14,000 cloud assets, over 200 industry cloud solution blueprints and a thriving community of cloud business and technology practitioners to drive increased business value. With Infosys Cobalt, regulatory and security compliance, along with technical and financial governance comes baked into every solution delivered.

For more information, contact [email protected]

© 2021 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/ or any named intellectual property rights holders under this document.

Infosys.com | NYSE: INFY Stay Connected