Handbook Lineage Handbook 2019

@DataMgmtInsight Search Insight www.datamanagementinsight.com JUST PUBLISHED

RegTech Special Report DataManagement TradingTech FRTB Special Report

Detailed insight into the data management requirements sponsored by

@ATeamInsight for FRTB Search: A-Team Insight www.a-teaminsight.com

FRTBSpecialReport2019_A5_V2Web.indd 1 28/02/2019 13:12

This detailed guide will enable you to: • Learn more about the key requirements of FRTB and how they will affect your business; • Understand the revised risk models and what you need to do to comply; • Identify the additional data you will need to source in order to pass the rigorous new testing procedures; • Explore the data management challenges you will need to be aware of and what you can do about them; • Investigate best practice implementation and what you need to be doing in order to meet the upcoming deadline in an efficient and cost-effective manner.

Download this guide at www.datamanagementinsight.com

From A-Team Insight

FRTB_JustPublished_HouseAdA4.indd 1 12/03/2019 17:56 Data Lineage Handbook 2019 Contents

Introduction 5 Foreword 6 Overview 8 Challenges and opportunities 16 Technology solutions 26 Approaches to implementation 30 Roundtable discussion 33 Outlook 38

Editor Sales TradingTech Insight and Client Services Manager Sarah Underwood RegTech Insight (front office) Ron Wilbraham [email protected] Phil Ansley [email protected] [email protected] Production Manager A-Team Group Marketing Operations Manager Sharon Wilbraham Chief Executive Officer Leigh Hill [email protected] Angela Wilbraham [email protected] [email protected] Design Director of Event Operations Victoria Wren President & Chief Jeri-Anne McKeon [email protected] Content Officer [email protected] Andrew P. Delaney Events Content Manager Postal Address [email protected] Church Farmhouse Lorna Van Zyl Editorial [email protected] Old Salisbury Road Sarah Underwood Stapleford, Salisbury [email protected] Group Marketing Manager Wiltshire, SP3 4LN Claire Snelling RegTech Editor +44-(0)20 8090 2055 [email protected] [email protected] Lauren McAughtry [email protected] Social Media Manager www.a-teamgroup.com Jamie Icenogle www.a-teaminsight.com Sales - Data Management Insight and [email protected] www.datamanagementinsight.com RegTech Insight (mid & back office) Jo Webb [email protected] www.datamanagementinsight.com 3 From A-Team Insight Dates for the diary Regulatory Reporting • Managing Risk & Regulatory Change • MiFID II Leveraging Innovations Using: Ai, , Blockchain Fintech • KYC, AML & Financial Crime • Trade Surveillance

LONDON New YORK October NOVEMBER 3 14

Interested in speaking? [email protected] Interested in sponsoring? [email protected]

@regtechinsight Search: RegTech Insight

From ATeam Insight www.RegTechInsight.com From A-Team Insight Data Lineage Handbook 2019 Introduction

An extensive guide to the challenges and opportunities of data lineage

Welcome to our latest handbook on data lineage, a response to growing interest in the business outcomes firms are seeing as a result of successful implementations, and also of the operational gains that can be made in terms of reduced costs, increased efficiency and improved risk management. This handbook covers the complete scope of data lineage, with a view to helping you gain management buy-in and budget, decide whether to build or buy a solution, and to take a best practice approach to deployment. It also considers suitable technology tools to build lineage tracking solutions, and the potential of innovative technologies such as cloud, machine learning and artificial intelligence to increase automation, support sustainable lineage and identify previously unseen and potentially rich data applications. A roundtable discussion covered in the handbook, hosted by A-Team Group and joined by the handbook’s sponsors, MarkLogic and Bloomberg, highlights the importance of data lineage for firms in capital markets, as well as the challenges and opportunities of implementation. The sponsors also provide recommendations on how to get data lineage right. The benefits of data lineage described in the handbook span operations, business and regulatory compliance, and touch on the real possibility of using lineage to support data- driven business strategy and uncover new business possibilities. Of course, no data management handbook dedicated to capital markets would be complete without reference to regulation, which is covered in a special section outlining regulatory requirements for data lineage and how they can be fulfilled. We will continue to update you on data lineage developments with blogs on Data Management Insight, one of three technology channels for capital markets participants available on our brand new A-TeamInsight website – www.a-teaminsight.com. You will also be able to find out more about data lineage by registering for A-Team webinars and events, and signing up for our weekly newsletters, all of which you can do via the new website. Finally, thank you to MarkLogic and Bloomberg for sponsoring this handbook. We hope it will be a useful guide to delivering a successful data lineage programme at your organisation.

Angela Wilbraham CEO A-Team Group

www.datamanagementinsight.com 5 From A-Team Insight Data Lineage Handbook 2019 Foreword

Lineage by Default

by Giles Nelson, Chief Technology Officer, MarkLogic

If you are betting on a data-driven future, you need to be able to understand the flow of data. That is data lineage – the story of what happens to data over time.

Organisations have come to recognise that data is one of their key assets. It’s not just a component of planning and tactics, but of business value in itself. Understanding the value of data leads to new ways of interacting with customers and partners, new products, and new ways of doing business.

Data lineage is powering some businesses today and will be powering all data-driven businesses of the future. With data lineage, you can understand where your data comes from, how it is being used, how it is being changed, and where it’s going. You can’t evolve your business, nor the way it uses data, without understanding the story of that data.

For example, consider a request from a regulator on trades and why those trades were made. Understanding the full history of each trade as well as the history of the information that led to the trade is vital to efficient and effective compliance that can scale. Data lineage gives this history and the causation.

Choices in technology are important. A traditional relational is great at storing the connections between the current versions of records, especially if they follow a strict schema, but the history of a record is lost every time there’s an update. Records aren’t just connected to each other. Many times, an individual record can be the result of merging multiple data sources, which means that you can’t assume that an individual piece of data has a single, easily tracked audit trail that proves its provenance and tells its story.

6 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook 2019

Because of this complication, data lineage can’t just be tacked on to an existing database. Data lineage isn’t just a record of changes, but a deep history that records the origins of records, reasons behind changes, and information about the trustworthiness and completeness of a record’s sources.

In the future, it is paramount that data lineage is built into applications from the start. For any application that uses data in a meaningful way, data lineage needs to be a part of the project from requirements to design and through to deployment. It is the same lesson we learned about application security 10 years ago. Data lineage requires a better approach to data integration and new patterns to record changes and usage.

It is clear that regulations, whether to protect personal data or make trading more transparent, are requiring greater levels of traceability and auditability. The regulatory landscape will only continue to become more complex and demanding. Governance of customer information, financial trade records and data usage threatens to become a crippling burden unless technology can keep pace. Data lineage features heavily in meeting these regulatory requirements and, in fact, can be part of promising paths towards making compliance easier (believe it or not).

The discussion is just starting around digital regulation and the automation of governance and oversight. If governance could adapt to new regulations automatically, compliance would not take as much effort and risk as it frequently does today.

We are keeping an eye on digital regulation. It could be the answer to the challenge of compliance in our evolving world, and pervasive data lineage would be the key.

www.datamanagementinsight.com 7 From A-Team Insight Data Lineage Handbook 2019 Overview

Data lineage traces data from 2016, a Basel Committee on source to destination, noting Banking Supervision (BCBS) every move the data makes rule designed to improve data and taking into account any aggregation and reporting changes to the data during its across financial markets, as journey. Over recent years, it well as accountability for data. has become a critical concern This required improvements and challenge for data in data governance and data managers working in capital lineage that have since been markets as it is instrumental in reinforced by other regulations regulatory compliance, provides and financial institutions’ operational transparency to recognition of the importance reduce risk and cost, and is of accurate, complete and important to managing data for sustainable data lineage. . It also supports the increasingly favoured ‘data first’ What is data lineage? approach to managing data for Data lineage covers the lifecycle the benefit of the business. of data, from its origins, through what happens to the data when Initially implemented without it is processed by different specific regulatory requirements systems, and where it moves to track data across individual from and to over time. It can be data management projects, applied to most types of data data lineage rose to prominence and systems, and is particularly following the implementation valuable in complex, high of BCBS 239 in January volume data environments. It is also a key element of data governance, providing an FIGI understanding of where data Knowing the lineage or having accurately recorded history of changes comes from, how systems to data is critical to the successful operations of a firm. The Financial Instrument Global Identifier, or FIGI, is an important component in the process the data, how it is used identification framework. The FIGI can help standardize the process and and by whom. It also plays well overcome some of the hurdles in tracing data lineage in a cost-effective into improving . manner. Go to OpenFIGI.com today to learn more. Data lineage is sometimes referred to as technical lineage, which represents the flow of physical data through

8 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook 2019

underlying applications, Data lineage is usually services and data stores, represented visually to show the or business lineage, which movement of data from source requires the same underlying to destination, changes to the technicalities but is perceived as data and how it is transformed a driver of by processes or users as it and better business decisions. moves from one system to

By building a picture of how data flows through an Scope of data lineage implementation organisation and is transformed is often determined by regulatory from source to destination, it requirements, enterprise data is possible to create complete audit trails of data points, management strategy, data impact an aspect of lineage that and critical data elements has become increasingly necessary to meeting regulatory another across an enterprise, requirements and ensuring data and how it splits or converges integrity for the business. after each move. Visualisation can demonstrate data lineage While data lineage helps to at different levels of granularity, track data and identify different perhaps at a high level providing processes involved in the data data lineage that shows what flow and their dependencies, systems data interacts with metadata management – the before it reaches destination. management of data that As the granularity increases, describes data – is key to it is possible to provide detail capturing enterprise data flow around the particular data such and presenting data lineage. as its attributes and the quality Data lineage solutions based on of the data at specific points in metadata collect and integrate the data lineage. consistent end-to-end metadata throughout an organisation, The scope of lineage and create a metadata implementation is often repository that is accessible determined by regulatory and makes complete data requirements, enterprise lineage information available to data management strategy, different user groups. data impact and critical data

www.datamanagementinsight.com 9 From A-Team Insight Data Lineage Handbook 2019

elements of an organisation. It is Financial Instruments Directive not necessary to boil the ocean, II (MiFID II), General Data but instead identify regulatory Protection Regulation (GDPR), requirements for data lineage Fundamental Review of the and business areas to which its Trading Book (FRTB) and the application is beneficial. Comprehensive Capital Analysis and Review (CCAR) – now In many financial firms, users of require firms to implement data data lineage include business lineage to demonstrate exactly managers and analysts, how they came to the results compliance professionals, published in reports. Using data strategy developers, data lineage, firms can not only prove governance teams, data the accuracy of results, but also modellers, and IT management, take a proactive approach to development and support. identifying and fixing any gaps in required data. Importance of data lineage Data lineage is critical to both Complete data lineage can also regulatory compliance and reduce the burden of regulation business opportunity. by providing operational transparency and reducing risk and costs. Its metadata can help firms consolidate regulatory From a business perspective, and at reporting by identifying data a base level, data lineage helps firms that is used across numerous stay on the right side of regulators and regulations and move towards avoid the penalties of non-compliance processing the data once for multiple purposes. Similarly, From a regulatory perspective, metadata for data lineage can compliance has been tightened ease the burden and cost of up considerably since the 2008 implementing new regulations. financial crisis with subsequent regulations been designed From a business perspective, to avoid a repeat of similar and at a base level, data lineage circumstances. Rather than helps firms stay on the right merely producing reports for side of regulators and avoid the compliance, these regulations – penalties of non-compliance. including BCBS 239, Markets in Equally important, it helps

10 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook 2019

firms gain an understanding of times and times of crisis. To their data and the impact on achieve compliance, banks data of any changes to strategy, must capture risk data across systems and processes. With an the organisation, establish understanding of data, firms can consistent data taxonomies, gain the benefits of data lineage and store data in a way that beyond compliance, including makes it easily accessible and the ability to spot new business straightforward to understand. opportunities, make better decisions, increase efficiency and reduce costs. The use of data lineage for regulatory compliance is slightly different Regulatory drivers Regulations driving financial depending on the specific regulatory institution to implement data data requirement, but the overall lineage include those noted theme is the same above and detailed here. The use of data lineage in each case Data lineage requirement: Data is slightly different depending lineage must be implemented on the specific regulatory data to support risk aggregation, data requirement, but the overall accuracy and reporting. Also, theme is the same, to be able and conversely, to ensure risk to demonstrate where data data can be traced back to its originated, trace its journey origin and risk reports can be through an organisation, and defended. prove how it has been changed along the way.

BCBS 239 With the regulatory landscape continuing to evolve, your enterprise Basel Committee on Banking database needs data lineage built in. MarkLogic provides powerful data lineage that tracks the full history and provenance of all records. With Supervision Rule 239 (BCBS MarkLogic, your enterprise will be prepared for all regulatory requirements 239) came into force on January and will be prepared for the future of digital regulation. 1, 2016 and is designed to Visit MarkLogic.com to learn more improve risk data aggregation and reporting. It is based on 14 principles that underpin accurate risk aggregation and reporting in normal

www.datamanagementinsight.com 11 From A-Team Insight Data Lineage Handbook 2019

MiFID II Lineage can be used to identify Markets in Financial any gaps in trade reporting Instruments Directive II (MiFID data, and any similarities across II) is a principles based directive numerous regulatory reporting issued by the EU. It took effect obligations. It can also be used on January 3, 2018, and aims to map MiFID II reporting data to increase transparency from source systems to APAs and across Europe’s financial ARMs and vice versa. markets and ensure investor protection. The demand for GDPR reference and market data General Data Protection for both pre- and post-trade Regulation (GDPR) is an EU data transparency, including trade privacy regulation that came into force on May 25, 2018. It is designed to harmonise data Firms subject to GDPR are dependent privacy laws across Europe and on data lineage to track data and protect EU citizens’ data privacy. The requirements of GDPR provide transparency about where include gaining explicit consent it is and how it used, and allow data to process personal data, giving subjects to exercise their rights data subjects access to their personal data, ensuring data reporting and transaction portability, notifying authorities reporting, is unprecedented, and individuals of data leading to data management breaches, and giving individuals challenges including sourcing the right to be forgotten. required data, reporting in near real-time, and uploading Data lineage requirement: reference and market data Firms subject to GDPR are to MiFID II mechanisms dependent on data lineage including Approved Publication to track data and provide Arrangements (APAs) transparency about where and Approved Reporting it is and how it used. Data Mechanisms (ARMs). lineage provides the ability to demonstrate compliance with Data lineage requirement: MiFID the regulation and, from a data II operations can benefit from subject’s perspective, supports data lineage in a number of ways. access to personal data and the

12 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook 2019

execution of other rights such data aggregation required as the right to be forgotten. for the risk factor eligibility test of NMRFs, essentially FRTB the provision of at least 24 Fundamental Review of the real price observations of the Trading Book (FRTB) regulation value of the risk factor over the will take effect in 2022. It previous 12 months. is a response to the 2008 financial crisis, which exposed CCAR fundamental weaknesses The Comprehensive Capital in the design of the trading Analysis and Review (CCAR) is book regime, and focuses an annual exercise carried out on a revised internal model by the Federal Reserve to assess approach to market risk and capital requirements, a revised standardised approach, a To satisfy the demands of FRTB, shift from value at risk to an data lineage may be needed to expected shortfall measure of risk, incorporation of the track historical data and trade data risk of market illiquidity, and aggregation required for the risk reduced scope for arbitrage factor eligibility test of NMRFs between banking and trading books. Its data management whether the largest bank holding challenges include data companies (BHCs) operating sourcing, facilitating capital in the US have sufficient calculations, and gathering capital to continue operations historical data as well as real- throughout times of economic price observations for executed and financial stress, and trades, or committed quotes, have robust, forward-looking to meet requirements around capital planning processes non-modellable risk factors that account for their unique (NMRFs) and the linked risk risks. From a data management factor eligibility test. perspective, CCAR requires data sourcing, analytics and Data lineage requirement: To risk data aggregation for stress satisfy the demands of FRTB, tests designed to assess the data lineage may be needed to capital adequacy of BHCs and for track historical data and trade regulatory reporting purposes.

www.datamanagementinsight.com 13 From A-Team Insight Data Lineage Handbook 2019

to market, as have start- Few firms can claim complete and ups and young companies entirely successful data lineage, but dedicated to lineage. Some take a technical approach, most have developed a regulatory others a business approach, response that is beginning to yield but their common challenge operational and business benefits is to meet growing market demand for automated data Data lineage requirement: CCAR lineage that can cross complex requires attribute level data data environments and ensure lineage to track data from source regulatory compliance and to destination and ensure the deliver business benefit. validity and veracity of capital plans. Data lineage can also be On the demand side, recognition used to identify any data gaps in and adoption of data lineage has reporting and highlight any data tracked increasing regulation quality issues. since the financial crisis. Few firms can claim complete and Supply and demand entirely successful systems, Over the past few years, a but most have developed a number of established data regulatory response that is management vendors have beginning to yield operational brought data lineage solutions and business benefits.

How much progress has your organisation made on data lineage? We have started building a solution 38% We are in the planning stage 28% We are close to having a complete solution 19% We have a complete solution 13% We have not addressed data lineage 3% Source: A-Team Group data lineage webinar, 2018

14 Significant business benefits www.datamanagementinsight.com From A-Team Insight 58% Significant operational benefits 42% Some operational benefits 27% Some business benefits 21% No benefits 3% Data Lineage Handbook 2019

Unsurprisingly, most progress has been made at Tier 1 banks Most progress has been made at Tier 1 and other large organisations banks and other large organisations subject to extensive regulation and with the resources to subject to extensive regulation and implement data lineage, with resources to implement lineage although all firms that want to stay in the game are likely to Looking at the extent of need data lineage across some business and operational aspects of their business going benefits organisations are forward.We have startedThe results building of a a poll solution gaining, or expect to gain, from run during an A-Team Group data lineage, another poll38% noted webinar in 2018 that discussed 58% of respondents gaining or ‘HowWe are to in Get the Data planning Lineage stage Right’, expecting to gain significant showed 13% of respondents business28% benefits, 42% withWe are a completeclose to having solution, a complete 19% solutionsignificant operational benefits, close to having a complete 19%27% some operational benefits, solution, 38% starting to build and 21% some business andWe have 28% a in complete the planning solution stage. benefits. Just 3% suggested The remaining 3% had13% not yet they would gain no benefits. addressed data lineage. We have not addressed data lineage 3% What extent of business and operational benefits is your organisation gaining, or expecting to gain, from data lineage? Significant business benefits 58% Significant operational benefits 42% Some operational benefits 27% Some business benefits 21% No benefits 3% Source: A-Team Group data lineage webinar, 2018

www.datamanagementinsight.com 15 From A-Team Insight Data Lineage Handbook 2019 Challenges and opportunities

Overview Operational challenges Like most data management The operational challenges programmes, data lineage of data lineage start with includes inherent challenges winning management buy- and potential opportunities. The in and funding for a solution challenges range from winning that can be expensive, management buy-in for initial requires significant human projects to understanding input, and offers only a and tracking huge volumes modicum of advantage in of data with complex links early implementation. Poor across a environment. understanding of data lineage The opportunities range from and its potential benefits by improved data quality to better senior executives can stymie decision making and identifying approval, while the prospect of business opportunities. lengthy and complex projects could be enough to bring the Challenges shutters down. The challenges of data lineage tend to fall into three buckets Questions to consider at the – operations, technology and outset of a data lineage project data management – and while include: many are ongoing pain points for • Where are we now, why do we data managers across all sorts of need data lineage? programmes, some are specific • What extent of lineage would to data lineage. be optimal? • How can we win management buy-in? • Do we need a champion for MarkLogic’s data lineage allows a business to gain a better understanding data lineage? of its data, revealing new business opportunities and better ways to utilize • How much will it cost now and data. Your enterprise can develop new products and better ways to interact with customers and partners. This is why MarkLogic was recognized as the going forward? Best Data Lineage Solution in the 2018 Data Management Review Awards. • How much can we do with Visit MarkLogic.com to learn more allocated budget? • Do we have required skills internally? • What are the internal cultural issues of data lineage?

16 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook Regulatory Section Ad - 300 dpi.pdf 1 2/13/2019 9:03:56 AM

C

M

Y

CM

MY

CY

CMY

K Be Ready When the Regulator Calls Can you answer easily when compliance asks “Where does this data come from?” With MarkLogic, financial services firms can rest assured all source data will be available and easily accessible when needed.

WWW.MARKLOGIC.COM Data Lineage Handbook 2019

You can begin to answer these the success of data lineage questions by ensuring senior is particularly dependent on management understands the people and their approaches. importance of data and benefits It takes a range of data and of data lineage, and starting metadata management skills small. Decide whether a pilot to develop and maintain data project is going to provide insight lineage, but if data producers into business processes or and consumers don’t see its achieve an element of regulatory value, they are unlikely to fall compliance, prioritise the most in with the cause and follow important and relevant data, carefully created data lineage scope the project carefully, and processes. These producers identify stakeholders that should and consumers need to look be involved. beyond their own environment and understand how the In the first instance, it may be organisation can benefit from useful to assess where required data lineage. data comes from manually and create baseline data lineage That is not to say any data before considering automation. It lineage. As data lineage can be is also important to make sure a expensive to build and manage, pilot project is scalable and could it is important to understand include additional data or other what level of data lineage areas of the organisation before users require. Depending on making a business case. resources, it may or may not be possible to match extensive Proving the concept of data requirements, so the initial aim lineage and demonstrating must be to build a data lineage quick wins to the business solution that delivers value and should, at least in some cases, is right-sized for consumers, be enough to start the journey with later iterations providing towards a larger data lineage more detail around data and programme spanning part or all data flows. of the organisation. Data ownership and While a good start to any data accountability is an ongoing management project means challenge that many it should gain momentum, organisations with huge

18 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook 2019

amounts of data, myriad systems or solutions, to support an and applications, and little organisation’s data lineage. Early appetite among employees to implementations of data lineage take responsibility for data have were often built in-house as few failed to resolve. Data lineage vendor solutions were available, isn’t a silver bullet, but by more recently many firms have tracking data and showing how moved to hybrid in-house and it is used and by whom, it does vendor solutions, or migrated add some clarity to data and entirely to vendor solutions allows responsibility for specific as data lineage has advanced areas of data to be allocated to towards becoming a commodity. their rightful owners. Whether you plan to build Technology challenges or buy, these questions are The technology challenges of worth considering before final data lineage reflect growing decisions are made: numbers of regulations with • How much lineage is already in overlapping lineage requirements place? and smarter auditors and • To what extent will manual regulators asking for responses lineage continue to be to questions on demand. necessary? Advances in technology add • How will lineage be to the challenge, with cloud- documented? based applications and services, • How will it need to be scaled? and big data systems – not to • How will impact assessment mention emerging machine be managed? learning, artificial intelligence • What is the long-term aim for and natural language processing automation? technologies – creating a complex • Which areas of the data infrastructure. Data can be organisation will be covered managed in new and interesting and at what level in terms ways, but keeping track of it of technical and business and ensuring it can be trusted is lineage? increasingly difficult. • How will data lineage be sustained? At the heart of addressing these • What skills will be required? challenges, and a challenge in • How much will it cost? itself, is the selection of a solution,

www.datamanagementinsight.com 19 From A-Team Insight Data Lineage Handbook 2019

There are no catch-all answers mountains of spreadsheets, to these questions and few disparate systems, siloed data, organisations that will find uncharted data flows and mixed answers to all the questions in data formats. one solution, leading many to implement a combination of The potential impact of in-house developed and vendor regulatory change must also deployed solutions. be assessed, data quality considered, and manual Whatever the selected solution, processes brought into the however, it will not provide value lineage framework. in isolation. It is important to consider how data lineage and its Big data, data lakes and metadata will integrate with the repositories raise issues around rest of an organisation’s business how data is stored, tagged and metadata as this will provide rich linked to other data and systems, data and the ability to slice and while outsourced data and dice the data. Lineage also needs automated data feeds need to to run alongside an organisation’s be mined and brought into the systems development lifecycle data lineage scheme. plan to ensure it is maintained as technologies are changed. Data management questions And, of course, scalable and that need to be considered flexible technology is essential, before data lineage is not only to master growing implemented include: volumes of existing data types, • Is all the data valuable? but also to embrace additional • Is the data duplicated? datasets, alternative data, data • Is some of the data redundant? resulting from mergers and • Is the data internal? acquisitions, and data that we • Is the data external and have yet to discover. correctly licensed? • What tools are required to find Data management challenges answers to these questions? Implementing data lineage is a complex data management task Reflecting these questions, that could include huge volumes an early inventory of an of data, the creation of metadata, organisation’s data can start the multiple legacy systems, process of identifying which data

20 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook 2019

is important to the business and A better understanding of should be part of a data lineage data: This may sound simple, programme, which data can be but understanding data that left as is, and which data can be is used and stored across an scrapped. Data in legacy systems organisation can be difficult and black boxes will difficult, when it includes masses of if not impossible, to capture, internal data, several sources as will data that changes of external data, data silos continually but not consistently. and data in different formats. By applying data lineage, it is Considering the scope and scale possible for both technologists of these data management and business analysts to gain challenges, particularly in large a greater understanding of the organisations, data lineage data a company holds, where it utopia is not in sight, but there is, what it is used for, its current are tools and solutions that value and potential. With a good can break the backbone of understanding of data, it is also implementation and provide a possible to assign responsibility sturdy platform on which to build for data ownership to and maintain data lineage that individuals, departments or can provide useful and timely lines of business. information to the business. Data discovery capabilities: Opportunities Data lineage supports the The opportunities of data lineage ability to decide what data can also be divided into buckets, is important to the business in this case two buckets – and enables the data to be business and operations. accessed quickly. This is crucial to business decisions and can Business opportunities help firms remain competitive The business opportunities of and identify new business successful and sustainable data opportunities. lineage should more than justify the cost of implementation, A focus on data quality and although that can be frustratingly accuracy: Data quality is difficult to prove in the planning often an objective of data and proof of concept stage. From lineage programmes and can the ground up, they include: be a natural benefit of tracking

www.datamanagementinsight.com 21 From A-Team Insight Data Lineage Handbook 2019

data, eliminating duplicate or organisation’s data and redundant data, and improving providing access to trusted data reliability. Alternatively, data data quickly and efficiently, quality can be built into data data lineage allows the lineage by monitoring, checking business to make smarter, and improving the accuracy and faster and better informed consistency of specific datasets decisions. Decisions can be that must be of high quality. The made more proactively where audit element of lineage helps there is data lineage and data managers trace errors back defended on the basis of being to their source, fix any problems able to determine the exact and improve data accuracy on data underlying any decision. an ongoing basis. Identifying business Improved data reliability: By opportunities: A better tracking data from its origin to understanding of data and its destination, data lineage can the visualisation of data identify any gaps in the data that and processes, allows need to be filled, which is often organisations to identify new done manually, and reduce the business opportunities, such data remediation cycle. Lineage as the potential to create can also point to and eliminate new products by combining data duplication. First phase certain data and processes, implementation begins the or the possibility of finding an journey towards improved data external partner to upscale reliability, which will continue as and commercialise specific data lineage matures. datasets.

More accurate analytics: More Business intelligence and change reliable and better quality data management: The ability of that is understood and easily data lineage to expose an accessible supports improved organisation’s data lends itself analytics and the knock-on effect well to business intelligence and of better business decisions. change management. What-if analyses can be made using Better business decisions: existing data and processes, By supporting a better starter projects can be understanding of an undertaken to predict outcomes

22 www.datamanagementinsight.com From A-Team Insight The only global open data standard enabling effective data management. Financial instrument global identifier*

Only through an open, shared framework will the financial industry be able to finally address core data quality and lineage issues across legacy codes and standards.

90% of firms globally using more than one existing standard to identify instruments. 48% of firms pointing to incorrect or incomplete instrument identification as cause for a growing percentage of operational errors. 86% of firms agree that an open, shared framework that can establish relationships between different existing legacy instrument identifiers is needed.

Streamline your trading workflow and reduce operational risk. For inquiries regarding FIGI integration: [email protected] OpenFIGI.com FIGI

Source: TABB Group Research, April 2018 * FIGI is provided under the MIT Open Source license, with Governance by the Financial Domain Task Data Lineage Handbook 2019

of change, and favourable Data lineage can also support projects can be developed data harmonisation across quickly using existing and new multiple regulations by resources. Rather than calling discovering common data on IT to build new systems that can be generated once from scratch, the business can for use by several regulations. discover how new commercial On this basis, and by reducing concepts could work before duplication of effort, when a investing in systems. new regulation is introduced, it is also possible to establish Gains in operations what part of the required data The operational gains of data is already being governed lineage are as significant as and documented by another the business opportunities. regulation. Some are purely operational, others offer both operational Increased efficiency: By and business benefits. Among eliminating duplicated data and them are: redundant data and systems, and providing a clear view of data Improved data governance: and how it changes and moves Data lineage is an important around an organisation, data component of data governance lineage can provide increased and when implemented operational efficiency that can successfully can support the role support both cost reduction and of governance in managing the business needs for fast access to availability, quality and security trusted data. of data across an organisation. Impact assessment: Data lineage More responsive regulatory can be used to study how compliance: By creating changes to an organisation’s complete and trusted datasets systems infrastructure, business with transparency and a full processes and/or data could audit trail, data lineage eases affect specific products or the burden of gathering the right reports downstream. data for regulatory reporting and helps firms defend decisions Reduced risk: By collecting large when challenged by regulators. amounts of data, organisations

24 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook 2019

expose themselves to regulatory licensed more than once in any and business liabilities around single organisation or not used data breaches and the disclosure to any great extent. It can help of sensitive data. Data lineage firms avoid the penalties of using can reduce the amount of data unlicensed data and renew held by a company, improve data licenses with data vendors to management and understanding make external data provision of the data, and provide an audit more efficient and cost effective. trail that should help firms avoid the penalties associated with data The costs and length of breaches and wrongful disclosure. transformation programmes and Visualisation of data lineage smaller business change projects allows organisations to identify could be reduced by data lineage key risks in the data cycle and and data discovery that identify check if controls are in place or data and processes that can be need to be improved. reused. Similarly, data lineage can support and reduce the Cost reduction: While often costs of data modernisation and expensive to implement, data migration programmes. lineage offers a number of ways to reduce costs. The need to review Data ownership: With a data across an organisation as a hodgepodge of data and first step towards successful data numerous data silos, firms find lineage allows firms to identify it difficult, if not impossible, and delete duplicated data, to assign data ownership and focus on data silos and decide accountability to individuals, their fate, and discover unused departments or lines of business data that can be eradicated as and make it stick. Data lineage well as redundant systems that goes some way to solving the can be switched off. This will problem by clarifying where optimise a firm’s data footprint data is, who uses it and what and reduce the challenges and for. This allow data ownership to costs of data management. be handed over to the relevant individual, department or Data lineage processes also line of business that can best provide an opportunity to review exploit the data for financial or licensed data, which may be operational gain.

www.datamanagementinsight.com 25 From A-Team Insight Data Lineage Handbook 2019 Technology solutions

Overview approach that maps data across While most data lineage the IT landscape from source projects in financial firms to destination and uses generic start as in-house manual tools such as Microsoft Excel developments responding to and PowerPoint to maintain and regulatory requirements, times visualise lineage. These tools are changing. An increasingly have limitations when it comes regulated environment, growing to scaling, automating and volumes of data, complex data ensuring the accuracy of data infrastructures and the need to lineage, although they are still react quickly to changes and in use. provide fast access to business data are driving firms towards At the other end of the spectrum a mix of in-house and vendor, is fully automated, zero gap or purely vendor, technology data lineage supported by solutions. These come in varying vendor solutions, although in types from enterprise solutions practice, many firms still need to cloud based services, their hybrid solutions covering both commonality being in bringing manual and automated lineage, automation and increased and including both in-house timeliness, flexibility and and vendor elements. accuracy to previously manual data lineage processes. A poll question asked during a recent A-Team Group webinar Build or buy? on data lineage showed 42% of The answer to the question respondents using in-house and of whether to build or buy a vendor solutions, 29% building data lineage solution depends in house, 26% using vendor on the particularities of an enterprise solutions, 26% using organisation, whether it is large consultancy support, and 10% or small, has already built some using vendor managed solutions. automated lineage in house or is working with predominantly Underlying technologies manual systems, and what target Today, most data lineage outcomes it is aiming to achieve. solutions are coupled to traditional relational Early data lineage developments and data warehouses, and often took an in-house, manual include automation tools, data

26 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook 2019 Technology solutions

management, visualisation intelligence (AI) and graph and, increasingly, cloud databases. Blockchain technologies. The combination technology is a potential of data lineage and relational candidate for data lineage as it databases can work well, but provides both consensus on the it can be limiting in large and most recent version of golden complex environments with copy data and an immutable huge amounts of data that must historical record of data. be traced, but are constantly changing. An alternative Machine learning has a part to solution here is a multi-model play by learning and repeating NoSQL database that has required actions within data a flexible schema, efficient lineage, such as automatically processing and storage of both tagging private or sensitive structured and unstructured data elements and visualising data, and the ability to support these. AI will go further, perhaps high performance queries in a replicating data lineage large environment. processes and identifying new business propositions without Metadata is another underlying human intervention. component of data lineage. It can be collected from enterprise Graph database technology is data flow and integrated in also a good match for lineage a metadata repository. Data as it is relatively easy to model lineage is presented through data flows in a graph, data a metadata abstraction layer relationships can be queried in that allows data lineage real time, and a graph schema information to be consumed by different user groups, perhaps MarkLogic provides a powerful alternative to traditional relational databases business users, IT managers, and clumsy solutions like data lakes. With MarkLogic’s powerful multi-model database, semantic relational models can eliminate the barriers of data silos business analysts and the data while providing a powerful data lineage solution. Put your data to work to governance team. find new opportunities, develop new products, and connect better with your customers. Visit MarkLogic.com to learn more Innovative technology options supporting data lineage and making their way into the market include blockchain, machine learning, artificial

www.datamanagementinsight.com 27 From A-Team Insight Data Lineage Handbook 2019

can evolve to accommodate for different data categories, new data and relationships. such as reference data or trade The database’s query language data, and understand the data’s can be used to understand corresponding attributes. what data is used by whom, and which systems and reports Data capture often includes would be impacted by a the capture of business logic change in a particular process. and/or metadata that can be Graph visualisation can help stored in a repository and used technology and business users to create source to target data investigate data lineage. lineage, eliminate duplicated or redundant data, and provide business and technical users Automated data lineage solutions with the ability to locate, offer many benefits, including the understand, and manage ability to trace data errors, identify information that supports discrepancies and missing data, and business operations. control access to information In terms of data quality, an organisation’s critical data Automation elements can be identified in A typical automation solution data lineage and data quality for data lineage includes checks can be established across functionality that captures and the organisation as part of a data documents data flows, such as governance strategy. flows of financial instruments, from the data source to their These types of automated final destination, perhaps a solutions offer many benefits, regulatory or internal report. including the ability to trace data Drilldown functionality allows errors, identify discrepancies, particular points in the lineage control access to information to be inspected more closely, and model what would happen while traceability and audit if a new process or department ensure it is possible to track were added to the business. the journey of a piece of data They can also reduce time spent though across an organisation on validating data accuracy and and verify its accuracy. Filtering put trusted information in the capabilities allow users to filter hands of decision makers. As yet,

28 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook 2019

however, they do not provide but the key difference, for the 100% automation, but can reach moment at least, is delivery, levels of around 70-80%. with some vendors providing cloud-based solutions or Visualisation managed services that can Another important facet of be up and running relatively data lineage is visualisation quickly, and others offering technology, which can provide enterprise software solutions a real-time view of data moving that need to be implemented through an organisation’s and maintained in-house. Going processes and systems, improve forward, however, data lineage the understanding of data, is likely to follow the steady highlight any defects in data flow of data, applications and flows, and visualise the impact analytics into the cloud. of any changes to regulatory or business data or systems on upstream and downstream systems. Ultimately, it can help firms determine operational improvements and identify potential business opportunities. Automated data lineage and visualisation tools should be made available to all stakeholders in a lineage programme.

Vendor approachesroaches Vendor solutions cover similar data lineage functionality in terms of capturing data and creating lineage that makes trusted data accessible to business users and can accommodate changes on the fly. There may be slight differences in underlying technologies, scope and potential for automation,

www.datamanagementinsight.com 29 From A-Team Insight Data Lineage Handbook 2019 Approaches to implementation

Overview or both, and whether the Best practice approaches to data lineage should cover data lineage implementation one data flow, many flows or initially driven by regulatory the organisation’s entire IT compliance requirements are landscape. It is also important evolving as more firms adopt to determine whether a project the discipline for not only will cover lineage for data regulatory, but also business elements, information assets or reasons, as technology tools databases, although these can support increasing automation, be combined. and as successful programmes result in significant benefits. Early planning must also detail how metadata for data lineage will be included in Start with a small pilot project with the organisation’s metadata a well defined scope that will have repository and integrated with a relatively large impact on the business metadata to deliver organisation. The project needs to maximum benefits. demonstrate scalability and true Structured and unstructured business value data: A decision must also be taken on whether to include Not all best practice structured and unstructured approaches fit all financial data in a lineage project. firms, but here are some While structured data lineage guidelines that should be is relatively straightforward, considered ahead of any data perhaps using a handful lineage project. of attributes that uniquely identify each piece of data, Planning and preparation: Early unstructured data is not linear planning and preparation are and may require critical data key to data lineage to ensure elements to be identified and projects stay within scope and included in lineage. budget, and are delivered to an expected specification. Start small: Start with a small Consider the scope of the pilot project with a well-defined effort, whether the drivers are scope that will have a relatively about regulation or business, large impact on the organisation.

30 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook 2019

This project may take a manual must be decided and it is then approach as a starter to possible to map data flows demonstrating the potential from source to end point. of data lineage automation. It also needs to show scalability, Ensure the data lineage is efficiency and true business documented accurately and value to win management buy-in completely, and that the and funding. documentation can be updated easily when changes are made Understand required to data, systems and flows. This skills: Assess whether your will sustain lineage, but can be organisation has sufficient difficult in a large or dynamic resources to implement environment, suggesting and maintain a data lineage development of at least some project or whether it must look automated lineage. externally for help. A project is likely to need a champion and an implementation team, as Make an inventory of all data, circle well as business involvement, data specific to the area to which metadata experts, data lineage is being applied and discover custodians and stewards. how the data originates and moves Manual data lineage: Consider between people, processes, services the scope of the project, its and products objectives, and the elements in the dataflows that will be Automated data lineage: Early covered, such as data sources, planning and preparation systems that aggregate or for automated data lineage calculate data, and reporting follow similar steps to manual tools. Make an inventory of all lineage. With scope, flows and data, circle data specific to the data granularity established, area to which lineage is being an automation tool can applied and discover how the be implemented to gather data originates and moves required data and/or metadata between people, processes, and demonstrate data flows. services and products. The These tools should be able to level of data granularity that react quickly to any changes will be included in lineage to data or systems, verify the

www.datamanagementinsight.com 31 From A-Team Insight Data Lineage Handbook 2019 Roundtable discussion

origins of data, and help firms identify any missing data, data quality issues that need to be addressed, and any other weaknesses in the lineage.

Early planning and preparation for automated data lineage requires scope, flows and data granularity to be established, and an automation tool to be selected to gather data

As mentioned previously, automation is unlikely to provide a complete solution to data lineage, with experts suggesting 70% to 80% is closer to the mark. The remaining portion will need manual intervention, which is also important to monitoring the automated process to ensure it is working correctly, managing exceptions and responding to alerts raised by the solution.

Software development lifecycle (SDLC): To ensure data lineage remains accurate, efficient, accessible and sustainable, it is important to consider how it is going to be included in an organisation’s SDLC. Lineage must also be documented to support further systems and software development.

32 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook 2019 Roundtable discussion

To discover more about vendor from our open data in OpenFIGI views on data lineage in capital and our Legal Entity Identifier markets, how best lineage can (LEI) Local Operating Unit be implemented, technology (LOU) to the data Bloomberg options, and the benefits of customers rely on daily to successful deployment, we hosted provide the most reliable a roundtable discussion with the information in the industry so key sponsors of this handbook, they can trade, execute and MarkLogic and Bloomberg. We manage their risk globally. also asked them to make some recommendations to help you get A-Team: How important data lineage right. is data lineage to capital markets participants? A-Team: What is your company’s interest in data Bloomberg: Lineage is one lineage? of the pillars supporting cleanliness, usability and MarkLogic: As a leading accuracy of data. More and operational and transactional more, users are understanding enterprise NoSQL database that context is critical to provider, MarkLogic has built understanding if a piece or data lineage into its platform set of data is appropriate for from the ground up. Original a particular use – lineage has data is imported ‘as is’ and a key role in determining that. can then be progressively Where data has originated from, transformed using ‘envelope’ why it was originally created, patterns to enable and maintain and what changes have been data lineage. The original data made to it over time, and by is stored, and transformations who, all play into the question are kept as metadata. The entire of whether the data, regardless lifecycle of every piece of data of the label applied to it, can therefore be replayed and actually fits the context and examined. Our customers rely on definition you need. our technology and built-in data lineage across many use cases. MarkLogic: Data lineage is critically important. Regulators Bloomberg: Data is core to are requesting more and everything Bloomberg does, more information from capital

www.datamanagementinsight.com 33 From A-Team Insight Data Lineage Handbook 2019

market firms related to or approach, but to really get trading activities. Markets in it right, you need to marry Financial Instruments Directive in disciplines that many in II (MiFID II) is a good example financial services are less of this, where the Financial familiar with, and where there Conduct Authority (FCA) is is a lack of expertise. Notably, requiring capital market firms ontologies and semantic views to provide greater transparency have gained traction over the into trade execution and past few years, but we are still reconstruct specific trades. missing a basic understanding This is very difficult to do of linguistic issues that need to without data lineage. underpin those approaches. I’m referring to context – without an A-Team: What are the key understanding of how different challenges of implementation? communities in financial services define similar terms, semantic MarkLogic: Implementation implementations may not take challenges are usually a result into account inputs that have of poor technology choices and fundamentally changed the data governance approaches, data over time. Lineage is not a or both. Enterprises that use simple audit trail, and managing relational or single-model metadata about lineage can database systems struggle become very complex in a short with data lineage due to the amount of time. disconnected nature of the resulting data architecture. A-Team: What types of Only with the ability to handle technology can enhance multiple data models, and keep data lineage? all original and transformed data as metadata, can enterprises Bloomberg: It is less about the improve data governance and technology and more about the efficiently handle changing concepts and methodology. business requirements. Understanding that data is not static – both from a perspective Bloomberg: Put simply, that data changes over time, implementation is difficult. as well as how or why you are You can be successful just looking at a piece of data can using a technological solution change its meaning or intent –

34 www.datamanagementinsight.com From A-Team Insight Data Lineage Handbook 2019

is key to changing the way we burden unless technology approach technical solutions in can keep pace. Data lineage the first place. Yes, ontologies features heavily in meeting such as RDF/OWL and the like are these regulatory requirements, useful, but if you view the data as and in fact, can be a part of existing only in a single context, promising paths towards making you will possibly ignore alternate compliance easier. representations, misrepresent or simplify concepts, or create a Bloomberg: Again, the simple massive model that is unwieldy answer is better information and ultimately not useful. and, therefore, better decisions. For many general, broad-based MarkLogic: From our point of macro factors, unclean data is view, data lineage should be a usually mitigated by the volume core function of the database. If of cleaner data. But as you get it is not, or if you have multiple to more specific decisions, disconnected databases, then looking at the trees as opposed your ability to trace and audit to the forest, so to speak, details data across your enterprise become important. Differences will be a major challenge, and between the trees in the forest your applications will suffer the – their makeup and lineage – consequences. become vastly more important if you are picking one over the A-Team: What are the benefits other to build with. of successful data lineage? A-Team: Finally, what MarkLogic: It is clear that recommendations would you regulations, whether to protect make to help firms get data personal data or make trading lineage right? more transparent, are requiring greater levels of traceability Bloomberg: Be very clear about and auditability. The regulatory the community the data is for, landscape will only continue and if that changes throughout to become more complex and the data lifecycle. more demanding. Governance of customer information, financial Know it is OK to have the same trade records and data usage data mean different things, as threatens to become a crippling long as context is included and

www.datamanagementinsight.com 35 From A-Team Insight Data Lineage Handbook 2019

translations are available from about building in application one meaning to the other. security 10 years ago.

Solutions are not static, Overcoming these limitations lineage can branch and requires a better approach evolve – especially when you to data integration and new are not looking. patterns to record changes and usage. MarkLogic: Your choice of technology is important. A multi-model operational and transactional NoSQL database system, like MarkLogic, provides maximum flexibility in handling most complex data integration requirements, with built-in ability to trace data lineage across all data sources.

Evaluate data provenance and lineage as part of your enterprise data governance model and practices. You will not only be in a better position to address current requirements, such as those related to regulatory reporting, but also changes required for new regulatory compliance mandates.

Build data lineage into your applications from the start. For any application that uses data in a meaningful way, data lineage needs to be part of the project from requirements to design and through to deployment. It is the same lesson we learned

36 www.datamanagementinsight.com From A-Team Insight DataManagement Summit 2019 From A Team Insight

From A Team Insight Data Lineage Handbook 2019 Outlook

From a standing start in 2016, look for more functionality that data lineage took off when can be implemented quickly and regulation BCBS 239 came into updated frequently. force with the aim of averting further financial crises on the Development of data lineage scale of the 2008 debacle by for regulatory, operational improving data aggregation and and business purposes will reporting across capital markets. be coupled to technology advances including cloud, It has since grown into a critical machine learning and function for many financial artificial intelligence. These institutions as it delivers not only technologies should go some regulatory compliance, but also way to increasing automation, benefits such as fast access to supporting sustainable lineage reliable information, smarter and identifying previously analytics, better decision making, unseen and potentially rich improved impact assessment of applications of data. any changes to an organisation’s IT landscape, and ultimately, Evolving data standards, the potential to identify new metadata, semantics and business opportunities. From an ontologies will also feature operational perspective, data more prominently, easing the lineage can help firms reduce burden of the regulated and the costs and risk by eliminating regulator, integrating data lineage redundant systems and data. into operational infrastructure and weaving it into every data In coming years, data lineage dependent business strategy. will continue to be driven by regulatory requirements, which While a blue sky scenario depicts will intensify in their demands zero-gap, 100% automated data for increasingly granular data, lineage, and these are the results more detailed audit trails and we should pursue, the end game faster time to data discovery. Data beyond regulatory compliance lineage will also be pushed up the will, at least for the foreseeable agenda by escalating business future, remain the same – to interest in its significant beneficial move from defensive to offensive outcomes. The build or buy data intelligence that drives balance is likely to tip in favour of better decisions and unlocks vendors as financial institutions new business opportunities.

38 www.datamanagementinsight.com From A-Team Insight Delivering business insight to financial technology executives

Blogs Events

White Papers Webinars A-Team Insight

Awards Knowledge Hub

Guides Directories

Become a member and receive weekly content updates, free content downloads and exclusive invites to our webinars and summits at: www.a-teaminsight.com/membership

@ATeamInsight Search: A-Team Insight www.a-teaminsight.com

ATeamInsight_HouseAd_Sept2018.indd 1 25/09/2018 13:54 Data Lineage Handbook Back Page Ad - 300 dpi.pdf 1 2/19/2019 9:39:54 AM

C

M

Y

CM

MY

CY

CMY

K Best Data Lineage Solution? Our Customers Know. Large global investment banks rely on MarkLogic for post-trade processing and regulatory reporting solutions that let them quickly and cost-effectively trace data lineage across systems. We’re proud to have been recognized by Data Management Review as the Best Data Lineage Solution.

WWW.MARKLOGIC.COM