
The Digital Banking Blindspot
Mobey Forum Report, June 2021

The Digital Banking Blindspot: Emerging privacy enhancing technologies and their role in privacy risk mitigation and business innovation



Contributors:
Amir Tabakovic (co-chair), Experiens AI
Ville Sointu (co-chair), Nordea
Sebastian Reichmann, TietoEvry
Romana Sachova, CaixaBank

Copyright © 2021 Mobey Forum

Table of contents

Welcome
  a. The Mobey position
  b. Why this topic? Why now?
  c. Problem framing and scope

Today's landscape: How banks are approaching privacy challenges
  a. Banco Bradesco explores loan predictive model using encrypted data
  b. AML collaboration between banks while protecting the privacy of their internal risk scores
  c. WeBank improves the predictive credit risk model with federated learning
  d. Open source AI-generated synthetic data engine
  e. Product development and testing with AI-generated synthetic data

How privacy protection influences the usage of personal data today
  a. Legal basis
  b. Secure processing

Privacy protection technologies
  a. Trust-based privacy enhancing methods
    i. Access control / limitation of use methods
    ii. Encryption methods
  b. Obfuscation-based privacy enhancing methods
    i. Anonymization
    ii. Pseudonymization

Challenges of today's approaches
  a. Trust hacks
  b. Re-identification hacks
  c. Consequences

Emerging privacy enhancing technologies
  a. Encrypted analysis
    i. Homomorphic encryption
  b. Anonymized computing
    i. Secure multi-party computation
    ii. Federated learning
  c. High dimensional anonymization
    i. Representative AI-generated synthetic data
    ii. Differential privacy

How to proceed? The way forward for banks


Welcome

a. The Mobey position

In today's data-driven world, the banking industry relies too heavily on legacy data privacy enhancing technologies, which bear many hidden risks and instill a false sense of security in the institutions that use them, particularly in the upper echelons of management. This reliance, together with the restrictions on financial data usage imposed by regulation, is inhibiting data-driven innovation across the industry.

Operating with data at scale, without sacrificing privacy along the way, is a major challenge. To support, Mobey has brought together a diverse group of experts with multidisciplinary backgrounds to explore this challenge and the opportunities that emerging PETs offer.

In the following pages the Expert Group provides clarity on the common terminology around PETs, the role of personal data protection in financial services and the frequently used privacy protection techniques. The report highlights the problems and associated risks which popular traditional privacy protection methods face today, alongside the impact on existing data-driven practices and innovation.

This report is the first in a two-part series. The second part is anticipated later in 2021 and will take a deeper dive into the different PETs, their variable stages of maturity, and their potential to solve some of the most urgent privacy risks that banks face today.

b. Why this topic? Why now?

For today's banks, the strategic value of both data and AI is growing rapidly. With this in mind, Mobey Forum has prioritized its analysis on where the two intersect: data privacy technologies. Data privacy compliance is a critical area for banks to master in order to pursue data and insight driven business opportunities. The industry has a century-old history of data protection, as it relies on trust like no other industry.

Through its situational analysis of this space, Mobey Forum's new AI & Data Privacy Expert Group has revealed a blind spot in the banking industry regarding the importance of emerging privacy enhancing technologies (PETs) in privacy risk mitigation and business innovation. It contends that this blind spot is becoming increasingly critical in today's financial services climate, where organisations are perpetually challenged to balance innovation and multi-stakeholder ecosystems without compromising how sensitive data is stored or shared.

In this report, written for decision makers and strategic leadership within financial institutions worldwide, Mobey's Expert Group highlights a new breed of emerging PETs that create potential for financial service organisations both to significantly reduce existing privacy risks and to implement privacy-by-design principles. The Expert Group also provides a high-level introduction to these technologies before concluding that, if the industry can adopt a strategic approach to PETs, its key institutions may finally escape the privacy-value creation dilemma in which privacy protection occurs at the expense of innovation (and vice versa).

c. Problem framing and scope

There is broad industry agreement that data has become one of the most important strategic assets for banks, supporting the competitiveness of existing business areas and providing access to new opportunities as part of data and insight-driven business models and value chains. The value of this strategic asset depends on a bank's ability to use its data, analytically extract its utility and improve decision-making abilities related to the customer and the bank itself.

A large part of the relevant data for these initiatives includes sensitive and personal information and is therefore subject to data privacy restrictions. Along with a bank's data management and analytics capabilities, this raises an important question: has the mastery of data privacy requirements become a significant competitive competence for banks?


The current climate for innovation in financial services encourages multi-stakeholder ecosystems where it is imperative that sensitive data is secured and under control, even when shared with third parties (this includes AML initiatives and the improvement of fraud detection or credit risk models). Such data ecosystems and related use cases create a new set of requirements for data privacy management.

Understanding that legacy PETs are not fulfilling those new requirements leaves banks with two options: stall their innovation initiatives, or explore new, emerging PETs capable of fulfilling those requirements and bridging the gap between privacy and data-driven value creation.

This report focuses on technologies that help banks to escape the privacy vs. value creation dilemma, including the usage of personal information as a component of data-driven innovation scenarios.

From a data value chain perspective, this report concentrates on the privacy challenges of the last two steps (see figure 1):

• Processing and analysing the data to generate new insights and knowledge (analytics), and
• Putting the output to use, whether internally or by trading it (exchange).

From a technological perspective, this report sheds light on software-based PETs that focus on data analytics, while excluding some techniques more focused on authentication or information validation. This report also won't cover hardware-based trusted execution environments, an area also commonly known as confidential computing.

This report has, for the most part, been inspired by European privacy regulation. Other regions may have very different legal stipulations about what is allowed.

Figure 1: Data usage components in the data value chain. The chain runs Generation → Collection → Analytics → Exchange; the last two steps, jointly referred to as data usage, are the focus of this report.


Today's landscape: How banks are approaching privacy challenges

Across the industry, there are subtle signs that banks are trying to overcome the privacy-related barriers which prevent the use of customer data for innovation and the development of a data-driven multi-stakeholder ecosystem. Although banks are still hesitant to publicly discuss the new privacy-related technologies they are exploring, a small number of announcements have been made in recent years, showing increased innovation appetite in this area.

a. Banco Bradesco explores loan predictive model using encrypted data

In 2020, Banco Bradesco conducted a six-month pilot using encrypted data containing customers' financial histories in order to develop a predictive model that determines whether a customer will need a loan within the following three months1. The data used in the pilot was not decrypted during the machine learning process which resulted in the final predictive model. The method that enabled the application of analytics techniques on encrypted data is called homomorphic encryption.

b. AML collaboration between banks while protecting the privacy of their internal risk scores

Two Dutch banks - Rabobank and ABN AMRO - are experimenting in the area of AML with transaction monitoring that does not disclose their internal risk scores to the partner bank2. The objective is to discover when a high-risk customer from one bank transfers money to a low-risk customer, without the necessity of sharing risk scores between the banks. The technology enabling this is called secure multi-party computation.

c. WeBank improves the predictive credit risk model with federated learning

In 2018, WeBank and a national invoice centre jointly developed a predictive credit risk model that helped halve the number of defaults among WeBank's small and micro-enterprises. Invoice centres were willing to work with WeBank because the new approach used to build the predictive model guaranteed that they would remain the only owner and controller of their data3. This new approach is called federated learning.

d. Open source AI-generated synthetic data engine

In 2020, Citi open sourced Datahub, a set of Python libraries dedicated to the production of synthetic data to be used in tests, machine learning training, statistical analysis, and other use cases4. With this contribution, Citi addressed the challenges the industry faces when anonymous data is required. This new method of using AI technology to create representative anonymous data is called AI-generated synthetic data.

e. Product development and testing with AI-generated synthetic data

Erste Group uses AI-generated synthetic data in product development and testing. With the move away from traditionally anonymised data in favour of AI-generated synthetic data, the bank's objective is to develop and test its services in a much more sophisticated manner. Erste Group sees synthetic data as the foundation for all future data-driven development, as it provides the only GDPR-compliant method for unlocking advanced analytics and insights based on customer data5.

Although some of the described real-life applications of emerging PETs are still in the pilot phase, it is becoming clear that these technologies are moving into a competitive space. The benefits for the banks starting to use these technologies are numerous. By exploiting the potential of data that was previously too delicate to work with, banks are improving existing analytical models and enriching the data used for analytics with their own or third-party data. They also have the opportunity to build data-driven products and services on a solid privacy-by-design foundation that cannot be easily exploited through the well-known weaknesses of legacy PETs.

1. https://ibm-research.medium.com/top-brazilian-bank-pilots-privacy-encryption-quantum-computers-cant-break-92ed2695bf14
2. https://www.abnamro.com/uk/en/news/tno-rabobank-and-abn-amro-are-working-on-privacy-friendly-data-analysis
3. https://www.digfingroup.com/webank-clustar/
4. https://github.com/finos/datahub
5. https://www.finextra.com/pressarticle/86706/erste-group-embraces-synthetic-data-to-foster-innovation


How privacy protection influences the usage of personal data today

Five years ago, Mobey published a report entitled "Predictive Analytics in the Financial Industry". Since then, many predictions from this report have become a reality: data has become one of the most important strategic assets for banks, and new data regulations have created new areas of competition6.

In the past five years, banks have invested heavily in improving their skills, implementing new tools and changing their processes and the requirements placed on their data analysis output. This is allowing more banks to unlock the untapped utility hidden in their data and transform it into added value for private and corporate customers. The commercial, scientific and social potential of financial data as an analytical resource (data utility) is enormous.

At the same time, a customer's data is subject to stringent data protection legislation. Regulators around the world have updated existing privacy regulations to confront the realities of the digitized world (e.g., GDPR, CCPA, FADP). These recent developments have impacted the financial services industry, where personal data protection has become a critical process.

Privacy regulations affect a variety of different data usage activities:

• Regular data usage (internal or external) within the purpose defined at the time of collection of personal data. For example: the processing of personal information as part of a business operation (payment transactions, CRM, etc.).

• Internal innovation based on data controlled by the organisation, where processing happens for purposes other than those at the time of collection of the personal data. For example: a product recommender application.

• Multi-stakeholder ecosystems – either inbound, outbound or collective data sharing with third parties7 – where usage happens for purposes other than those at the time of collection of the personal data. For example: improvement of an existing fraud detection predictive model by sharing data with partners and/or competitors.

There are two privacy-related stumbling blocks for data-based innovation – the legal basis and secure processing – which impact the way in which data privacy is usually governed within financial institutions.

a. Legal basis

In order for a financial service provider to use personal data, there has to be a legal basis for this activity. There are two major types of legal basis for the processing of personal data:

• Universal: explicitly defined by the regulator, and comprising contract, legal obligation, vital interest, public task and legitimate interest.

• Explicit customer consent, which can be withdrawn by the customer at any time. Most data-driven innovations using customer data require a customer's consent8.

The only other way of processing personal data is if the personal data is anonymised. Once "personal data (is) rendered anonymous in such a manner that the data subject is not or no longer identifiable", the data is no longer beholden to privacy regulations.

6. https://mobeyforum.org/predictive-analytics-financial-industry-art/
7. https://www2.deloitte.com/content/dam/Deloitte/lu/Documents/financial-services/lu-next-generation-data-sharinging-financial-services.pdf
8. https://mobeyforum.org/privacytech-in-banking-part-i/


Figure 2: Assessing data privacy requirements - a 'starting point' decision tree. The tree asks: Is customer data involved? If not, the activity is not privacy relevant. If yes, do you need to link back to customers? If not, innovation can be based on anonymous data. If yes, a legal basis is required: either a "universal" legal basis exists (innovation with legal basis) or explicit customer consent is obtained, which can be withdrawn at any time (innovation with customer consent); both routes then require secure processing. With neither, it is game over.

Figure 2 shows three possible areas of private data-driven innovation. The first two areas sit within privacy-regulated territory and require a legal basis for data processing, while the third is outside of it.

• Innovation within the legal basis as defined by the regulator (contract, legal obligation, vital interest, public task, legitimate interest) means that the innovation is consistent with one of the universal purposes for the usage of personal information. For example: anti-fraud applications, where there is legitimate public interest for data processing.

• Innovation with customer consent is covered by explicit written consent provided by the customer. It is critical that customer consent is given freely, is specific to the data usage intent, is informed and unambiguous, and can be revoked at any time. Securing customer consent for new services is very challenging, particularly if the customer data will be shared with third parties. It is common for a lack of critical mass of customer consent (opt-in) to prevent an innovative idea from progressing. Emphasis must be placed on the fact that a customer's consent can be revoked at any time, which means that data governance processes have to be in place to enable the deletion of such data. For example: secure APIs are an open banking use case where customer consent is needed for third parties to access a bank's customer data.

• Innovation based on anonymous data is free of privacy regulation and restricted only insofar as some innovations require the original personal data. There are many use cases where anonymous data provides as much utility as the original data, namely in software development testing, data analytics, AI training, model governance, etc.

b. Secure processing

Assuming the legal basis is legitimate, the processing of personal data must then comply with security and confidentiality requirements. Security and protection measures are a necessity for all phases of product development that require customer data. This is usually a combination of different security measures for privacy protection, ranging from data access management and encryption to data anonymization. Data has to be protected throughout the product life cycle, including changes to product features, product-related infrastructure and up/downstream applications. This is a compelling argument for privacy and security by design: building strong privacy and security processes into a new data-driven product from the very beginning allows adaptability within an evolving environment9.

9. https://mobeyforum.org/privacytech-in-banking-part-i/


Privacy protection technologies

Privacy protection techniques which focus on the use of personal information, within the financial services industry and beyond, can be classified into four groups. The classification rests on criteria such as the level of trust placed in data processors and the way the data is manipulated (see figure 3).

Figure 3: Privacy enhancing technologies. Trust-based PETs comprise access control / limitation of use and encryption; obfuscation-based PETs comprise anonymization and pseudonymization.

a. Trust-based privacy enhancing methods

Trust-based PETs presuppose a level of trust in the person that is working with the original customer data. This trust could translate into exclusive access to the original data (access control/limitation of use) or exclusive access to the encryption key (encryption).

i. Access control / limitation of use methods

The first group consists of frameworks and techniques that control access to personal data and limit its use. Based on data access policies, access is only granted to authorised users. The access policies focus on different characteristics of the entities accessing data, including roles (e.g. administrator, power user, etc.) or key attributes (e.g. location).

ii. Encryption methods

Data encryption applies mathematical algorithms to translate data into an unintelligible form in order to protect it while stored or transferred. Only people with access to a secret key (decryption key) can read it. Exposure of the secret key to an intruder, and the consequential breaking of the encryption, leads to the complete loss of data protection. In the majority of cases, a decryption key is needed to process encrypted data. Homomorphic encryption is an exception to this rule and is explained further in the chapter on emerging PETs.

Although most privacy regulations do not explicitly require data controllers and processors to use encryption methods, they strongly recommend them in order to protect private data and to mitigate the risk of data breaches. In fact, data controllers and processors can avoid penalties in the instance of a data breach if they can prove that the impacted data was encrypted.

b. Obfuscation-based privacy enhancing methods

Obfuscation-based privacy enhancing methods manipulate original sensitive data so that it cannot be re-identified in the given context of its usage. An important factor for obfuscation-based methods is the privacy-utility trade-off: weighing up the privacy and utility criteria to decide how large a privacy gap is acceptable in order to extract a certain amount of data utility, and conversely how large a utility gap is acceptable to ensure a certain level of guaranteed privacy (see figure 4).
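To ground the encryption methods described under (ii), the snippet below sketches symmetric encryption with the Fernet recipe from the open-source Python cryptography package. The package choice and the toy record are illustrative assumptions, not something the report prescribes; the point is that whoever holds the secret key can read everything, which is why key exposure equals total loss of protection.

```python
# A minimal sketch of symmetric encryption, using the Fernet recipe from
# the open-source Python 'cryptography' package (an illustrative choice).
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # the secret: whoever holds it can decrypt
cipher = Fernet(key)

record = b'{"customer_id": 4711, "balance": 1250.00}'
token = cipher.encrypt(record)     # unintelligible while stored or transferred

# Processing the data still requires the key -- losing the key to an
# intruder means the complete loss of protection described above.
assert cipher.decrypt(token) == record
```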

www.mobeyforum.com |[email protected] 9 The Digital Banking Blindspot

Figure 4: The privacy-utility trade-off. How big a 'privacy gap' is acceptable? The chart plots privacy (from 'no privacy' to 'maximal privacy') against data utility (from 'no utility' to 'maximal utility'). The ideal situation sits at maximal privacy and maximal utility; PETs are positioned between the privacy gap and the utility gap that separate them from this ideal.

i. Anonymization

GDPR defines anonymous information as "…information which does not relate to an identified or identifiable natural person or [as] personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable". Hence, if there is no way to (re-)identify the customer, the information is anonymous and no longer beholden to privacy regulations.

Anonymization is therefore a process of de-identifying sensitive data while preserving its format and data type. Some widely known classical anonymization methods include randomisation, noise injection, tokenisation, suppression, shuffling, generalisation, etc. Multiple anonymization techniques are usually combined into a standardised anonymization design process that results in obfuscated, less personal data10. In recent years, more anonymization practitioners have been raising concerns about describing obfuscated personal data as 'anonymous', on the basis that it creates a false sense of security and gives companies tacit permission to share data without worrying about privacy11.

ii. Pseudonymization

According to GDPR, pseudonymization stands for "the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information." GDPR also states that "…data which have undergone pseudonymization, which could be attributed to a natural person by the use of additional information, should be considered to be information on an identifiable natural person". Therefore, pseudonymized data is considered to be personal data and must comply with data protection regulations such as GDPR.

During pseudonymization, all personally identifiable information (PII), including name, address or social security number, is identified and either removed, masked, or replaced with other values. The rest of the data (not direct PII) stays the same. This means that pseudonymized data still contains parts of identifiable information hidden in the non-PII attributes.
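As a hedged illustration of the difference, the sketch below pseudonymizes a toy transaction record: the direct identifiers are replaced with keyed tokens (the key playing the role of GDPR's "additional information"), while every other attribute survives untouched. The field names and the key are hypothetical.

```python
# Sketch: pseudonymization replaces direct identifiers (PII) with keyed
# tokens but leaves all other attributes -- the behavioural trail -- intact.
import hashlib
import hmac

SECRET = b"bank-held-tokenization-key"     # GDPR's 'additional information'

def tokenize(value: str) -> str:
    """Deterministic keyed token standing in for a direct identifier."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()[:16]

record = {"name": "Jane Doe", "iban": "NL91ABNA0417164300",
          "merchant": "Grocer XY", "amount": 42.50, "date": "2021-06-01"}

pseudonymized = {key: tokenize(val) if key in ("name", "iban") else val
                 for key, val in record.items()}
print(pseudonymized)
# merchant, amount and date are unchanged: linked across many transactions,
# they are exactly what makes re-identification feasible.
```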

10. https://mobeyforum.org/only-a-little-bit-re-identifiable-good-luck-with-that/
11. https://verstaresearch.com/blog/five-best-practices-for-keeping-your-data-anonymous/

Copyright © 2021 Mobey Forum 10 The Digital Banking Blindspot

Challenges of today’s approaches

"Banks' legacy data protection methods are generating hidden privacy risks which cannot be easily rectified. New technologies are needed to create a foundation for privacy-by-design solutions that will support future data-led innovation."

With the acceleration of digital transformation and data-centricity, new challenges arise. The sheer amount of data collected from a growing number of sources is posing a threat in itself. Privacy guarantees that were once 'good enough' might no longer suffice.

The financial services industry invests heavily in adapting its processes, organisational structure and technical infrastructure to comply with national and international data privacy regulations. Nevertheless, data privacy compliance represents an enormous challenge for financial service providers. The growing number of reports of financial service institutions that have violated local privacy regulations is a clear indicator that the industry is struggling to mitigate privacy risks.

The risks connected with non-compliance with privacy regulation have become substantial. Non-compliance can lead to direct financial impact, exemplified by BBVA, which was fined EUR 5 million in December 202012, and Capital One, which was fined $80 million in the same year in connection with a large data breach13. There are also significant reputational risks that can endanger the trust in a financial service organisation that was established over decades, if not centuries.

All privacy technologies are built on certain premises. These premises are also the entry point for adversaries and the main vulnerability of the given method. Two major vulnerabilities which banks are facing when trying to protect customer privacy are trust hacks and re-identification hacks (see figure 5).

Figure 5: Mapping PET categories to hack types. Trust-based PETs (access control / limitation of use, encryption) are exposed to trust hacks; obfuscation-based PETs (anonymization, pseudonymization) are exposed to re-identification hacks.

12. https://www.dataguidance.com/news/spain-aepd-fines-bbva-%E2%82%AC5m-gdpr-information-and-consent
13. https://www.americanbanker.com/news/capital-one-to-pay-80m-in-connection-with-massive-data-breach


a. Trust hacks

With the democratisation of data analytics and the rise of data citizens14 within the enterprise, data privacy needs to be enforced earlier and better than before. According to Gartner, 59% of privacy incidents originate with an organisation's own employee base15.

Example:
An employee of the South Africa-based financial services group Absa was able to use their role as a credit analyst to access the group's risk modeling process and sold the personal information of 200,000 Absa customers to third parties16.

b. Re-identification hacks

i. Why size matters?

Big data is difficult to anonymize. The bigger the volume, the trickier it gets, due to combinatorial explosion. Sequential behavioral datasets, like the transaction records frequently used by retail banks, are especially complex. Even if we remove all PII and use only five distinct transaction amounts with 20 different transaction categories, the number of behavioral stories quickly explodes with every additional transaction. A single transaction has 20 x 5 = 100 possible outcomes, and two transactions already yield 100 x 100 = 10,000 outcomes. For a sequence of three transactions, we are at a million outcomes per customer, and at forty recorded transactions we already have more possible outcomes than atoms in the universe. It is little wonder that these digital traces are highly identifying, and near impossible to obfuscate!17
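A few lines of Python reproduce this arithmetic (the five amounts and 20 categories are the toy assumptions from the paragraph above):

```python
# Why size matters: counting the possible 'behavioral stories' formed by
# sequences of transactions with 20 categories and 5 distinct amounts.
categories, amounts = 20, 5
per_transaction = categories * amounts       # 20 x 5 = 100 possible outcomes

for n in (1, 2, 3, 40):
    print(n, "transactions:", per_transaction ** n, "possible sequences")

# 40 transactions -> 100**40 = 10**80 sequences, on the order of the number
# of atoms in the observable universe; such traces are effectively unique.
```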

ii. Confusing anonymization with pseudonymization

A common problem in the banking industry is the confusion surrounding the difference between pseudonymization and anonymization, particularly when the former is used as a synonym for the latter. The conflation of these two distinct concepts is making an already complicated situation worse. Re-identifying customers from data that has been anonymized by classical anonymization techniques is getting easier and easier, but re-identifying customers from pseudonymized data is, frankly, child's play.

Example:
An internal document of the largest financial data broker in the US was leaked to the public, acknowledging that its consumer payment data could be unmasked and subsequently re-identified. The leaked document revealed what type of financial data the broker shares with its business customers, how the data is managed across its infrastructure, and the specific anonymization techniques used to protect the privacy of the payment cardholders. The document revealed that the data sold to third parties was only pseudonymized, not anonymized, and that the customers behind the sold transaction data could have been re-identified easily18.

c. Consequences

Although privacy budgets worldwide doubled in 2020 to an average of $2.4 million19, many institutions are still not focusing their efforts on the elimination of vulnerabilities. This is largely due to the lack of the technological competencies required to assess the existing exposure and accumulated data privacy risks.

In defense of the data privacy team, some of the processes that incorporate the above-mentioned privacy enhancing methods are considered to be business preserving processes and cannot, therefore, be replaced easily. Over time, this exposes banks to the gradual accumulation of data privacy related risks.

When it comes to innovative projects, however, the situation changes dramatically. The known privacy risks and the vulnerabilities of established PETs negatively impact the success potential of data-driven innovation. Inhibiting factors include:

• Locked data - 'We can't touch this' (internal innovation & multi-party ecosystems).
• Time to data - 'It will take three to six months to pass all compliance / legal checks.'
• Mindset/culture - 'It's not worth trying' and 'they won't get it anyway.'

14. https://www.accenture.com/_acnmedia/PDF-115/Accenture-Human-Impact-Data-Literacy-Latest.pdf
15. https://www.gartner.com/smarterwithgartner/call-legal-compliance-minimize-data-privacy-risk/
16. https://www.infosecurity-magazine.com/news/bank-employee-sells-personal-data/
17. https://www.nature.com/articles/srep01376
18. https://www.vice.com/en/article/jged4x/envestnet-yodlee-credit-card-bank-data-not-anonymous
19. Data Privacy Benchmark Study, Cisco 2021


Emerging privacy enhancing technologies

Due to human error, together with the growing number of regulation- and customer-consent-driven use cases, the idea of privacy by design has become paramount. A new breed of emerging privacy enhancing technologies promises solutions to the privacy troubles of financial institutions. These technologies use various methods and deliver different outcomes.

PETs use different computational, mathematical and statistical approaches to extract data utility while preserving the privacy of the information. Critically, emerging PETs seek to find a space between trust and re-identification hacks by removing information about the input data on the individual level while ensuring that analysis on data can still be performed (see figure 6).

Figure 6: Emerging PETs leverage encryption and anonymization to create a safe space for innovation between trust hacks and re-identification hacks. Legacy trust-based PETs (user permissions management, decrypted analysis) remain exposed to trust hacks, and legacy obfuscation-based PETs (obfuscated anonymized data, pseudonymized data) to re-identification hacks. In between sit the zero-trust / privacy-safe emerging PETs: encrypted analysis (homomorphic encryption), anonymized computing (secure multi-party computation, federated learning) and high dimensional anonymization (AI-generated synthetic data, differential privacy).

Some of the most promising emerging PETs are introduced below. More detailed definitions of each of these techniques will be covered in the second part of this paper.


a. Encrypted analysis

Until recently, it was necessary to decrypt data before it could be analysed or manipulated. This meant that encryption couldn't be used in some parts of the data value chain. Enabling encrypted data to be analysed and manipulated eliminates this limiting factor, together with the associated privacy risk.

i. Homomorphic encryption

Homomorphic encryption is a privacy preserving technology that allows third parties to process and, in some instances, even manipulate encrypted data without ever seeing the underlying data in an unencrypted format. Thus, data can remain confidential while it is processed, enabling useful tasks to be performed with data residing in untrusted environments.

b. Anonymized computing

Anonymized computing is a term used by the Expert Group to describe a designated group of methods that focus on the analytical process and introduce various privacy features into it.

i. Secure multi-party computation

As the name suggests, secure multi-party computation (MPC or SMPC) is a cryptographic technique that allows several different parties to jointly compute on encrypted data. In other words, MPC allows the joint analysis of data without sharing it. In this way the data remains protected from third parties. Only the participating parties can determine who is allowed to view the outcome of the computation.
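Two minimal sketches may help make these ideas tangible; both are simplified assumptions, not the protocols used in the pilots cited earlier. The first uses the open-source python-paillier package (phe), whose Paillier scheme is additively homomorphic: an untrusted party can sum ciphertexts without ever decrypting them (fully homomorphic schemes, as piloted by Banco Bradesco, also support richer operations).

```python
# Additively homomorphic encryption with the open-source python-paillier
# package (pip install phe). Assumed for illustration only.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

balances = [1200, 250, 3100]                        # sensitive plaintexts
ciphertexts = [public_key.encrypt(b) for b in balances]

# An untrusted party can aggregate the ciphertexts without the private key:
encrypted_total = ciphertexts[0] + ciphertexts[1] + ciphertexts[2]

# Only the key holder learns the result.
assert private_key.decrypt(encrypted_total) == sum(balances)
```

The second reduces secure multi-party computation to its simplest ingredient, additive secret sharing: each party's input is split into random shares that individually reveal nothing, and only the joint total is ever reconstructed.

```python
# Secure multi-party computation via additive secret sharing: three banks
# learn the sum of their private risk scores, and nothing else.
import secrets

Q = 2**61 - 1                                       # large prime modulus

def make_shares(value: int, parties: int = 3) -> list[int]:
    shares = [secrets.randbelow(Q) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % Q)        # shares sum to value mod Q
    return shares

inputs = {"Bank A": 17, "Bank B": 42, "Bank C": 8}  # hypothetical scores

# Each bank sends one (individually meaningless) share to every party;
# each party adds the shares it received, locally.
shares_per_bank = [make_shares(v) for v in inputs.values()]
partial_sums = [sum(column) % Q for column in zip(*shares_per_bank)]

# Recombining the partial sums reveals only the joint total.
assert sum(partial_sums) % Q == sum(inputs.values())
```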
ii. Federated learning

The federated learning concept removes the need to share sensitive data in order to perform machine learning. Traditional machine learning approaches usually try to gather data from the relevant sources into one processing environment and feed it into a single machine learning model. In contrast, federated learning advocates the use of multiple versions of a central model that are distributed to the relevant sources, where they are trained and operate locally. Only the adjustments to the model based on the local training are played back to a central version of the model, which acts as a general template.

c. High dimensional anonymization

High dimensional anonymization is a term used by the Expert Group to describe anonymization methods dealing with large datasets that would otherwise be difficult to anonymize.

i. Representative AI-generated synthetic data

Replacing real data with fabricated data is not a new idea. The most rudimentary way of doing it is by replacing the data with randomly generated placeholders - dummy data. A slightly more sophisticated way of fabrication, "fake data", is performed by manually imposing some rigid business rules or correlations between attributes of the dataset. Both methods have no analytical value, and the replacement data is not used to derive analytical insights. A new approach to creating data that is representative of the original dataset is to use AI to create synthetic data that is highly statistically representative of the original data but, at the same time, is fully private.

Synthetic data is AI-generated data (as opposed to directly measured or entered data) that mimics real-world data. This method aims to preserve the statistical properties of the original data the synthetic generator was trained on, but provides no direct link to the individual data points of the original it stands in for. Such data fulfills the requirement of GDPR Recital 26 and is not considered to be personal information. This is what makes synthetic data so interesting for many organisations.

ii. Differential privacy

Differential privacy is a rigorous mathematical definition of privacy. In the simplest setting, consider an algorithm that analyses a dataset and computes statistics about it. Such an algorithm is said to be differentially private if, by looking at the output, one cannot determine whether or not any individual's data was included in the original dataset.
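Returning to federated learning: a toy federated-averaging round, sketched with numpy under the assumption of a simple linear model, shows the mechanics described above. Training happens where the data lives, and only model weights travel.

```python
# Toy federated averaging (FedAvg) for a linear model y = X @ w: each
# bank trains locally on private data; the server only averages weights.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])                 # ground truth for the demo

def private_dataset(n: int = 200):
    X = rng.normal(size=(n, 2))
    return X, X @ true_w + rng.normal(scale=0.1, size=n)

banks = [private_dataset() for _ in range(3)]  # data never leaves a bank
global_w = np.zeros(2)                         # the central model template

for _ in range(20):                            # federated rounds
    local_weights = []
    for X, y in banks:                         # runs inside each bank
        w = global_w.copy()
        for _ in range(5):                     # a few local gradient steps
            w -= 0.1 * (2 / len(y)) * X.T @ (X @ w - y)
        local_weights.append(w)
    global_w = np.mean(local_weights, axis=0)  # only weights are shared

print(global_w)                                # converges towards [2, -1]
```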

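Finally, two sketches for the high dimensional anonymization techniques described above. The first is a deliberately naive 'synthetic data generator': it fits a simple statistical model (a multivariate Gaussian, standing in for the trained deep generative models real engines use) to toy customer records and samples entirely new rows that preserve aggregate statistics without mapping to any real individual.

```python
# Naive synthetic data: fit a statistical model to real records, then
# sample new rows. (Real engines use trained deep generative models.)
import numpy as np

rng = np.random.default_rng(1)

# Toy 'real' data: 1,000 customers with correlated (age, income) pairs.
real = rng.multivariate_normal(mean=[45, 60_000],
                               cov=[[90, 14_000], [14_000, 4e8]], size=1_000)

mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=1_000)

# Aggregate structure survives; no synthetic row belongs to a real person.
print(np.corrcoef(real.T)[0, 1], np.corrcoef(synthetic.T)[0, 1])
```

The second implements the Laplace mechanism, the textbook construction that satisfies differential privacy for counting queries; the income figures and epsilon value are illustrative assumptions.

```python
# The Laplace mechanism: answer a counting query with noise scaled to
# sensitivity / epsilon, so no single individual's presence is detectable.
import numpy as np

rng = np.random.default_rng()

def dp_count(data, predicate, epsilon: float = 0.5) -> float:
    true_count = sum(1 for x in data if predicate(x))
    sensitivity = 1        # adding/removing one person shifts a count by <= 1
    return true_count + rng.laplace(scale=sensitivity / epsilon)

incomes = [32_000, 54_000, 41_000, 120_000, 27_000, 88_000]
print(dp_count(incomes, lambda x: x > 50_000))   # noisy but useful answer
```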

Figure 7: Privacy obligations and regulatory mandates mapped to (emerging) PETs. The figure extends the decision tree from figure 2. Where no link back to customers is needed, the next question is whether there are more than a handful of data points per customer: if so, high dimensional anonymization (AI-generated synthetic data, differential privacy) replaces the obfuscated anonymous data produced by legacy generalisation, aggregation, perturbation and pseudonymization. Where a legal basis exists, the question becomes whether secure analysis is needed: if so, encrypted analysis and anonymized computing (secure multi-party computation, federated learning, homomorphic encryption) replace legacy data access management and encryption.

Privacy can quickly become a complex topic, particularly when issues are impacted by legal, business and technology factors. Peeling back the layers to understand the underlying problems can often be difficult and, owing to the rapidly changing privacy landscape, many legacy solutions can no longer be applied to today's challenges. The decision tree presented in figure 7 highlights some of the biggest obstacles faced by data-driven innovation. Principal among these challenges are re-identification (doing analytics on large anonymized datasets) and secure analysis (retaining privacy shields while doing analytics). In both instances, widely adopted legacy PETs are failing, increasing privacy-related risks. The net result of this failure is the stalling of both internal innovation and the creation of multi-stakeholder ecosystem use cases. This is the current blind spot. Happily, emerging PETs are bringing forward new approaches to fill the void and clear the path to innovation. The speed at which banks can adopt these new approaches will determine their capacity to get ahead of the game in data-driven innovation.


How to proceed? The way forward for banks

The financial services industry finds itself at a crossroads. Modern data processing and AI development require collaboration at scale and with an increasing number of stakeholders. At the same time, consumer-protection-driven data privacy requirements and related legislation are becoming increasingly complex for financial institutions to navigate efficiently. How, then, can financial institutions maneuver to create new value out of data without compromising privacy?

For the industry to move forward and escape the privacy vs. value creation dilemma, in which privacy protection happens at the expense of innovation (and vice versa), a new approach is required. Emerging privacy enhancing technologies can be used to solve some of the key challenges in this space, namely: how to process anonymous and encrypted data without losing value, even when no details about the private data are shared.

In this report we introduced emerging PETs, outlined their purpose and showed how they compare to established solutions like classical data anonymization. In the next part of this report series we will take a step further into this topic and explore some of the most promising emerging PETs, such as homomorphic encryption, secure multi-party computation, federated learning, differential privacy and synthetic data. Furthermore, the report will introduce some of the most common uses for these techniques and how they could be applied in practice.

Thank you for reading.

If you, the reader, would like to participate in the AI & Data Privacy Expert Group or are interested in joining Mobey Forum, we would be delighted to chat. Please contact us at: [email protected].

Copyright © 2021 Mobey Forum | www.mobeyforum.org | June 2021