Sharing is Caring: Four Key Requirements for Sustainable Private Data Sharing and Use for Public Good
Emmanuel Letouzé and Nuria Oliver

November 2019

Contents

1. Context and questions
2. Characteristics, benefits and limits of current private data sharing models and efforts for public good
3. Four key requirements for sharing and using private data for public good
4. An example of the way forward: OPAL
5. Conclusions and Recommendations
Works Cited
Authors
Partner
Imprint

1. Context and questions


How data are shared and used will determine to a large extent the future of democracy and human progress. A first imperative is to understand the key features of the ongoing "Data Revolution", which is not just about data.1 The Data Revolution is fuelled by three main factors. The first is the exponential growth in the generation of digital human behavioural data, or "digital crumbs",2 resulting from our interactions with digital devices and services, the digitisation of the physical world by means of sensors, and the development of the Internet of Things (IoT).3 The second is the development of sophisticated technological and human capacities to store, structure and make sense of these data, including cloud computing, high-performance and large-scale computing at an affordable cost, and new data-driven machine-learning approaches, such as deep learning. The third factor is the growing role and interactions of communities of providers, analysts, users and regulators with these new crumbs and capacities, all of whom collectively shape what is known as the data economy.4 These "three Cs" – standing for Crumbs, Capacities and Community – are the constitutive elements of a revolution that is already changing our daily lives, businesses, government operations and societies at large.5 While the economic impact of such a revolution has been estimated to be in the hundreds of billions of euros in Europe, most of this impact has been quantified from a business perspective, given that a large percentage of these massive amounts of data is collected and controlled by private companies.

[Figure 1: Evolution of the definition of Big Data – circa 2010, the V's of Big Data (Volume, Velocity, Variety); now, the C's of Big Data (Crumbs, Capacities, Community). Source: Emmanuel Letouzé]

The term "Data Revolution" became a common phrase in international development circles almost a decade ago. In 2012, the UN published its first report on "Big Data for Development",6 followed in 2013 by another UN report called "Data Revolution for Sustainable Development".7 That, in turn, was followed by another report in 2015 from a group with links to the UN, which argued that data should be at the heart of public policies and programs in support of the 2030 Sustainable Development Agenda and its 17 Sustainable Development Goals (SDGs). The UN subsequently organized two editions of the "World Data Forum", in 2017 and 2018, in South Africa and Dubai, while the third edition is scheduled to take place in Switzerland in October 2020.8 The 2018 Forum wrapped up with the launch of the Dubai Declaration, which aims to increase financing for better data and statistics for sustainable development.9

Looking ahead, it is clear that one of the biggest questions posed to all stakeholders, policymakers in particular, is whether and how these sensitive data – and/or the insights they contain – should be shared to maximise their potential for public good. In more specific terms, what are the technological systems, governance standards, economic models and ethical frameworks that ought to be considered in order to foster technological, economic and social innovation from these data?

1 https://www.washingtonpost.com/blogs/post-live/wp/2016/05/05/meet-professor-gary-king/
2 Brockman, John. 2012. Reinventing Society in the Wake of Big Data. A conversation with Alex "Sandy" Pentland. United States: The Edge.
3 Including Internet of Things devices. According to a recent DOMO report, in 2020 1.7MB of data will be created every second for every person on earth, contributing to a total of 40 zettabytes of data, according to IDC. DOMO Inc. 2018. Data Never Sleeps 6.0. American Fork, UT: DOMO.
Shaping a market for data – understood broadly as the mechanism for matching supply and demand – while ensuring that the Data Revolution benefits societies and citizens broadly, and not just large companies, small fractions of society and surveillance actors, is of the utmost necessity.

4 According to the European Commission, the data economy will surpass 700 billion euros by 2020, and the World Economic Forum predicts that 60-70% of growth in global GDP will be due to "data-driven, digitally enabled networks and platforms". World Economic Forum. 2018. Our Shared Digital Future: Building an Inclusive, Trustworthy and Sustainable Digital Society. Cologny/Geneva, Switzerland: World Economic Forum.
5 White Paper DPA 2015; Mayer-Schönberger, Viktor & Cukier, Kenneth. 2013. Big Data: A Revolution That Will Transform How We Live, Work and Think. John Murray. ISBN 9781848547926.
6 Letouzé, Emmanuel. 2012. Big Data for Development: Challenges & Opportunities. New York, NY: United Nations Publications.
7 Panel Secretariat led by Dr. Homi Kharas. 2013. A New Global Partnership: Eradicate Poverty and Transform Economies through Sustainable Development. The Report of the High-Level Panel of Eminent Persons on the Post-2015 Development Agenda. New York, NY: United Nations Publications.
8 The third UN World Data Forum will be hosted by the Federal Statistical Office of Switzerland from Oct. 18-21, 2020, with support from the Statistics Division of the UN Department of Economic and Social Affairs, under the guidance of the United Nations Statistical Commission and the High-level Group for Partnership, Coordination and Capacity-Building for Statistics for the 2030 Agenda for Sustainable Development. For more information, please visit https://unstats.un.org/unsd/undataforum/index.html.
9 UN Editor. 2018. World Data Forum wraps up with a declaration to boost financing for data and statistics. New York, NY: United Nations Department of Economic and Social Affairs.
This interest has largely stemmed from early evidence of the value of insights gleaned from the computational analysis of our digital crumbs. Because of the penetration of mobile phones in both developed and developing countries,10 the majority of early publications, pilots and initiatives on this topic have relied on data passively collected by the mobile network infrastructure for billing and other internal purposes (Call Detail Records (CDRs) or network probes). These studies have shown how analysing human behaviours captured by these datasets could inform natural disaster and emergency response, improve transportation, complement official statistics, help fight epidemics, understand crime, estimate socio-economic development, foster financial inclusion, and so forth.11 Dozens of scientific publications have stemmed from "Data Challenges", such as those organized by Orange, Telefonica and Turkcell, but also by banks such as BBVA. Others have resulted from bilateral agreements or internal projects (see Box 1).

These initiatives have reflected and stirred a growing desire for accessing and analysing those new kinds of data – sometimes compared to a "new oil"12 that can be extracted and refined to fuel more efficient, data-driven, evidence-informed decision-making systems. The oil analogy may seem tired or misguided13 (also because data are non-rivalrous and non-finite). However, it does serve to highlight some of the risks inherent in data, including the fact that once data are leaked, they are very difficult to reclaim. Will our datafied world be as extractive as the oil industry? Will data increase power imbalances and further consolidate monopolies?

10 World Bank Team led by Mishra, Deepak & Deichmann, Uwe. 2016. World Development Report 2016: Digital Dividends Overview. Washington, DC: International Bank for Reconstruction and Development / The World Bank.
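A common first step in the CDR-based studies described above is reducing individual records to antenna-level activity counts, which serve as a crude proxy for population presence over time. The sketch below illustrates the idea only: all records, identifiers and field names are invented, and real CDRs never leave the operator's premises.

```python
from collections import Counter
from datetime import datetime

# Toy Call Detail Records: (pseudonymised caller ID, antenna ID, timestamp).
# All values are invented for illustration.
cdrs = [
    ("u1", "antenna_A", datetime(2019, 11, 4, 8, 15)),
    ("u2", "antenna_A", datetime(2019, 11, 4, 8, 40)),
    ("u1", "antenna_B", datetime(2019, 11, 4, 18, 5)),
    ("u3", "antenna_B", datetime(2019, 11, 4, 18, 20)),
    ("u2", "antenna_A", datetime(2019, 11, 4, 19, 0)),
]

def activity_per_antenna_hour(records):
    """Count distinct users seen at each (antenna, hour) pair:
    a crude proxy for population presence over time."""
    seen = {((antenna, ts.strftime("%Y-%m-%d %H")), user)
            for user, antenna, ts in records}
    return Counter(key for key, _user in seen)

counts = activity_per_antenna_hour(cdrs)
# counts[("antenna_A", "2019-11-04 08")] → 2 distinct users
```

Real pipelines add spatial joins to antenna coverage maps, home-location inference and aggregation thresholds before any output leaves the operator, but the basic reduction from individual events to aggregate indicators is the same.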
Will it fuel corruption, benefit autocrats and undermine democratic principles?14 Will it kill privacy? Will automated decision-making and so-called "black box algorithms" lead to the demise of democracy as government by discussion? These fears and criticisms, many legitimate, have started giving data and data-driven algorithms a bad name. They have even led many people to push back on attempts to foster more data-driven or data-informed decisions, especially in contexts where data literacy and culture are often weak.

Cautious optimists, such as ourselves, argue that these legitimate concerns must inform, rather than impede, attempts at making data a key force for human progress. We believe that these new kinds of crumbs and capacities have the potential to help reduce pervasive limitations in human decision-making and governance – including corruption, cognitive biases, misinformation, greediness and conflicts of interest – to build healthier communities. We furthermore believe that Europe is well situated to lead the way in that domain, including by building on and promoting the vision behind the GDPR, a vision according to which data subjects and data generators have more control over how data pertaining to themselves are collected, shared and used.

Box 1: Defining private data sharing for public good

In this paper, "private data" is understood as data collected and controlled by private organisations – in most cases private commercial, for-profit companies – about the activities of their customers. Examples include cell phone metadata, credit and other financial transactions, social media data, search engine data, mobile apps data, and data captured by mobile operating systems and wearable or IoT devices, among others. These can be processed and shared according to different levels of abstraction:

1. Insights or knowledge derived from data: new knowledge or understanding derived from the analysis of the data
2. Processed or analyzed data: data that has been analyzed and visualized
3. Preprocessed data: raw data that has been prepared for analysis
4. Raw data: data directly captured by a sensor or service and that has not been processed

However, sensitive data about beneficiaries collected by not-for-profit organizations, such as UN humanitarian agencies, may also be considered private data, with different political and economic sensitivities. Private data and personal data are not synonymous, since pseudonymised aggregated data may cease to be personal data but remain referred to as private data to characterise their origin. "For public good" refers to the objective of using insights resulting from analyses of these data to advance societal welfare, including but not limited to the achievement of the 17 Sustainable Development Goals (SDGs).

Data- or evidence-informed decision-support systems, where machine-learning algorithms might play a central role, have been referred to by some authors as "Human AI" systems.15 Human AI systems refer to groups or societies where collective decisions are taken and evaluated based on data that is analysed by humans and/or AI algorithms. They seek to learn from their performance (often with input from humans in the form of penalties and rewards) what seems to produce good outcomes versus what does not. To function, these systems need to be able to access and analyse a wide variety of available digital crumbs using state-of-the-art capacities, allowing communities to make better decisions for themselves. Human AI is consistent with, but broader than, the "human-centric AI" approach promoted by the European Commission. The term "human-centric AI" refers to using AI in ways that reflect core human values. Human AI adds to this concept a vision whereby humans alone, or with the help of AI systems, make decisions based on evidence, encouraging what seems to yield good outcomes while seeking to discourage what doesn't – which in some cases could include discouraging the use of AI systems altogether.

Today's world is quite far from this vision. To date, there have been very few examples of real-world systems which systematically leverage large-scale human behavioural data to make better decisions for the public good. Indeed, many public policies run counter to what basic facts would suggest.

Major hurdles to data-driven public decision making have included the difficulties associated with giving third parties access to these data without creating risks to individual or group privacy, the lack of well-defined ethical principles, potential legal and regulatory barriers and the existence of competing commercial interests. The bulk of these data are sensitive, valuable personal data collected and controlled by private companies. What frameworks and standards may allow sharing these data at scale in an ethical and privacy-preserving manner? What kinds of financial models, incentives and regulations may make data sharing for public good sustainable? And even if those data (or statistics and insights derived from them) become available, how can we ensure that public administrations have the appropriate capacities and culture to adopt this new method of making decisions? After all, the public sector already has access to large amounts of data that are not touched or acted upon.

These questions and observations suggest that there are still significant obstacles on the path from data to evidence-informed decisions and outcomes. Simply assuming that more and better data would easily and immediately lead to both better decisions (including from governments) and better outcomes overlooks important lessons from human history, psychology and political economy. Over at least the past three centuries, it is difficult to argue that the greater availability of data and information to larger numbers of people has not helped better distribute human knowledge and wealth. But we also know that it usually takes more than facts to change people's minds,16 and that yesterday, today and, most likely, tomorrow, most individuals and institutions in positions of power will try to preserve that power. This often means disregarding or tweaking data, facts and statistics to that end. Consequently, if we are interested in the end goal – that of improving the world through data and not simply cracking the major first-mile problem of data access – we must consider the entire spectrum of hurdles and pathways that would allow our data crumbs to improve our communities. We argue that this requires supporting the deployment of sustainable systems and standards, but also the development of capacities and incentives to access, analyse and put data to use.

Thus, while this paper focuses on data access and sharing – the beginning of the causal chain – it does so with an eye on data usage and impacts – the end of the chain – and how considerations for the latter impact reflections and decisions on the former. Likewise, while it discusses data sharing "for the public good",17 it cannot do so in a vacuum. It must also take into account data-for-profit activities that are part of companies' efforts at monetising data they collect and control (also called "valorising", i.e. turning these data into economic value). It quickly becomes clear that the question as to whether and how privately collected and controlled data could or should be shared for the public good is not merely a technical or even legal one. Rather, it is a complex, multi-faceted issue that also gets deep into political, social, ethical and business matters.

In this paper, we provide an overview of the current state of play in private data sharing for the public good. We highlight gaps and needs, in addition to describing four main requirements and one specific example that may help current and future digital, datafied societies fully benefit from the immense potential that large-scale personal data hold. We conclude with seven recommendations for actions that we believe would help realise the immense potential of using privately held data for public good.

11 Blondel, Vincent D., Decuyper, Adeline & Krings, Gautier. 2015. A survey of results on mobile phone datasets analysis. Belgium: EPJ Data Science.
12 The Economist. 2017. The world's most valuable resource is no longer oil, but data. United States: The Economist.
13 García Martínez, Antonio. 2019. No, Data Is Not the New Oil. San Francisco, CA: Wired.
14 O'Neil, Cathy. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York, NY: Crown; and Harari, Yuval Noah. 2018. Why Technology Favors Tyranny. Washington, DC: The Atlantic.
15 Letouzé, Emmanuel & Pentland, Alex. 2018. Towards a Human Artificial Intelligence for Human Development. ITU Journal: ICT Discoveries.
16 Kolbert, Elizabeth. 2017. Why Facts Don't Change Our Minds: New discoveries about the human mind show the limitations of reason. New York, NY: The New Yorker.
17 We consider "data sharing for public good" to be efforts to use data for a positive impact on society. Note also that we refer to these efforts as "social good" and "public interest" interchangeably.

2. Characteristics, benefits and limits of current private data sharing models and efforts for public good

To date, five major private data-sharing models have been used, each with specific characteristics, benefits and limitations:

1. Limited data release
2. Remote access
3. Application Programming Interfaces (APIs) and Open Algorithms
4. Pre-computed indicators and synthetic data
5. Data collaboratives or cooperatives

These five data-sharing models can be organised and visualised in a graph, with levels of security and scalability on both axes (see Figure 2).

The first model is the limited data release model, a "one-to-many" modality where private companies prepare or "package" data and then provide one or more proprietary datasets to selected groups – typically following some screening and after a contract or NDA (non-disclosure agreement) is signed. Examples of this are the aforementioned "Data Challenges". These arrangements have been – and, in some cases, remain – key to creating awareness, but they are costly and time-consuming to organise. They also imply that datasets, which are, of course, pseudonymised and aggregated to levels considered safe from a privacy standpoint, will be "in the wild" for unknown periods of time, with little to no possibility of tracking their whereabouts. Another limitation is the fact that in most cases, local communities are not involved in the projects, and their long-term impacts seem limited. Finally, while this access model has helped with understanding the value of data for public good, it does not allow for an ongoing analysis of the data over time.

Box 2: Telco Data Challenges and initiatives

Mobile network operators have been among the most active private stakeholders in this modality of data sharing. Orange's Data for Development (D4D) Challenges in Ivory Coast (2013) [cite] and Senegal (2015) [cite] were among the first and most important data sharing challenges. They enabled hundreds of researchers to analyse aggregated and anonymised mobile data for a variety of public good purposes, including financial inclusion, socio-economic development, climate change, natural disasters, transportation, national statistics and public health. Telefonica, Telecom Italia and Turkcell also organized their own Data for Social Good challenges in London, Trentino and Turkey. Vodafone has also collaborated with relevant NGOs globally, such as Flowminder in Ghana, while the Vodafone Institute has led the Digitising Europe initiatives for several years, which have included contributions from the Data-Pop Alliance.

The second model is through one-to-one or one-to-few contractual agreements with specific individuals or institutions (governments or NGOs), who are then allowed to analyse proprietary data, after having signed an NDA, for a particular purpose related to the social good and possibly for a limited period. A common approach involves granting remote access to (typically) pseudonymised data to these authorised parties. In some cases, however, access to pre-packaged datasets with specific features is granted. Organisations like United Nations Global Pulse, Flowminder and Data-Pop Alliance (DPA) have carried out many projects and pilots using this model. DPA, for example, is running pilots with the United Nations Economic and Social Commission for Western Asia (UNESCWA) in Lebanon to understand the living conditions of Syrian refugees, and with the United Nations Development Programme (UNDP) in Moldova to study urban mobility in the country's capital. This form of data sharing can yield impressive results and build trust, and it is relatively safe. But it is not built for scaling and tends not to be very participatory.

A third model is few-to-many access through Application Programming Interfaces (APIs) and Open Algorithms. This approach enables the use of privately held data without the data ever leaving the premises of its owners/custodians. This is advantageous given the high volume and dynamic nature of the data. Both pre-computed indicators and richer question-and-answer access models can be implemented by means of an API. This means that users can either send code and/or algorithms to a server via an API in order to get an answer to a query, or they can simply pose a question which the server is prepared to answer. This modality allows for sharing data-based insights rather than unprocessed or raw data, and it enables a few-to-many data sharing model. Examples of the use of APIs for data access include the GSMA Big Data for Social Good Initiative and the OPAL project, which we describe in detail below. APIs enable queries of pre-computed indicators or a more interactive, granular and dynamic question-and-answer model, for which OPAL is a good example, as discussed in Section 4.

In some cases, pre-computed indicators are shared with interested parties in a one-to-one or one-to-few sharing model. Alternatively, synthetic data with similar statistical properties as the original data might also be shared. In this case, a one-to-many sharing model could be adopted.

Yet another model involves data cooperatives and data collaboratives (or "spaces"), which are collaborations between participants willing to exchange data to create public value. This model is typically a few-to-few / many-to-many model, given that any contributing member is able to access the data shared by others. It must be noted that these approaches must rely on one of the other data sharing modalities for data to actually be shared. As such, they are less of a model to facilitate data sharing and more of an institutional arrangement to incentivise data sharing. Data cooperatives are especially interesting due to the vision offered by their proponents of existing institutions, such as credit unions, playing a similar societal role as that of banks, but handling data instead of money.18 This model entails consensual data pooling enabled by privacy-preserving technologies, such as differential privacy.19 In data cooperatives, "individuals would pool their personal data in a single institution – just as they pool money in banks – and that institution would both protect the data and put it to use". In such a data cooperative, "while companies would need to request permission to use consumer data, consumers themselves could request analytic insights from the cooperative". Furthermore, what is shared in the data cooperative model are typically aggregated insights and not the raw or even pseudonymised data.

[Figure 2: Five models for the privacy-conscientious use of mobile phone data – Limited Release, Remote Access, Precomputed Indicators / Synthetic Data, APIs / Question & Answer, and Data Collaboratives or Cooperatives – arranged along axes of Security & Privacy and Scalability. Adapted from https://www.nature.com/articles/sdata2018286/figures/1]

Defining which modality to use depends on a range of factors, including the technical capacities of both the user and the data owners/custodians, the level of granularity needed, the type of data being shared and the expected use for the data, among many other considerations. While the modalities mentioned above provide a broad picture of data-sharing models, new models will emerge to make data sharing more accessible – in terms of frequency and ease of sharing small and large volumes of data – as stakeholders increasingly recognise the potential that bridging institutional data silos may hold for the public good. A recent and potentially emerging sharing modality, proposed by the Open Data Institute (ODI) in the UK, is the concept of data trusts.20 According to the ODI, a data trust is "a legal structure that provides independent stewardship of data".

18 Walsh, Dylan. 2019. How credit unions could help people make the most of personal data. Boston, MA: MIT Sloan School of Management.
Data trusts allow people or organisations that generate data to give some control over their data to a new entity (the trust) so it can be used to provide services or yield benefits to themselves or to others. Precisely how data trusts will access and transfer the data from where it is being generated has yet to be defined. Data trusts can be used for data governance, since they can be responsible for stewarding, maintaining and managing how data is used and shared (who has access to it and for which purposes). As part of a UK government-funded research program, the ODI recently carried out three data-trust pilots, which ran from December 2018 to March 2019 and focused on three use cases of public good: improving public services in Greenwich, tackling the illegal wildlife trade and reducing food waste. The results of these pilots have helped the ODI define their recommendations for the design and development of a data trust in the city of London.

These and many more pilots exploring modalities for data-sharing are taking place in Europe, where interest in sharing data across stakeholders is especially high. In fall 2018, for example, the European Commission created a High-Level Expert Group (HLEG) on Business-to-Government Data Sharing, with representatives from academia, NGOs and industry.21 The HLEG is expected to produce a report with key recommendations regarding data sharing from private companies to public stakeholders by early 2020.22 Additionally, an international workshop exploring private data sharing took place in September 2019 at Eurostat, whose teams are developing a methodological and technical framework to derive statistics from data generated by mobile operators.23

Yet, while there is significantly higher public awareness of the opportunities that private-to-public data-sharing frameworks could unlock, to date there are no common standards, viable financial models, consistent regulations, multi-stakeholder coordination or ambitious multi-year projects to fully realise such opportunities. Solutions seem to be in sight, but significant work is required to assemble all the pieces of the puzzle.

In this context, we describe four key requirements that we believe must inform European efforts to ensure that private data are shared and used for the public good in a safe, ethical and sustainable manner.

19 Dwork, Cynthia & Roth, Aaron. 2014. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science. Philadelphia, PA: University of Pennsylvania.
20 Hardinges, Jack & Wells, Peter. 2018. Defining a 'data trust'. London, UK: Open Data Institute.
21 European Commission. 2018. Commission appoints Expert Group on Business-to-Government Data Sharing. European Union.
22 One of the authors of this paper, Dr. Nuria Oliver, is a member of the High-Level Expert Group on Business-to-Government Data Sharing at the European Commission.
23 Ricciato, Fabio, Lanzieri, Giampaolo & Wirthmann, Albrecht. 2019. Towards a methodological framework for estimating present population density from mobile network operator data. Luxembourg, LU: European Commission – EUROSTAT.

3. Four key requirements for sharing and using private data for public good

3.1. Get the politics right

Sharing private data for public good involves three groups of actors: private organisations holding the data; institutions expressing data demands on behalf of societies; and individuals whose data are the element of interest. These parties have potentially conflicting interests, constraints and priorities, some of which might be more negotiable than others. This observation suggests that there is a political dimension to the challenge of data sharing. Getting the politics right requires striking a balance between the interests of the private sector, the public sector (e.g. governments) and individuals, which implies understanding their underlying dynamics.24

The first group of stakeholders is private organisations, which in most legal frameworks are the legal owners or custodians of the data of interest. Even though corporations increasingly monetise consumer data via their own operations and services, or even by selling them, they typically worry about sharing data externally for a variety of reasons. These include perceived reputational and legal risks (if something goes wrong or if the project gets bad press), financial costs (including costs associated with setting up sharing infrastructures) and fears that sharing data may negatively influence the business prospects of selling them in the short, medium and long term.25 Some companies feel they are sitting on gold mines – or oil fields – and they want to keep their data inside their vaults (servers) until they figure out a business model.

The second group of stakeholders encompasses the institutions that demand access to private data and/or the insights these data contain for public good. Such institutions could be governmental – such as ministries, local governments and National Statistical Offices (NSOs) – or they could come from academia or civil society (including private universities). NSOs are a good example to illustrate the main considerations and constraints of this group. Most NSOs around the world would want to access these data for statistical purposes – to calculate population density or poverty estimates between censuses, for example. Very few, though, have been successful in doing so. Thus, some NSOs have proposed the use of regulations as a tool to incentivise or compel private companies to share their data with them. The case for NSOs being able to rely on these data to provide a better picture of reality to decision-makers and citizens is strong. Nonetheless, if this is done without appropriate consultation and in the absence of appropriate technological and governance safeguards, there are risks of alienating private companies, breaching consumer privacy and breaking citizens' trust. NSOs also have concerns about reputational risks if it turns out that using these data leads to unintended consequences. One example is the recent negative press received by a project undertaken by the Spanish Institute of National Statistics due to their having leveraged aggregate insights derived from mobile data without the knowledge or explicit consent of mobile users.26 Another fear is that of developing partnerships and investing in the use of private data absent guarantees that the flow of data will not stop at some point in the future. Hence the focus on regulations and partnerships to address this fear.

Last, but not least, are the individuals whose data are already being analysed for many purposes, presumably with their consent (we click "accept" or sign contracts that give companies rights over our data) but possibly, or typically, not with their understanding of what data are being captured and how they are being used. Key principles and rights, such as transparency, fairness, autonomy and privacy, would need to be demonstrably preserved, as discussed in Section 3.3. Given the value of the data, several projects have proposed the creation of personal data markets27 where individuals have control over their own data and decide whom to share them with and at what cost.

The GDPR and the ePrivacy Directive place a welcome premium on obtaining consent from users and on requiring data controllers to put measures in place to keep track of the whereabouts of consumer data. However, the GDPR also allows for the lawful processing and sharing of private data for some uses, including the computation of statistics that are no longer considered personal data, leaving open many avenues for research and policymaking purposes without requiring consent. We believe that in the future, however, all data-sharing initiatives should make it possible to opt out, regardless of the usage, with adequate communication. This creates the risk of attrition, which has been demonstrably significant in situations when active consent has been sought.

A key challenge will be to convince data subjects that it is not only safe, but also in their interests, to agree to make their data available for the purposes of public good – either by opting in or by not opting out – as we do in insurance schemes. The aforementioned example of data cooperatives presents a model where individuals and institutions would agree to share their data for common objectives, with the associated rights. However, given concerns expressed in many countries, it seems that significant efforts still need to be undertaken to show evidence of returns and to ensure that the technology and science are both sound and safe enough to generate public trust. Thus, the next requirement concerns the necessary technology and science to enable turning data into reliable and actionable knowledge.

24 Letouzé, Emmanuel, Kammourieh, Lanah & Vinck, Patrick. 2015. The Law, Politics, and Ethics of Cell Phone Data Analytics. Boston, MA: Data-Pop Alliance.
25 https://www.vodafone.com/content/dam/vodcom/files/public-policy/Realising_the_potential_of_IoT_data_report_for_Vodafone.pdf
26 https://www.elconfidencial.com/tecnologia/2019-10-29/ine-operadoras-recopilacion-datos-moviles-proteccion-leyes_2304120/
27 Caraviello, Michele; Lepri, Bruno; de Oliveira, Rodrigo; Oliver, Nuria; Sebe, Nicu & Staiano, Jacopo. 2014. Money Walks: A Human-Centric Study on the Economics of Personal Mobile Data.

3.2. Get the technology and science right

Getting the technology and science for private data sharing right means, first and foremost, that decisions about whether and how to share data must be informed by state-of-the-art scientific knowledge – knowledge regarding not only what methods work best on which data and for which purposes, but also regarding the limitations and risks associated with the use of such methods. Like all technologies, Big Data analysis can and will have both great positive and potentially negative consequences. Maximising the former while minimising the latter requires having a solid understanding of the technology behind the technology – the technological and scientific underpinnings and inner workings of data science.

A particularly important set of risks concerns computational violations of individual privacy stemming from possible re-identification. A few years ago, "anonymising" data – meaning replacing information that identified a specific individual, such as their name, date of birth or phone number, with a unique identifier (hash) – was considered to be safe, as long as the key or anonymisation procedure could not be found.
It was also belie- Data. Seattle, WA: UbiComp 2014 - Proceedings of the 2014 ved that coarsening the data would make ACM International Joint Conference on Pervasive and Ubiqui- tous Computing. it even more difficult to re-identify people in an anonymised dataset. Research has theory, machine learning and statistical since shown that individual behaviours are estimation, all of which are of relevance so unique that it is often theoretically pos- to the data-sharing use cases for public sible to single out an individual, even with interest. coarsened data.28 This is because there can only be one individual displaying the Obviously, the larger the dataset, the specific features observed in the dataset. higher the guarantees of differential pri- If that individual is also present in another vacy, because as the number of indivi- dataset containing personal information, duals in a dataset increases, the impact and the two datasets can be matched, of any single individual on the aggregate the individual would be fully re-identified.29 statistical results Research is currently ongoing to unders- diminishes. The- tand the limits of human privacy and how it re are examples Like all technologies, can be protected.30 But the current state of of interfaces that knowledge is clear: Anonymising personal would enable ac- Big Data analysis can data is not necessarily enough to protect cess to private individual privacy.31 data while provi- and will have both ding strong pri- One promising technical approach to vacy protections great positive and preserving privacy is called „differential using differential privacy“.32 Differential privacy is a mathe- privacy, such as potentially negative matical definition of privacy that enables the Private data the statistical analysis of datasets that Sharing Interface consequences. 
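To make the uniqueness argument concrete, the toy sketch below shows how replacing a phone number with an unsalted hash still leaves a record that can be singled out by a couple of coarse attributes and matched against outside information. All records, column names and the auxiliary dataset are entirely hypothetical, invented for illustration:

```python
import hashlib
from collections import Counter

# Hypothetical pseudonymised mobility records: the direct identifier
# (a phone number) is replaced by a hash, but two coarse attributes
# -- home district and a pair of visited sites -- remain.
records = [
    {"phone": "+34-600-000-001", "district": "Centro",   "sites": ("A", "B")},
    {"phone": "+34-600-000-002", "district": "Centro",   "sites": ("A", "B")},
    {"phone": "+34-600-000-003", "district": "Chamberi", "sites": ("C", "D")},
]

def pseudonymise(rec):
    """Replace the direct identifier with an unsalted hash (the naive scheme)."""
    hashed = hashlib.sha256(rec["phone"].encode()).hexdigest()[:12]
    return {"id": hashed, "district": rec["district"], "sites": rec["sites"]}

anonymised = [pseudonymise(r) for r in records]

# How many records share each combination of coarse attributes?
combos = Counter((r["district"], r["sites"]) for r in anonymised)

# Any record with a unique combination can be singled out ...
unique = [r for r in anonymised if combos[(r["district"], r["sites"])] == 1]

# ... and re-identified by matching against an auxiliary dataset that
# still links the same coarse attributes to a name.
auxiliary = {("Chamberi", ("C", "D")): "Alice Example"}
reidentified = [(r["id"], auxiliary[(r["district"], r["sites"])]) for r in unique]
```

The hash hides the phone number, yet the third record is the only one with its particular district/sites combination, so it is singled out and linked back to a name without the key ever being broken.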
One promising technical approach to preserving privacy is called "differential privacy".32 Differential privacy is a mathematical definition of privacy that enables the statistical analysis of datasets that may contain personal data in such a way that, when looking at the output of such an analysis, it is impossible to determine whether any specific individual's data was included in the original dataset or not. The guarantee of an algorithm applied to a dataset that is differentially private is that its behaviour does not change when an individual is included in or excluded from the dataset. This guarantee holds for any individual and any dataset. Hence, regardless of the specific details of an individual's data (even if such an individual is an outlier), the guarantee of differential privacy should still hold.

One application of differential privacy is in the context of data sharing. The most studied use case is statistical query release, where external parties can perform specific counting queries (e.g. how many people in the dataset live in Madrid?) and receive answers that have a small amount of random noise added to them.

Another technical way to mitigate privacy risks is to share only aggregated results and insights derived from analysing the data, such as statistical indicators. Doing so is no panacea, but it greatly reduces the risk of re-identification. Ongoing research is being conducted to understand both future weaknesses and possible technical remedies.34

Beyond privacy, the use of large-scale human behavioural data for the public good is still a nascent field. While there are promising results in the research literature, it remains an active research area.

28 Blondel, Vincent D., de Montjoye, Yves-Alexandre, Hidalgo, Cesar A. & Verleysen, Michel. 2013. Unique in the Crowd: The privacy bounds of human mobility. Belgium: "Communauté française de Belgique" on Information Retrieval in Time Evolving Networks.
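The noisy counting query described above can be sketched with the standard Laplace mechanism. The dataset and the privacy budget epsilon below are illustrative assumptions, not part of any particular system:

```python
import math
import random

# Hypothetical records held by the data owner.
cities = ["Madrid", "Madrid", "Barcelona", "Madrid", "Valencia"]

def laplace_noise(scale):
    """Draw one sample from a Laplace(0, scale) distribution via the inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(data, predicate, epsilon=1.0):
    """Answer a counting query with epsilon-differential privacy.

    A counting query changes by at most 1 when any one individual is added
    to or removed from the dataset (sensitivity 1), so adding noise drawn
    from Laplace(0, 1/epsilon) satisfies epsilon-differential privacy.
    """
    true_count = sum(1 for row in data if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon)

# External parties see only the noisy answer, never the raw records.
answer = dp_count(cities, lambda c: c == "Madrid", epsilon=1.0)
```

A smaller epsilon means more noise and stronger privacy; averaged over many hypothetical releases the answers would concentrate around the true count of 3, which is why a researcher can still draw the right conclusions from them.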
Differentially private algorithms can tackle many of these queries such that any researcher who might receive the approximate answers would reach the same conclusions as if (s)he had access to the data. Beyond these simple statistical queries, there are now examples of differentially private algorithms in game theory, machine learning and statistical estimation, all of which are of relevance to the data-sharing use cases for public interest. Obviously, the larger the dataset, the higher the guarantees of differential privacy, because as the number of individuals in a dataset increases, the impact of any single individual on the aggregate statistical results diminishes. There are also examples of interfaces that would enable access to private data while providing strong privacy protections using differential privacy, such as the Private data Sharing Interface (PSI) project at Harvard.33

There are several technical and scientific challenges we would like to highlight: a lack of ground truth35 that would enable the proper validation of supervised, data-driven models; difficulties with real-time access to and analysis of data, despite the fact that in some impactful use cases, such as natural disasters, real-time access would be imperative; the impossibility of inferring causality, as opposed to merely highlighting correlations, with the implications that this may have for policy- and decision-making; the potential lack of representativeness of the available data, along with its limited generalisation capabilities and inherent biases; the nonexistence of certification mechanisms to guarantee the quality of the algorithms applied to the data; and the lack of transparency and interpretability of complex machine-learning algorithms, such as deep neural networks.

Moreover, using private data for public good typically entails combining datasets from different sources, which might have been captured at different moments in time, with different temporal and spatial granularities and varying levels of noise. Combining such datasets in a scientifically sound way is an active study focus in the area of multi-modal or multi-source data analysis.

If we return to our example of National Statistical Offices, they also face technical challenges. Some official statisticians remain sceptical that the data would be useful due to limitations in data quality, including potential biases. Many NSOs are subject to human and technological capacity constraints along with, more broadly, imbalances of power vis-à-vis large private companies, which limit their ability to easily access private-sector data. Additional technical challenges stem from the lack of the necessary technology infrastructure and human capacity to systematically store, process and use the insights derived from the data. Appropriate investments in human resources and infrastructure would be necessary to successfully carry out projects in this space, and they would need to be allocated prior to the inception of any project. Data, software and hardware are dynamic; therefore, all projects need teams of experts dedicated to them on a continuous basis.

The choice of data-sharing model – of the five models described in Section 2 – will determine what technical solution might be the most appropriate. Based on our experience, we believe that the fourth model (using APIs and Open Algorithms) has many of the features required to tackle these daunting technical challenges. This belief stems primarily from the fact that the data do not need to leave the owner's servers; the user is instead provided with aggregates or indicators. Additionally, this model's reliance on APIs and Open Algorithms means that queries can be made against a variety of systems, regardless of their internal architectures.

Recommendations regarding the technology and the science of data sharing include: (1) creating and using the best knowledge and state-of-the-art technology, carefully curated based on expert knowledge of data science; (2) opting for open source as much as possible to enable information and knowledge sharing, transparency, adaptation and the facilitation of redress when needed; (3) heavily investing in fundamental and applied research in this area so that Europe closes its current investment gap relative to the US and China; and (4) developing appropriate technological and governance systems and standards that will allow adaptation and auditing but will neither impede data sharing and use nor kill innovation and progress. As such, the next requirement is getting the governance and the ethics right.

3.3. Get the governance and the ethics right

When dealing with data about humans which are then used to make decisions that impact the lives of potentially millions of people, numerous governance challenges and ethical dilemmas emerge. In this context, a simple guiding principle could be to follow what humanitarians call the "first, do no harm" principle. Today, we have a good sense of the risks that Big Data and Artificial Intelligence pose to privacy, equity, fairness and transparency, to name a few areas — even when a project starts with the best of intentions. How can we be sure that sharing and using private data will do no harm? Will automated, data-driven decisions be outside of our control? Who is accountable for such decisions in cases where they may be the result of the analysis of several datasets by complex software and social systems developed by different parties? Will these systems offer the necessary security mechanisms to prevent cyberattacks? What about the malicious use of the data to serve the interests of non-democratic governments or organised crime? The truth is that it is very hard to provide an unequivocal answer to these questions. One significant challenge is that our knowledge of such novel technologies is limited and will likely soon be outdated – or turn out to be just plain wrong.

29 Sloane, Tim. 2019. Another Great Argument for Synthetic Data and Self-Sovereign Identity. United States: Payments Journal; Kolata, Gina. 2019. Your Data Were 'Anonymized'? These Scientists Can Still Identify You. New York, NY: The New York Times.
30 Hendrickx, Julien M., de Montjoye, Yves-Alexandre & Rocher, Luc. 2019. Estimating the success of re-identifications in incomplete datasets using generative models. Berlin, Germany: Nature; Sloane, Tim. 2019. Another Great Argument for Synthetic Data and Self-Sovereign Identity. United States: Payments Journal; Kolata, Gina. 2019. Your Data Were 'Anonymized'? These Scientists Can Still Identify You. New York, NY: The New York Times.
31 Brogan, Caroline. 2019. Anonymising personal data 'not enough to protect privacy,' shows new study. London, UK: Imperial College London.
32 Dwork, Cynthia & Roth, Aaron. 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science. Philadelphia, PA: University of Pennsylvania.
33 Gaboardi, Marco, Honaker, James, King, Gary, Nissim, Kobbi, Ullman, Jonathan, Vadhan, Salil & Murtagh, Jack. 2016. PSI (Ψ): A Private data Sharing Interface. New York, NY: Harvard.
34 Kolata, Gina. 2019. You Got a Brain Scan at the Hospital. Someday a Computer May Use It to Identify You. New York, NY: The New York Times.
35 Ground truth is a term used in various fields to refer to information provided by direct observation (i.e. empirical evidence) as opposed to information provided by inference (e.g. by a machine learning algorithm). In machine learning, ground truth is used to train and validate supervised models.
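One concrete safeguard against several of these risks, under the Open Algorithms model described in Section 3.2, is that vetted queries run on the data owner's servers and only aggregate indicators – never raw records – travel back, typically with small groups suppressed. The sketch below is purely illustrative: the record layout, district names, the five-person suppression threshold and the function name are our assumptions, not part of the OPAL specification:

```python
# Hypothetical raw records held by the data owner; they never leave its servers.
RAW_RECORDS = [
    {"home_district": "Centro"},
    {"home_district": "Centro"},
    {"home_district": "Centro"},
    {"home_district": "Centro"},
    {"home_district": "Centro"},
    {"home_district": "Centro"},
    {"home_district": "Chamberi"},
    {"home_district": "Chamberi"},
]

MIN_GROUP_SIZE = 5  # suppress groups small enough to risk singling someone out

def population_density_indicator(records, min_group_size=MIN_GROUP_SIZE):
    """A vetted 'open algorithm': it runs where the data lives and returns
    only aggregate counts per district, suppressing small groups."""
    counts = {}
    for rec in records:
        counts[rec["home_district"]] = counts.get(rec["home_district"], 0) + 1
    return {district: n for district, n in counts.items() if n >= min_group_size}

# The external user only ever receives the aggregate indicator.
indicator = population_density_indicator(RAW_RECORDS)
print(indicator)  # {'Centro': 6} -- the two-person Chamberi group is suppressed
```

Because the algorithm itself is open, it can be audited before it is allowed to run, while the data owner's internal architecture stays hidden behind the API.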
The ethical principles and standards of governance for data-driven initiatives for public good need to be clearly defined and meticulously complied with. In the past decade, numerous ethical guidelines and principles have been proposed in the context of data-driven decision-making for public good and, more generally, on the use of Artificial Intelligence. To name a few: the principles of the Menlo Report;36 the report by the European Commission for the development of Trustworthy Artificial Intelligence;37 the OECD principles for the development of Artificial Intelligence;38 the ethical principles included in the national Artificial Intelligence strategies of over 20 countries in the world; and the ethics in Artificial Intelligence initiatives within professional organizations, such as the IEEE39 and the ACM.40 Next, we group most of the previously proposed principles using the FATEN41 acronym, which is an extension of the four basic principles of medical ethics.42

1. F for fairness, i.e. free of discrimination. Data-driven algorithmic decisions might discriminate if the data used to train such algorithms is biased or if the choice of algorithm or model for the problem at hand was poor. In the past year, several highly impactful cases of algorithmic discrimination have been brought to light in public good areas, such as criminal justice43 and healthcare.44

2. A for autonomy, accountability and intelligence augmentation. Autonomy is a central value in Western ethics, according to which everyone should have the ability to have sovereignty over their own thoughts and actions, thus ensuring freedom of choice, thought and behaviour. These days, though, it is possible to build computational models of our desires, needs, tastes, interests, personalities and behaviour that could be – and frequently are – used to subliminally influence our decisions and behaviours, as was the case in recent electoral processes in the US and the UK.45 Thus, we should ensure that data-driven decision-making systems always respect human autonomy and dignity. A also stands for accountability, i.e. having clarity with respect to the attribution of responsibility for the consequences of algorithmic decisions. Finally, A stands for the augmentation – rather than the substitution – of human intelligence, such that these types of systems are used to support human decision-making and not to replace humans altogether. We share the view of a "Human AI" model, as previously described.

3. T for trust and transparency. Trust is a fundamental pillar in our relationships with other humans or institutions, but it is a relative concept rather than an absolute one. We typically trust someone for a specific purpose. We might trust an institution or an individual to take custody of our money, but we might not necessarily trust the same institution or individual to take care of our children, for example. Trust emerges when three conditions are met: (1) competence, i.e. the ability to carry out the committed task; (2) reliability, i.e. sustained competence over time; and (3) honesty and transparency. Hence, the T in FATEN is also for transparency.

A data-driven decision-making system is transparent when non-experts are able to observe it and easily understand it. Data-driven decision-making systems might not be transparent for at least three reasons:46 (1) intentionally, to protect the intellectual property of the system's creators; (2) due to the digital illiteracy of their users, which prevents them from understanding how the models work; and (3) intrinsically, given that certain statistical machine-learning approaches – such as deep learning – are extremely complex and difficult to interpret.

Moreover, we need to ensure transparency not only regarding which data is being captured, shared and analysed and for what purposes, but also regarding the situations in which humans are interacting with artificial systems as opposed to with other humans.
4. E for bEneficence and equality. The principle of bEneficence refers to having the intent of doing good and maximising the positive impact of the use of data-driven decision-making algorithms, with sustainability, diversity and veracity. We cannot obviate the environmental cost of technological development, particularly when it comes to Artificial Intelligence algorithms, given their need for large amounts of data to learn from and the massive amounts of computation needed to process and be trained on such data. A recent study47 found that the carbon footprint of training just one state-of-the-art deep-learning model to perform natural language processing tasks was similar to the amount of carbon dioxide that the average American emits in two years.

Diversity is also of paramount importance, from two perspectives. First, by ensuring that the teams developing data-driven algorithms for public good are diverse, which is not the case today. Diversity is needed to maximise the chances that projects will be inclusive and relevant in the communities where they will be deployed. Second, by incorporating diversity criteria into the algorithms we design, we can minimise the existence of filter bubbles and echo chamber effects,48 which have been – at least partially – blamed for the polarisation of public opinion we see today.

Moreover, we need to invest in efforts to ensure the veracity of the data that is and will be used for the public good. Today, we can algorithmically generate fake text, photos, videos and audio using deep neural networks (deep fakes) that are indistinguishable from real content. If we are using data to inform decisions that impact the lives of millions of people, we need to ensure that such data is indeed truthful and a reflection of the underlying reality that the models are attempting to model.

E also stands for equality. The development and wide adoption of the Internet and the World Wide Web during the Third and Fourth Industrial Revolutions has undoubtedly been key to democratising access to information. However, the original principles of universal access to knowledge and the democratisation of technology are in danger today due to the extreme dominance of technology giants in the US (Apple, Amazon, Microsoft, Facebook and Alphabet/Google) and China (Tencent, Alibaba, Baidu), which has led to a phenomenon known as "winner takes all". Together, these technology companies have a market value of more than 5 trillion USD and a US market share of more than 90% in Internet searches (Google), more than 70% in social networking (Facebook) and 50% in e-commerce (Amazon). This market dominance leads to data dominance. In fact, most of these technology companies are data companies that earn billions of dollars by analysing and leveraging our data. Furthermore, a large portion of the valuable human behavioural data that could be used for public good is generated and captured by the services that these technology companies offer to their customers – services that increasingly cover all aspects of our lives, including our entertainment, sports, work, health, education, transportation and travel, social connections and communication, commerce and information, and product needs.

In addition, a significant characteristic of the 21st century is the polarisation of wealth distribution. According to the latest Global Wealth Report by Credit Suisse,49 the 100 richest people in the world are richer than the poorest 4 billion. This accumulation of wealth in the hands of very few has been at least partially attributed to technology and the so-called Fourth Industrial Revolution. With the agrarian revolution in the Neolithic, and for thousands of years afterwards, ownership of land led to wealth. Following the First Industrial Revolution, wealth meant owning capital assets, such as factories and machines. Today, one could argue that data – and, more importantly, the ability to make sense of it – is the asset that generates the most wealth, leading to the data economy. If we want to maximise the positive impact of this abundance of data, we should promote new models of data ownership, management, exploitation and regulation. Data sharing used for the public good could contribute both to increasing equality in the world and to better measuring existing inequalities. An interesting data-sharing model from this perspective is the concept of data cooperatives, in which participants from different sectors (frequently companies) share their data to create public value, as described in Section 2.

5. N for non-maleficence, i.e. minimising the negative impact that could be derived from the use of the data. Within the non-maleficence principle, we highlight the use of prudence and the need to (1) maximise data security, (2) provide reliability and reproducibility guarantees and (3) always preserve people's privacy, as explained in Section 3.2.

Once defined and agreed upon, the ethical principles of data sharing will need to be published, implemented and fostered in practice through appropriate standards of governance. The roles and responsibilities of each of the three actors in data sharing (companies, public institutions and people) need to be clearly stated and accepted. Given the multi-disciplinary nature of data-driven projects for public good, a combination of different disciplines is required, including from the social sciences and humanities.

36 Dittrich, D. & Kenneally, E. 2012. The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research. United States: The Department of Homeland Security.
37 European Commission. 2019. Ethics guidelines for trustworthy AI. European Union.
38 OECD. 2019. OECD Principles on AI. Paris, France: OECD.
39 The Institute of Electrical and Electronics Engineers. 2017. Ethically Aligned Design. Piscataway, NJ: IEEE.
40 Association for Computing Machinery. 2018. Code of Ethics and Professional Conduct. New York, NY: ACM.
41 Oliver, N. 2019. Governance in the Era of Data-driven Decision-making Algorithms. London, UK: Women Shaping Global Economic Governance – Centre for Economic Policy Research Press.
42 Gillon, R. 1994. Medical ethics: four principles plus attention to scope. UK: British Medical Journal.
43 Angwin, Julia, Larson, Jeff, Mattu, Surya & Kirchner, Lauren. 2016. Machine Bias: There's software used across the country to predict future criminals. And it's biased against blacks. Manhattan, NY: ProPublica.
44 Ledford, Heidi. 2019. Millions of black people affected by racial bias in health-care algorithms. Berlin, Germany: Nature.
45 Gorodnichenko, Y., Pham, T. & Talavera, O. 2018. Social media, sentiment and public opinions: evidence from #Brexit and #USelection. Swansea, UK: Swansea University, School of Management.
46 Burrell, Jenna. 2016. How the machine 'thinks': Understanding opacity in machine learning algorithms. Newbury Park, CA: SAGE Journals.
47 Strubell, E., Ganesh, A. & McCallum, A. 2019. Energy and policy considerations for deep learning in NLP. Florence, Italy: Association for Computational Linguistics.
48 Sadagopan, Swathi Meenakshi. 2019. Feedback loops and echo chambers: How algorithms amplify viewpoints. United States: The Conversation.
49 Shorrocks, Anthony & Hechler-Fayd'herbe, Nannette. 2019. Global Wealth Report 2019: Global wealth rises by 2.6% driven by US & China, despite trade tensions. Zurich, Switzerland: Credit Suisse.
This multi-disciplinary nature adds a level of complexity regar- ding the governance of the projects but is potentially beneficial when it comes to the definition of, and compliance with, ethical principles since there might already be et- 3.4. Last but not hicists within the teams. least – Get the Moreover, external oversight bodies might be necessary to ensure that ethical economics right principles are complied with. For this pur- pose, the idea of data stewards50 has been Most initiatives involving the sharing proposed in recent years. Data stewards of private data for the public good have are individuals or groups of individuals thus far been self-funded by companies within an organisation who are responsi- and have largely consisted of providing ble for the quality and governance of data data and/or statistical indicators derived in data-driven projects that take place in from such data for free. Questions inevi- their organisations, including data sharing tably arise, however, about the financial initiatives and/or data for social good ini- sustainability, commercial positioning tiatives. Alternative proposals include the and business models of these approa- appointment of chief ethics officers or the ches—especially in relation to “for profit” creation of an external oversight ethics solutions. board – such as OPAL’s CODE (described in Section 4) — which could be in charge Indeed, several companies that have of auditing data for social good projects been at the forefront of the “data for so- to ensure that they are aligned with the cial good” movement over the past deca- 19 pre-defined ethical principles and human de, particularly telecommunication ope- values of the societies where they are de- rators like Telefonica and Orange, have veloped. also invested heavily to develop their own commercial offerings as part of their data Another way to ensure compliance with monetisation strategies. 
These commer- the ethical principles agreed upon is by cial solutions, e.g. Telefonica’s SmartS- using open processes and systems as teps, Vodafone’s Location Insights and much as possible, and to foster knowled- Orange’s FluxVision, provide users and ge transfer. It also calls for incentivising clients pre-computed indicators derived active partnerships between academia, from aggregate customer data, such as civil society organizations and all those population density and mobility estima- who are part of the same data ecosys- tions. For example, a city government tem. As explained in Section 2, data trusts or a commercial group will get in touch might also be used for data governance. with these entities with a need for some indicator(s). A commercial and technical In addition, understanding the cultural discussion will ensue to determine the and social characteristics of the societies characteristics and cost of the dataset(s) where the projects are deployed is a must. or indicators that can be provided. Even- Hence, the importance of working with lo- tually, a contract is signed that stipulates cal institutions and the civil society of the a price along with payment and technical countries where the projects take place. modalities such as the project timeline. In some cases, the price might be zero— In sum, any use of data for the public when the project is for a humanitarian good by governments or not-for-profit or- cause (e.g. following an earthquake), for ganisations should be fully transparent, example. open and accountable. The results of such use must be auditable regarding its In recent years, technology compa- purpose, fairness and accuracy, particu- nies have joined the movement of lever- larly given the fact that the definition of pu- aging their data for purposes related to blic good is very broad. the social good, including Facebook51 and Google52. 
Even when the politics are tackled, all the necessary technology to access and analyse the data is in place, and clear ethical principles have been defined and are being complied with, data sharing projects for public good will fail if they do not have a sustainable financial model behind them.

In developed countries, the granularity, volume and richness of the human behavioural data accumulated by technology companies is undisputed. These companies, however, may fear and resist the development of different “for good” solutions because they could be viewed as internal commercial threats that may cannibalise their existing data-driven services. Their main rationale is that the existing commercial offerings already meet some of the “for good” demands – understood as providing indicators, for free in some cases. But this argument fails to understand or recognise the hurdles and requirements on the path from private data to public good, as well as shifts in public opinion and regulation.

First, the underlying algorithms are closed, proprietary and generally black boxes. While calibration may be performed using the latest official demographic projections, there are no clear guarantees regarding their scientific rigor or their compliance with strict regulations when the data is meant to be used for public good purposes. Black box systems, for example, are currently unfit for use by National Statistical Offices, which, according to the Fundamental Principles of Official Statistics, need to be able to itemise how the statistics they provide were derived.53 This is an important point: in their current form, these solutions will not allow official statistics to use new data sources.

Second, although prices are not public, anecdotal evidence suggests that these products are expensive and usually cost far more than a research university, NSO or NGO can afford. Furthermore, the individuals and groups represented in the underlying raw datasets may have limited involvement in, and information about, their inclusion or exclusion in such datasets, particularly when the data is aggregated and ceases to be considered personal data.

Third, access is difficult: it is limited to those who have the means to purchase these products and to devote the effort needed to negotiate the terms of their usage. Such systems might not be appropriate in a fast-paced world that strives for near-frictionless transactions, nor for smaller providers, particularly NGOs, with fewer resources. Finally, the range of statistical indicators is limited to what exists in these product offerings.

These limitations, and the vision of the business units behind these data-driven commercial solutions, need to be well understood in order to offer complementary solutions. The key is to find a value proposition for public-good projects that is complementary to, and not at odds with, these proprietary commercial products, and which still allows for a viable commercial model that then supports the investment needed by businesses to create useful insights and

50 Verhulst, Steefaan G. 2018. The Three Goals and Five Functions of Data Stewards: a new Role and Responsibility for an AI and Data Age. New York, NY: Medium and The Data Stewards Network.
51 https://dataforgood.fb.com/
52 https://cloud.google.com/data-solutions-for-change/
53 United Nations Statistics Division. 2014. Fundamental Principles of Official Statistics (A/RES/68/261 from 29 January 2014). Geneva, Switzerland: United Nations.

design bespoke solutions for the user. That may mean serving different stakeholders and market segments, providing additional indicators (such as SDG-type and/or less granular indicators), offering additional services (such as support for use in specific cases), or serving purposes and objectives that are not monetary in nature.

Moreover, as we have discussed previously in this paper, many public-good scenarios require open, transparent and explainable algorithms that can be subject to auditing and public scrutiny regarding their reliability, fairness, accuracy and reproducibility. Commercial products might not have to satisfy these conditions.

Critically, making a convincing case for such complementary initiatives requires explaining why and how the commercial products cannot address all the needs and gaps that impede access to and use of private data for public good. It also requires arguing how these complementary solutions would address unmet needs, e.g. those of National Statistical Offices. In all cases, it means designing and implementing commercial positioning and business models that ensure the financial viability of the projects. One promising option is to develop a freemium model, whereby some statistical indicators would be provided for free for high-priority, high-impact, urgent usages (e.g. natural disasters or humanitarian crises), while others – perhaps more granular ones – would be fee-based, with a fee structure that could be adjusted according to users and usages. This model is currently being investigated by a few telecom operators, such as Sonatel. The public sector has a role to play in fostering the development and adoption of these models through a combination of fiscal incentives, use-case development and education, along with the design of helpful legislative and regulatory frameworks.

A last question is whether people should be able to sell their own data on a data market. The cases for and against such a model can be and have been argued convincingly.54

These considerations and possibilities are currently being reflected and tested in the OPAL project, described below.

54 Tonetti, Christopher & Kerry, Cameron F. 2019. Should Consumers Be Able to Sell Their Own Personal Data? New York, NY: The Wall Street Journal.

4. An example of the way

forward: OPAL

OPAL is an example of the way forward to realise the vision of a fairer, healthier, more efficient and more inclusive society thanks to data-driven decision-making processes. OPAL is short for Open Algorithms, and it is a platform – in terms of both technology and governance systems and standards – to leverage the use of data collected and controlled by private companies for public-good purposes in an ethical, scalable and sustainable manner. The main design principle consists of pushing the computation to the data, rather than ever exposing or sharing that data.

OPAL falls within the question-and-answer data sharing approach previously described in the one-to-many and few-to-many models, and it builds on the years of work performed by its group of founders, including several of the individuals behind the D4D and Data for Refugees (D4R) challenges. The authors of this paper are also actively involved in OPAL: Dr. Letouzé is a co-founder and Executive Director of the OPAL project, whereas Dr. Oliver has collaborated since its inception through her role at the Data-Pop Alliance.

OPAL aims to provide a blueprint and toolkit for addressing the main obstacles outlined in this paper. “Data Challenges” and bilateral agreements have demonstrated the value of data for the public sector, but have also shown how costly, complex and risky these arrangements can be – especially those that involve irreversibly sharing “anonymised” data that might be re-identified. Rising concerns over privacy, fears of growing digital divides, and criticisms about black box algorithmic decision-making constrain innovation, investment and progress from the private sector. Other obstacles, such as capacity constraints and political calculus, are inhibiting meaningful progress on making data truly matter for development.

OPAL strives to change that by enabling multisector, open-source, public-private federated data systems to create social value without endangering privacy and corporate value, developing appropriate systems and standards in terms of both technology and governance.

A defining feature of OPAL is its federated architecture and its question-and-answer approach, where computation is “pushed out” to the data. The data remains under the physical and custodial control of the data holder and is never shared, transferred or exposed to third parties. This federated approach can significantly reduce theft and misuse – as well as “missed use”.

OPAL seeks to foster the inclusion and inputs of data subjects on the priorities and uses of analysis performed on data about them through participatory processes. Moreover, OPAL includes an oversight body called the CODE (Council for the Orientation of Development and Ethics). Sustainability is sought by strengthening local capacities and connections and by establishing viable business models.

After an initial, overall successful, two-year “proof of concept” phase with pilots in Senegal and Colombia with telecom partners and their NSOs, OPAL is entering a “proof of market” or beta phase in both countries. The goal is to further develop and test key features and functionalities that should set OPAL on a path towards future global expansion to other telecom partners, industries and geographies, with sustainable revenues.

OPAL is designed to be an example of how data can be at the heart of a fairer and

Figure 3: The OPAL Model

1. Partner private companies (here a telecom operator) host their data and a selection of certified algorithms on their own local OPAL secured platform, behind their firewall.

2. A governance process, including a Council for the Orientation of Development and Ethics (CODE), ensures that the algorithms and use cases are ethically sound, context relevant, outcome driven, ...; users benefit from capacity building activities.

3. Key indicators derived from algorithms run on private sector data, such as population density, poverty levels or mobility patterns, feed into use cases in various public policy and economic domains.

4. Open algorithms created by developers are certified and made available publicly, to be selected to run on the servers of partner private companies.
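The question-and-answer flow described above can be illustrated in a few lines of code. This is a minimal sketch of our own, not OPAL's actual codebase: the toy records, the `population_density` function and the `MIN_GROUP_SIZE` threshold are all hypothetical. The core idea, however, follows the model in Figure 3: raw records never leave the data holder's platform, only certified algorithms may run against them, and only aggregate indicators are returned to outside users.

```python
# Illustrative sketch of a question-and-answer ("open algorithms") model:
# computation is pushed to the data; only aggregates leave the firewall.
from collections import Counter

# -- Inside the data holder's platform (data never leaves) ----------------
RAW_CALL_RECORDS = [  # hypothetical stand-in for telecom records
    {"user": "a", "antenna": "north"}, {"user": "b", "antenna": "north"},
    {"user": "c", "antenna": "north"}, {"user": "d", "antenna": "south"},
]

MIN_GROUP_SIZE = 3  # suppress any aggregate covering too few people

def population_density(records):
    """A 'certified algorithm': counts users per antenna area."""
    counts = Counter(r["antenna"] for r in records)
    # Privacy gate: small cells are suppressed before anything is returned.
    return {area: n for area, n in counts.items() if n >= MIN_GROUP_SIZE}

CERTIFIED_ALGORITHMS = {"population_density": population_density}

def answer_query(algorithm_id):
    """The only interface exposed outside: a question in, an aggregate out."""
    if algorithm_id not in CERTIFIED_ALGORITHMS:
        raise PermissionError("Only certified algorithms may run.")
    return CERTIFIED_ALGORITHMS[algorithm_id](RAW_CALL_RECORDS)

# -- Outside user (e.g. an NSO) -------------------------------------------
print(answer_query("population_density"))  # {'north': 3} - 'south' suppressed
```

The real platform adds layers this sketch omits (algorithm vetting by the CODE, secure deployment behind the operator's firewall, logging and auditing), but the contract is the same: the answer to a vetted question is shared, never the data itself.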

more evidence-informed approach to societal development around the globe, in the context and support of the Sustainable Development Goals (SDGs). To do so, OPAL aims to develop and deploy much-needed technological systems and governance standards addressing constraints on the access and analysis of sensitive personal data collected and controlled by private organizations, allowing communities to weigh in and benefit from use cases.

We believe that OPAL is one of the most promising current initiatives that can realistically provide an integrated socio-technical governance approach (i.e. a coherent set of technological and institutional systems and standards) to deliver on the promise of using sensitive personal data safely (from a privacy standpoint), ethically (especially from a local participation and oversight standpoint), and in a scalable manner (especially with its open-source approach, emphasis on capacity building, and partnerships with two major telecom operators) in the foreseeable future.

OPAL’s value proposition is captured by the acronym USAGE:

- User-centric use cases – to show value and yield and share lessons
- Sales of data-driven services for public good, under a freemium model
- Algorithms: fairer, constantly updated, auditable, co-developed
- Governance: safe data access, enabling more predictable participatory governance
- Ecosystem: stronger and healthier connections between the stakeholders, built on trust

5. Conclusions and Recommendations

In conclusion, we would like to offer a synthesis and some suggestions and recommendations. First, we must recognise that the issues under consideration cannot all be appropriately considered in a policy paper. They are so complex and, in part, contentious that they will only be addressed over time, through processes and cycles of interactions and negotiations between stakeholders – and will thus be viewed as partial and, likely, unsatisfactory by one or more of them. It must also be noted that some major events – such as another data scandal or a major discovery – may fundamentally impact this area, one way or the other. Nevertheless, we believe that there are sound general objectives, principles and guidelines that should inform the ongoing and forthcoming interactions and negotiations that will determine how private data will be shared and used in the future – and that Europe can and should play a leading role.

In summary, let us take a look at some of the convictions we have outlined in this paper.

First, a decade into the (Big) Data Revolution, major hurdles remain, including difficulties associated with providing third parties access to these data without creating risks to individual or group privacy. In addition, there is a lack of well-defined ethical principles, potential legal and regulatory barriers, technical limitations and the existence of competing commercial interests. Looking ahead, it is clear that one of the biggest questions facing all stakeholders, particularly policymakers, is whether and how these data – or rather the insights they contain – should be shared to maximise their potential for public-good purposes.

Next, we join other “cautious optimists” in arguing that legitimate concerns about risks must inform rather than impede attempts at making data a key force for human progress. We believe that these new kinds of crumbs and capacities have the potential to help reduce pervasive limitations in human decision-making and governance, including corruption, cognitive biases, misinformation, greed and conflicts of interest, so as to build healthier human communities.

We furthermore believe that Europe, because of its values and assets, can lead the way in addressing those concerns, including by building on and promoting the vision behind the GDPR. We believe that data- or evidence-informed decision-support systems, where the principles and tools of machine-learning algorithms might play a central role (referred to by some authors as “Human AI” systems), can improve the state of the world by upholding rather than eroding development and democratic principles and processes.

In addition, if we are interested in the end goal – improving the world through the use of private data – we should not worry only about cracking the major first-mile problem of data access. We must instead consider the entire spectrum of hurdles and pathways that would hinder or facilitate our data crumbs improving our communities. In short, to make data matter. We argue that this requires supporting the deployment of sustainable systems and standards, but also the development of capacities and incentives, to access, analyse and put data to use. This can only be done if we take an (eco)system perspective that does not consider data as the starting point, but which looks at how data are generated, controlled and used. The “Three Cs” framework only serves to point to the systemic parts and dynamics of these questions.

It is essential to recognise and reflect the fact that data access and sharing are only the beginning of the causal chain. Data usage and impacts must also be centrally considered. Consequently, discussions of data sharing for public good should not take place in a vacuum. They must take into account the data-for-profit activities that are part of companies’ efforts at monetising the data they collect and control. Whether and how privately collected and controlled data could or should be shared for the purposes of public good is not merely a technical or even a legal matter, but a complex, multi-faceted one that also reaches deep into the political, social, ethical and business realms. Data will determine to a large extent the future of democracy and human progress. Thus, it is and will continue to be part of the structure of modern societies, a key partner to us humans, supporting our decision making.

We believe these discussions must focus on four main types of considerations: (1) political; (2) technological and scientific; (3) ethical and legal; and (4) financial and commercial, within which we have identified major points and recommendations.

Politically, it is first and foremost about considering and connecting the incentives of all stakeholders in ways that balance out the long and short terms. The variety of actors (companies, public administrations and citizens), with their potentially conflicting agendas and needs, must be acknowledged and actively included.

In terms of technology and science, we argue that in most cases it is sufficient to share aggregated insights, rather than raw or pre-processed data, through technological systems that are safe and scalable, as opposed to being ad hoc. We also believe that much stronger relations must be built with and within academia so as to benefit from the latest advances. Investments are essential to tackle important technical challenges, such as the lack of “ground truth” data (typically official statistics or survey data) and the need to improve access to real-time data; the difficulties in inferring causal relationships; the complexity of combining data from multiple sources; the potential representational shortcomings of the available data, its generalisation capabilities and inherent biases; the non-existence of certification mechanisms to guarantee the quality of the algorithms applied to the data; and the lack of transparency and interpretability of complex machine-learning algorithms, such as deep neural networks.

Ethically and legally, we need to agree on and enact key ethical principles, such as the FATEN principles, with a premium not just on privacy as safety, but on privacy as agency, through participation and information, thus keeping humans and societies in the loop.

Last, we believe that using data at scale to change human systems profoundly will only be achieved if sustainable, balanced business models are in place which incentivise private actors to continue to invest in algorithms and analytics that can produce useful insights. Overall, we argue for a systemic approach to thinking and doing, and we advocate for ambitious yet realistic and achievable approaches to data sharing, of which OPAL is, in our eyes, a promising example.

We formulate eight recommendations to contribute to the vision of a more just, more participatory and healthier society, one which consistently places people and their interests at its core:

1. Rather than sharing the raw data, provide secure access to actionable insights derived from the data.

2. Develop open, participatory systems and standards to enable data sharing across all companies and sectors, with inputs and oversight from users, such as OPAL’s code and CODE.

3. Invest ambitiously in education, capacity building, research, incentives and outreach to obtain the support and contributions of all actors, private and public (including citizens), in Europe and beyond.

4. Implement incentives, remove regulatory barriers and define enabling regulations with the aim of accelerating the use of data for social good, following ethical principles such as the FATEN principles.

5. Promote corporate governance and engagement models – including the appointment of data stewards, chief ethics officers and oversight boards – that will make this area an element of standard practice in both public administrations and private companies.

6. Establish business-to-government data sharing groups in all countries and regions (see the example of Colombia or the B2G High-Level Expert Group at the European Commission).

7. Support local and regional centres of excellence that leverage private data for public good in several key cities in Europe.

8. Provide the necessary funding to address the challenges mentioned above and enable a sustainable model of data sharing for public good.

We truly believe in the power of data to enable us to transition to evidence-informed societies, for and by the people. Because sharing really is caring.

Works Cited


Angwin, Julia, Larson, Jeff, Mattu, Surya & Kirchner, Lauren. 2016. Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. Manhattan, NY: ProPublica.

Association for Computing Machinery. 2018. Code of Ethics and Professional Conduct. New York, NY: ACM.

Blondel, Vincent D., Decuyper, Adeline & Krings, Gautier. 2015. A survey of results on mobile phone datasets analysis. Belgium: EPJ Data Science.

Blondel, Vincent D., de Montjoye, Yves-Alexandre, Hidalgo, Cesar A. & Verleysen, Michel. 2013. Unique in the Crowd: The privacy bounds of human mobility. Belgium: “Communauté française de Belgique” on Information Retrieval in Time Evolving Networks.

Brockman, John. 2012. Reinventing Society in the Wake of Big Data. A conversation with Alex “Sandy” Pentland. United States: The Edge.

Brogan, Caroline. 2019. Anonymizing personal data ‘not enough to protect privacy,’ shows new study. London, UK: Imperial College London.

Burrell, Jenna. 2016. How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Newbury Park, CA: SAGE Journals.

Caraviello, Michele, Lepri, Bruno, de Oliveira, Rodrigo, Oliver, Nuria, Sebe, Nicu & Staiano, Jacopo. 2014. Money Walks: A Human-Centric Study on the Economics of Personal Mobile Data. Seattle, WA: UbiComp 2014 – Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing.

Dittrich, D. & Kenneally, E. 2012. The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research. United States: The Department of Homeland Security.

DOMO Inc. 2018. Data Never Sleeps 6.0. American Fork, UT: DOMO.

Dwork, Cynthia & Roth, Aaron. 2014. The algorithmic foundations of differential privacy. Philadelphia, PA: University of Pennsylvania. Foundations and Trends in Theoretical Computer Science.

The Economist. 2017. The world’s most valuable resource is no longer oil, but data. United States: The Economist.

European Commission. 2018. Commission appoints Expert Group on Business-to-Government Data Sharing. European Union.

European Commission. 2019. Ethics guidelines for trustworthy AI. European Union.

Gaboardi, Marco, Honaker, James, King, Gary, Nissim, Kobbi, Ullman, Jonathan, Vadhan, Salil & Murtagh, Jack. 2016. PSI (Ψ): a Private data Sharing Interface. New York, NY: Harvard.

García Martínez, Antonio. 2019. No, Data Is Not the New Oil. San Francisco, CA: Wired.

Gillon, R. 1994. Medical ethics: four principles plus attention to scope. UK: British Medical Journal.

Gorodnichenko, Y., Pham, T. & Talavera, O. 2018. Social media, sentiment and public opinions: evidence from #Brexit and #USelection. Swansea, UK: Swansea University, School of Management.

Harari, Y. N. 2018. Why Technology Favors Tyranny. Washington, DC: The Atlantic.

Hardinges, Jack & Wells, Peter. 2018. Defining a ‘data trust’. London, UK: Open Data Institute.

Hendrickx, Julien M., de Montjoye, Yves-Alexandre & Rocher, Luc. 2019. Estimating the success of re-identifications in incomplete datasets using generative models. Berlin, Germany: Nature.

Kolata, Gina. 2019. You Got a Brain Scan at the Hospital. Someday a Computer May Use It to Identify You. New York, NY: The New York Times.

Kolata, Gina. 2019. Your Data Were ‘Anonymized’? These Scientists Can Still Identify You. New York, NY: The New York Times.

Kolbert, Elizabeth. 2017. Why Facts Don’t Change Our Minds. New discoveries about the human mind show the limitations of reason. New York, NY: The New Yorker.

Ledford, Heidi. 2019. Millions of black people affected by racial bias in health-care algorithms. Berlin, Germany: Nature.

Letouzé, Emmanuel. 2012. Big Data for Development: Challenges & Opportunities. New York, NY: United Nations Publications.

Letouzé, Emmanuel, Kammourieh, Lanah & Vinck, Patrick. 2015. The Law, Politics, and Ethics of Cell Phone Data Analytics. Boston, MA: Data-Pop Alliance.

Letouzé, Emmanuel & Pentland, Alex. 2018. Towards a Human Artificial Intelligence for Human Development. ITU Journal: ICT Discoveries.

Mayer-Schönberger, Viktor & Cukier, Kenneth. 2013. Big Data: A Revolution That Will Transform How We Live, Work and Think. London, UK: John Murray Publishers. ISBN 9781848547926.

OECD. 2019. OECD Principles on AI. Paris, France: OECD.

Oliver, Nuria. 2019. Governance in the Era of Data-driven Decision-making Algorithms. London, UK: Women Shaping Global Economic Governance – Center for Economic Policy Research Press.

O’Neil, C. 2016. Weapons of math destruction: how big data increases inequality and threatens democracy. New York, NY: Crown.

Panel Secretariat led by Dr. Homi Kharas. 2013. A New Global Partnership: Eradicate poverty and transform economies through sustainable development. High-Level Panel of Eminent Persons on the Post-2015 Development Agenda. New York, NY: United Nations Publications.

Ricciato, Fabio, Lanzieri, Giampaolo & Wirthmann, Albrecht. 2019. Towards a methodological framework for estimating present population density from mobile network operator data. Luxembourg, LU: European Commission – EUROSTAT.

Sadagopan, Swathi Meenakshi. 2019. Feedback loops and echo chambers: How algorithms amplify viewpoints. United States: The Conversation.

Shorrocks, Anthony & Hechler-Fayd’herbe, Nannette. 2019. Global Wealth Report 2019: Global wealth rises by 2.6% driven by US & China, despite trade tensions. Zurich, Switzerland: Credit Suisse.

Sloane, Tim. 2019. Another Great Argument for Synthetic Data and Self-Sovereign Identity. United States: Payments Journal.

Strubell, E., Ganesh, A. & McCallum, A. 2019. Energy and policy considerations for deep learning in NLP. Florence, Italy: Association for Computational Linguistics.

The Institute of Electrical and Electronics Engineers. 2017. Ethically Aligned Design. Piscataway, NJ: IEEE.

Tonetti, Christopher & Kerry, Cameron F. 2019. Should Consumers Be Able to Sell Their Own Personal Data? New York, NY: The Wall Street Journal.

U.N. Editor. 2018. World Data Forum wraps up with a declaration to boost financing for data and statistics. New York, NY: United Nations Department of Economic and Social Affairs.

United Nations Statistics Division. 2014. Fundamental Principles of Official Statistics (A/RES/68/261 from 29 January 2014). Geneva, Switzerland: United Nations.

Verhulst, Steefaan G. 2018. The Three Goals and Five Functions of Data Stewards: a new Role and Responsibility for an AI and Data Age. New York, NY: Medium and The Data Stewards Network.

Walsh, Dylan. 2019. How credit unions could help people make the most of personal data. Boston, MA: MIT Sloan School of Management.

World Bank Team led by Mishra, Deepak & Deichmann, Uwe. 2016. World Development Report 2016: Digital Dividends Overview. Washington, DC: International Bank for Reconstruction and Development / The World Bank.

World Economic Forum. 2018. Our shared digital future: building an inclusive, trustworthy and sustainable digital society. Cologny/Geneva, Switzerland: World Economic Forum.

Authors


Emmanuel Letouzé, PhD

Emmanuel Letouzé is the Director and co-Founder of Data-Pop Alliance, a coalition on Big Data and development co-created in 2013 by the Harvard Humanitarian Initiative, MIT Media Lab and the Overseas Development Institute, joined in 2016 by the Flowminder Foundation as its fourth core member. He is a Visiting Scholar at MIT Media Lab, a Research Affiliate at HHI and a Research Associate at ODI. He is the author of UN Global Pulse’s White Paper “Big Data for Development” (2012) and of the 2013 and 2014 OECD Fragile States reports. His research and work focus on Big Data’s applications and implications for official statistics, poverty and inequality, conflict, crime and fragility, climate change, vulnerability and resilience, and human rights, ethics and politics. He worked as a Development Economist for UNDP in New York from 2006-09 on fiscal policy, post-conflict economic recovery and migration, and between 2000-04 in Hanoi, Vietnam, for the French Ministry of Finance as a technical assistant in public finance and official statistics. He holds a BA in Political Science and an MA in Economic Demography from Sciences Po Paris, an MA from Columbia University School of International and Public Affairs, where he was a Fulbright Fellow, and a PhD from the University of California, Berkeley. He is also a political cartoonist for various publications and media, as ‘Manu’.

Nuria Oliver, PhD

Nuria Oliver is chief data scientist at Data-Pop Alliance, chief scientific advisor at the Vodafone Institute and a Fellow at the Spanish Royal Academy of Engineering. She has 20 years of research experience in human behaviour modelling, human-computer interaction and using Big Data for social good. Prior to joining Vodafone, she was Scientific Director at Telefonica R&D, the first woman to hold that position. Before that, she spent over seven years at . She holds a PhD in perceptual intelligence from MIT and was named European Digital Woman of the Year in 2016 and Engineer of the Year in Spain in 2018. She holds an honorary PhD from the University Miguel Hernandez. Her work is well known, with over 150 scientific publications that have received more than 12,000 citations, and ten best paper award nominations and awards. She is co-inventor of 40 filed patents and a regular keynote speaker at international conferences. Her work and professional trajectory have received several awards, including the MIT TR100 (today TR35) Young Innovator Award (2004), the Rising Talent award by the Women’s Forum for the Economy and Society (2009) and the European Digital Woman of the Year award (2016). She has been named “an outstanding female director in technology” (El PAIS, 2012), one of the “100 leaders for the future” (Capital, 2009) and one of the “40 youngsters who will mark the next millennium” (El PAIS, 1999). Nuria is a Board member of ELLIS.eu and OdiseIA.org.

Data-Pop Alliance is a global coalition on Big Data, AI and development created in 2013 by the Harvard Humanitarian Initiative, MIT Media Lab and the Overseas Development Institute, joined in 2016 by the Flowminder Foundation as its fourth core member, that brings together researchers, experts, practitioners and activists to promote a people-centered data revolution through collaborative research, capacity and community building, and strategic design and support activities.

The institute is Vodafone’s European think-tank. We analyze the potential of digital technologies and their responsible use for innovation, growth and sustainable social impact. With the help of studies and events, we offer a platform for dialogue between thought leaders from science, business and politics. Our goal is to provide better access to technology for all sections of society. The Vodafone Institute sees itself as an interdisciplinary platform and benefits from the expertise of its international advisory board.

With its “Digitising Europe” initiative, the Vodafone Institute provides a platform for high-level debate and exchange. Its aim is to advance a vision of a European Union in the digital age, through papers, studies, workshops and events. This paper is part of a series of discussion papers focussed on the challenges of European digital policy.

SHARING IS CARING Four Key Requirements for Sustainable Private Data Sharing and Use for Public Good

Authors Emmanuel Letouzé, PhD Nuria Oliver, PhD

Editorial Management Inger Paus, Managing Director, Vodafone Institute Friedrich Pohl, Head of Communications, Vodafone Institute

Assistance Maria Antonia Bravo, Program Officer, Data-Pop Alliance Luis del Carmen Garcia Rueda, Project Operations Officer, Data-Pop Alliance

Layout Nick Böse

Data-Pop Alliance, NYC November 2019