Round table

Sharing health data: good intentions are not enough

Elizabeth Pisani(a) & Carla AbouZahr(b)

Abstract

Epidemiologists and public health researchers are moving very slowly in the data sharing revolution, and agencies that maintain global health databases are reluctant to share data too. Once investments in infrastructure have been made, recycling and combining data provide access to maximum knowledge for minimal additional cost. By refusing to share data, researchers are slowing progress towards reducing illness and death and are denying a public good to taxpayers who support most of the research.

Funders of public health research are beginning to call for change and are developing data sharing policies. However, they are not yet adequately addressing the obstacles that underpin the failure to share data. These include professional structures that reward publication of analysis but not of data, and funding streams and career paths that continue to undervalue critical data management work. Practical issues need to be sorted out too: how and where should data be stored for the long term, who will control access, and who will pay for those services? Existing metadata standards need to be extended to cope with health data.

These obstacles have been known for some time; most can be overcome in the field of public health just as they have been overcome in other fields. However, no institution has taken the lead in defining a work plan and carving up the tasks and the bill. In this round table paper, we suggest goals for data sharing and a work plan for reaching them, and challenge respondents to move beyond well-intentioned but largely aspirational data sharing plans.


(a) London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, England. (b) Department of Health Statistics and Informatics, World Health Organization, Geneva, Switzerland. Correspondence to Elizabeth Pisani (e-mail: [email protected]). (Submitted: 18 November 2009 – Revised version received: 5 January 2010 – Accepted: 7 January 2010)

Bull World Health Organ 2010;88:462–466 | doi:10.2471/BLT.09.074393

Introduction

As they prepare for careers in science, today’s students doubtless hear the same clichés as we did a generation ago: science advances collaboratively; we reproduce and extend the work of others; we stand on the shoulders of giants. In some fields, such as genomics, these axioms are becoming true. In epidemiology and public health, however, data sharing and collaboration remain more aspirational than real.

Students embark on a career in health research in the spirit of sharing; they want to help improve the well-being of others. For all the talk of collaboration, they will enter a world in which another axiom dominates: “publish or perish”. That system puts the interests of public health researchers in direct conflict with the interests of public health.

Benefits of sharing

The situation was not so different in genomics less than 15 years ago. Then, after years of hoarding their findings in individual laboratories and progressing at an expensive snail’s pace, in 1996 researchers agreed to share all their data openly.1 Now laboratories sequence during the day and post their results that same night; other researchers can begin to stand on their shoulders the very next day. As a result, genetic research is advancing faster than any other area of biomedicine.2

Genomics has taught us that sharing data with other scientists is a way to add value without costing a lot. It allows the same data to be used to answer new questions that may be relevant far beyond the original study. And it allows for meta-analyses that are free from the distortions introduced when only summary results are available.3,4 We could get far more out of public health research if we followed a similar path, if we squeezed more scientific and policy insights out of data that have already been collected.

Routine health and service use statistics can be just as useful for policy analysis as research data. Many countries are reluctant to release detailed service use data because analysis by disinterested outsiders may contradict politically acceptable interpretations. Most countries do, however, contribute aggregate statistics more freely to large international databases maintained by multilateral organizations, although they are not always granted free access to those databases when they want to use them. Such restrictions on access, imposed unnecessarily by agencies wanting to protect their institutional mandates, cripple the potential utility of these expensive resources. Researchers and governments are also reluctant to see the data they provide used and manipulated by others in ways they don’t understand because secondary users (including international agencies) do not always publish their methods.

Research data are desperately underused too, in part because of a critical shortage of competent data managers.5 In other fields – genetics, banking and retailing – data management is a valuable skill. People are trained and develop careers in the field. In public health research, data management is the poor cousin of analysis. Undervalued and underfunded, inadequate data management undermines the rest of the scientific enterprise. One review in the United Kingdom of Great Britain and Northern Ireland found that many of the variables collected in epidemiological studies were never cleaned and coded, so they could not be used even by the primary researchers, let alone shared.6 In complex population-based surveys in developing countries, data management and analysis skills are in even shorter supply, so a higher proportion of data probably goes to waste.7

When we’re dealing with public health research, wasted data can translate into shorter, less healthy lives. Improving data management so that data can be shared is a first step to reducing that waste. But it will not be enough. We need to change the incentives that pit the interests of individual researchers against the interests of public health, that pit institutional interests against the more rapid advancement of knowledge and understanding. Governments may hold micro-data back from international organizations, but there’s no excuse for international organizations to limit access to the aggregate data that governments do provide.

It’s easier to understand why individual researchers are reluctant to share data they have collected. That reluctance will certainly remain entrenched as long as their employers – research councils, foundations and universities – regard publication of research papers in peer-reviewed biomedical journals as the main yardstick of success.8 If, however, “publish [papers] or perish” were to be replaced by “publish [data] or perish”, the picture might change rapidly, as it did in genomics.

What did that experience teach us? That a change in the culture of science requires the buy-in of key research teams, yes, but that it also requires considerable and very concrete commitments from funders. The two largest funders of the Human Genome Project, the Wellcome Trust and the National Institutes of Health, invested massively in the infrastructure needed to share data on a large scale for the long term. They also changed funding mechanisms to emphasize teamwork and the value of roles such as data management, rather than just looking at publication and citation records. Inevitably the rapid change of culture raised some tensions, but those have now largely been resolved.2 It would be perfectly feasible for research funders to take similar steps in other fields so that personal and professional incentives are aligned rather than in conflict.

Genomics and the social sciences (which have a dramatically better record of sharing data than most biomedical sciences) have developed techniques to deal with two of the other main obstacles to sharing of public health research data – confidentiality and consent. In part because of the development of research tissue banks (biobanks), broad consent procedures are increasingly becoming a norm.9 Anonymization removes some of the obstacles associated with consent, and techniques for protecting identities are improving constantly. Despite concerns about the theoretical possibility of identifying individuals in shared data sets, no breaches of confidentiality have yet been recorded in anonymized data sets.10 Social and economic sciences have also gone further in making the sharing of data sets easy through standard metadata, both for aggregate data through Statistical Data and Metadata Exchange (SDMX) standards and for individual data using Data Documentation Initiative (DDI 3.0) standards. A further lesson from other fields: it is possible to make data widely available to the research community while still safeguarding integrity, through the use of standardized data use agreements and licences.11,12 These define who may use data and how, and may require secondary analysts to contribute both derived data and a record of their analytic methods back to the database, so that primary and other users can both verify and benefit from their work.

The data that we collect and don’t make full use of do not come free. The collection of routine health statistics is paid for by our tax money. Most research aiming to reduce ill-health in the developing world is also funded either from the public purse or by charitable foundations. It is irrational to invest so much in collecting data and yet so little in ensuring that we make the best use of it.13 It is also ethically unsound; people who participate in research have a right to expect that the results will be used to improve life for them and/or for their communities.

Funders and standard-setters have been aware of this for some time. Gradually, they are urging or adopting policies that aim to increase the use and recycling of data. Although they don’t all yet practice what they preach, several international organizations, including the Organisation for Economic Co-operation and Development and the World Health Organization, have issued statements calling for increased access to routine statistics and other publicly-funded data.14,15 Many biomedical journals have recently addressed the importance of data sharing in editorials and commentary articles.16–18 A few biomedical journals expect researchers to make the data that underlie research articles available to others on request. An even smaller number of journals have followed the lead of Annals of Internal Medicine and now require authors to state whether and how they will make protocols, analysis tools and data available to others. But even Annals stops short of requiring authors to publish data sets along with their articles. “If we did that, we’d have a very thin journal,” commented editor Christine Laine at a recent conference on biomedical publication.19

There are indications that public and foundation funders of public health research wish to strengthen data sharing policies, shepherding epidemiologists down the road already travelled by geneticists.20–23 Many field researchers who have battled difficult climates, erratic electricity supplies, fuel shortages and recalcitrant local authorities will doubtless resent increasing pressure to “give data away”. Some are also apprehensive that people looking at the data in the comfort of some distant, well-resourced office will spot the errors that are the inevitable by-product of research in the real world. Governments are equally reluctant to expose their data to interpretations other than those published by their official statisticians. There is a fear, too, that data may be used by others not just for professional but for economic gain. This is sometimes cast as a “north–south” divide; one spectre raised is of pharmaceutical companies exploiting data from developing countries to develop products that those countries then can’t afford.24

Feelings of ownership over hard-won data, viscerally held even by researchers who support the idea of data sharing in principle, are understandable. And peer reviewers, mostly researchers themselves, are reluctant to approve funding for data management if it cuts into budgets for data collection. But funders of science are themselves under pressure to get the most out of expensive research studies. They have to wrestle with two important questions: how much data sharing is desirable and how much is feasible?

Researchers sometimes argue that interpretation of their data is so dependent on understanding local conditions that the data would be worthless to other scientists. This is often a reflection of inadequate documentation, but also a necessary failure of imagination. Sailors keeping log books on whaling boats in the 1600s could not have predicted that, centuries later, the data would be an important source of information for climate change scientists.25 Most funders have stringent peer-review procedures; few invest in research that they believe is of only very localized importance, and few wish to support research that produces data of such poor quality that it has no further value. Publicly-funded data can also be invaluable to students learning data management and analysis skills. It thus seems fair to expect that almost all public health research funded by taxpayers or charities might be useful to secondary analysts. If a piece of research is considered worthy of publication in a peer-reviewed journal, the underlying data should also be worth publishing.

How feasible would it be to make these data available to the scientific community? Technically, the challenges are not trivial, but they have been overcome in several other fields; they are broken down here into manageable parts. We maintain that the major constraints to feasibility are a cultural resistance to change from within our own scientific community, and a reluctance of any institution to take leadership of the data sharing agenda. We also believe, however, that the imperative to share data will only grow stronger. The research community should look at this pressure from funders as an opportunity rather than an imposition.

Goals for funders and researchers

Here we propose several goals to which funders and researchers can jointly aspire and towards which progress can be measured: (i) all data of potential public health importance funded by taxpayers or foundations will be appropriately documented and archived in formats accessible to the wider scientific community; (ii) all data provided by governments to databases developed by publicly-funded organizations will be freely available to any user, at the level of detail at which it was provided; (iii) the publication of a research article in a biomedical journal will be accompanied by the publication of the data set upon which the analysis is based; (iv) funders and employers of researchers will consider publication of well-managed data sets as an important indicator of success in research, and will reward researchers professionally for sharing data; and (v) all planned research will budget and be funded to manage data professionally to a quality adequate for archiving and sharing.

Plan of work

These goals can only be achieved with considerable investment in several practical areas. We propose the following plan of work, necessary to underpin progress towards our stated goals.

Fill the gaps in data management

There is a need to develop metadata standards, which will lead to improved documentation and allow data to be combined more easily across time, locations and sources. This will probably require the extension of DDI and SDMX standards to encompass areas of public health interest. Agreement is also required on standards for anonymization and safeguarding of confidentiality.

We need to develop a search portal that will allow data to be discovered across a range of repositories, and standards for repositories similar to those used for registries of clinical trials.26 We also need to invest in training in data management for public health, especially in developing countries, and the development of career paths in bioinformatics.

Increase incentives to share data

We need to further develop and adopt reliable citation standards for data sets, such as those proposed by the DataCite collaboration,27 and ensure they are indexed in databases such as PubMed. Standards and procedures for peer review or quality control of data sets are also needed. Digital fingerprinting of data would allow tracing of secondary use,28,29 and we should develop methods and measures to track the value that sharing data adds to the work of both primary researchers and funders of research. There is a need to agree on norms and standards governing fair use periods for primary researchers, data access policies and data use agreements.

Data libraries

To underpin the long-term viability of data libraries, we need to invest in expanding existing infrastructure to cover curation and access of data of public health importance. This calls for a business or funding model that assures the long-term viability of data archives.

Conclusion

All of these areas have already been identified as critical to promoting data sharing, often repeatedly so.5,30–32 Funders, governments, publishers and many researchers want these things to happen, it seems. Some of the organizations calling for greater sharing of public health research data have expressed willingness to pay for parts of the work. But none are willing to take charge of the agenda, committing themselves to orchestrating the dull, messy but essential work of developing the norms and standards that will allow data sharing to revolutionize public health research.

It is time to move beyond expressions of good intentions and to get on with the practical work that will allow data to be shared. The first thing that is needed is leadership. We challenge other participants in this round table to commit to coordinating, funding or carrying out the work described in this paper. Only after someone takes the lead in tackling these issues will today’s students of public health be able to climb onto the shoulders of the current giants in our field.

Competing interests: None declared.


References

1. Smith D, Carrano A. International large-scale sequencing meeting. Human Genome News 1996;7. Available from: http://www.ornl.gov/sci/techresources/Human_Genome/publicat/hgn/v7n6/19intern.shtml [accessed 26 February 2010].
2. Kaye J, Heeney C, Hawkins N, de Vries J, Boddington P. Data sharing in genomics – re-shaping scientific practice. Nat Rev Genet 2009;10:331–5. doi:10.1038/nrg2573 PMID:19308065
3. Nüesch E, Trelle S, Reichenbach S, Rutjes AWS, Bürgi E, Scherer M et al. The effects of excluding patients from the analysis in randomised controlled trials: meta-epidemiological study. BMJ 2009;339:b3244. doi:10.1136/bmj.b3244 PMID:19736281
4. Elobeid MA, Padilla MA, McVie T, Thomas O, Brock DW, Musser B et al. Missing data in randomized clinical trials for weight loss: scope of the problem, state of the field and performance of statistical methods. PLoS ONE 2009;4:e6624. doi:10.1371/journal.pone.0006624 PMID:19675667
5. Lord P, MacDonald A, Sinnot R, Ecklund D, Westhead M, Jones A. Large-scale data sharing in the life sciences: data standards, incentives, barriers and funding models (The “Joint data standards study”). Glasgow & Edinburgh: National e-Science Centre; 2006. Available from: http://www.nesc.ac.uk/technical_papers/uk.html [accessed 26 February 2010].
6. Corti L, Wright M. MRC Population data archiving and access. London: Medical Research Council; 2002.
7. Chandramohan D, Shibuya K, Setel P, Cairncross S, Lopez AD, Murray CJ et al. Should data from demographic surveillance systems be made more widely available to researchers? PLoS Med 2008;5:e57. doi:10.1371/journal.pmed.0050057 PMID:18303944
8. Field D, Sansone SA, Collis A, Booth T, Dukes P, Gregurick SK, Kennedy K et al. Omics data sharing. Science 2009;326:234–6. doi:10.1126/science.1180598

Bull World Health Organ 2010;88:462–466 | doi:10.2471/BLT.09.074393 465 Round table Sharing health data Discussion

9. Mascalzoni D, Hicks A, Pramstaller P, Wjst M. Informed consent in the genomics era. PLoS Med 2008;5:e192. doi:10.1371/journal.pmed.0050192 Round table discussion PMID:18798689 10. Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet Publishing research data on a professional 2008;4:e1000167. doi:10.1371/journal.pgen.1000167 PMID:18769715 basis 11. Application to use restricted microdata. Minneapolis: IPUMS International. a Available from: https://international.ipums.org/international/ [accessed 26 Toby Green February 2010]. 12. UK Data Archive. End user licence. Colchester: University of Essex; 2008. As Pisani & AbouZahr have identified, there are many obstacles Available from: http://www.data-archive.ac.uk/aandp/access/licence.asp to the publishing of data: social (incentives for researchers to [accessed 26 February 2010]. 13. Pisani E, Whitworth J, Zaba B, AbouZahr C. Time for fair trade in research make the effort to publish), financial (having adequate financing data. Lancet 2010;375:703–5. doi:10.1016/S0140-6736(09)61486-0 to cover short-term publishing and long-term curation costs), PMID:19913902 and technical (standards and systems).1 This paper looks at some 14. OECD Principles and guidelines for access to research data from public of the technical challenges of publishing data professionally and funding. Paris: Organisation for Economic Co-operation and Development; describes the discoverability and citability benefits that follow. 2007. Let’s take it as read that publishing research data is a “good 15. Global strategy and plan of action on public health, innovation and intellectual property. Geneva: World Health Organization; 2008. thing,” that researchers are as willing to publish data as they are 16. How to encourage the right behaviour. Nature 2002;416:1. 
doi:10.1038/416001b
17. Data’s shameful neglect. Nature 2009;461:145. doi:10.1038/461145a
18. PLoS Medicine Editors. Next stop, don’t block the doors: opening up access to clinical trials results. PLoS Med 2008;5:e160. doi:10.1371/journal.pmed.0050160 PMID:18630986
19. Laine C, Berkwits M, Mulrow C, Schaeffer MB, Griswold M, Goodman S. Reproducible research: biomedical researchers’ willingness to share information to enable others to reproduce their results. In: Sixth International Congress on Peer Review and Biomedical Publication, Vancouver, Canada, 10–12 September 2009. Available from: http://www.ama-assn.org/public/peer/abstracts-0910.pdf [accessed 26 February 2010].
20. NIH guide: final NIH statement on sharing research data. Bethesda: National Institutes of Health; 2003. Available from: http://grants.nih.gov/grants/oer.htm [accessed 26 February 2010].
21. MRC Policy on data sharing and preservation. London: Medical Research Council; 2008. Available from: http://www.mrc.ac.uk/PolicyGuidance/EthicsAndGovernance/DataSharing/PolicyonDataSharingandPreservation/index.htm [accessed 26 February 2010].
22. Policy on data management and sharing. Wellcome Trust; 2007. Available from: http://www.wellcome.ac.uk/About-us/Policy/Policy-and-position-statements/WTX035043.htm [accessed 26 February 2010].
23. Sharing public health data: a code of conduct. London: Wellcome Trust; 2008. Available from: http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Public-health-and-epidemiology/index.htm [accessed 26 February 2010].
24. Supari SF. Saatnya dunia berubah: tangan Tuhan di balik virus flu burung / Siti Fadilah Supari [in Indonesian]. : Sulaksana Watinsa; 2008.
25. International Comprehensive Ocean-Atmosphere Data Set. Washington, DC: National Oceanic and Atmospheric Administration; 2009. Available from: http://icoads.noaa.gov/ [accessed 26 February 2010].
26. International Clinical Trials Registry Platform, WHO Registry Criteria, version 2.1. Geneva: World Health Organization; 2009. Available from: http://www.who.int/ictrp/network/criteria_summary/en/index.html [accessed 26 February 2010].
27. DataCite - International initiative to facilitate access to research data. Hannover: German National Library of Science and Technology; 2009. Available from: http://www.datacite.org/ [accessed 26 February 2010].
28. Paskin N. Digital Object Identifier (DOI) System. In: Encyclopedia of library and information sciences. New York: Taylor & Francis; 2008.
29. Altman M, King G. A proposed standard for the scholarly citation of quantitative data. D-Lib 2007. 13.
30. Lowrance W. Access to collections of data and materials for health research. London: Medical Research Council; 2006.
31. Pisani E. OpenEpi: a new culture for public health data? London: Wellcome Trust; 2008.
32. National Academy of Sciences. Ensuring the integrity, accessibility and stewardship of research data in the digital age. Washington, DC: National Academy Press; 2009.

research papers and funding is in place to make them available online in the long run. Job done? Well, no, not by a long chalk.

Just as loading a journal article onto a web site somewhere isn’t the same as publishing it properly, so the same is true for data. To be as discoverable and as citable as research articles, data sets need to be published using an infrastructure that is compatible with research articles. It is not enough that data sets hang like dongles off a research article; they need to be discoverable and citable in their own right – just like a journal article. This means the metadata must be compatible with existing bibliographic management and citation systems like RefWorks® and CrossRef®. Users will expect search engines, abstracting and indexing services and library catalogues to reference data sets, so, for example, librarians will need MARC (MAchine-Readable Cataloging) records.

Is this overkill? Well, the Organisation for Economic Co-operation and Development (OECD) doesn’t think so. OECD publishes more than 390 data sets as stand-alone objects, as well as thousands of data sets as supplemental data to its books and journal articles. Sub-sets of the data sets are also posted on the web as stand-alone objects. So it is no surprise that, in the absence of good discovery metadata and systems, the number one complaint from users is the challenge of finding a relevant data set. They know the data is there, but they can’t find it – even with Google’s help.

To solve this problem, OECD’s Publishing Division has spent the past three years grappling with the challenge of how to publish these many thousands of data objects so that users can not only find the data they need, but can then cite and manage the data sets using the same tools that they already use to manage journal articles or book chapters. The first result was a white paper,2 first released in March 2009, which described this challenge and proposed a set of metadata schema for databases in their own right, sub-sets of databases and supplemental data.

More significant was the launch of OECD iLibrary, OECD’s new publishing platform, in July 2009. OECD iLibrary3 hosts all OECD books, working papers, journals and data sets in a seamless manner. OECD iLibrary puts the white paper’s proposed bibliographic schema for data objects into practice. Search for “health data” and the search results include data sets, book chapters – even individual tables found inside books.

a Organisation for Economic Co-operation and Development, 2 rue André Pascal, 75775 Paris Cedex 16, France. Correspondence to Toby Green (e-mail: [email protected]).

Bull World Health Organ 2010;88:466–467 | doi:10.2471/BLT.10.076943

OECD’s data sets can now be discovered more easily and they can be cited as simply and as easily as a research article using the downloadable citation provided. Later in 2010, librarians will be supplied with MARC records and the bibliographic records for OECD data sets will be shared with discovery platforms like RePEc (Research Papers in Economics)4 – the world’s largest collection of economics grey literature – enabling visitors to find data objects alongside working papers and journal articles. Imagine being able to discover and cite data sets as easily and as simply as journal articles. Imagine no more. ■

Competing interests: None declared.

References
1. Pisani E, AbouZahr C. Sharing health data: good intentions are not enough. Bull World Health Organ 2010;88:462–6.
2. Green T. We need publishing standards for datasets and data tables. Paris: Organisation for Economic Co-operation and Development; 2010. Available from: http://dx.doi.org/10.1787/603233448430 [accessed 1 March 2010].
3. OECD iLibrary [Internet site]. Paris: Organisation for Economic Co-operation and Development; 2009. Available from: www.oecd-ilibrary.org [accessed 19 May 2010].
4. RePEc (Research Papers in Economics) [Internet site]. Available from: www.repec.org [accessed 1 March 2010].

Sharing data for public health: where is the vision?
Alan D Lopezᵃ

“By refusing to share data, researchers are slowing progress towards reducing illness and death.” Pisani & AbouZahr are making a big claim in this round table.1 Is this claim sensationalist or does it have some basis? Can we argue that data from public health research really affect the ways prevention and control programmes are designed? Lives have become longer and healthier in the past 50 years, despite an arguably poor evidence base for health and an even poorer appreciation by policy-makers of the value of reliable health information.2,3 Pisani & AbouZahr are arguing that such gains would have been bigger, faster and more equitable had the world had better information about what works and does not work in public health; lost ground is partly due to widespread hoarding of research findings, particularly primary data.

They have a point. Restricting access to data to only those scientists directly engaged in a research project limits the scope of legitimate scientific enquiry and the potential for research to influence policy and practice. No individual scientist who collects or collates data has all the possible analytic methods, expertise and time to extract key public health messages from research or routine data sets.4–7 Lost opportunity for analysis is the main consequence of poor data sharing practices.

Yet, as Pisani & AbouZahr argue, it is unreasonable to expect data collectors to share without adequate incentives. Incentives could include professional recognition for well collected and documented data, appropriately disseminated using good data management practices. Data collectors too need assurance that their efforts will be respected and that errors in data are inevitable and rarely disastrous. Experienced researchers are aware of these risks and can use a range of quality assessment techniques to deal with errors.

Mentoring is one incentive that is missing from the otherwise excellent set proposed by Pisani & AbouZahr. Partnerships between researchers and data collectors, including intensive methodological workshops, are feasible and can help ensure that those who collect data realize the public health potential and value of their efforts. Such an approach could rapidly increase analytical capacity and diversify the analysis of rich, but underutilized, data sets. Funding such collaborations would be an innovative and constructive use of research funds. Competent analysts should be able to resolve potential challenges in interpreting data because of specific local conditions surrounding their collection. Restricting access on this basis reflects a lack of confidence, imagination or trust by those who collect data and should be questioned when used to preclude further analysis.

The authors propose an urgent agenda for action to improve data sharing practices that will benefit all stakeholders – data collectors, analysts, the policy community and, ultimately, the public. This is admirable but, for such a plan to succeed, funders, researchers and data collectors alike need to understand its benefits. That will only happen with effective and committed leadership. What better role for the World Health Organization? ■

Competing interests: None declared.

References
1. Pisani E, AbouZahr C. Sharing health data: good intentions are not enough. Bull World Health Organ 2010;88:462–6.
2. Setel PW, Macfarlane SB, Szreter S, Mikkelsen L, Jha P, Stout S et al.; Monitoring of Vital Events. A scandal of invisibility: making everyone count by counting everyone. Lancet 2007;370:1569–77. doi:10.1016/S0140-6736(07)61307-5 PMID:17992727
3. Horton R. Counting for health. Lancet 2007;370:1526. doi:10.1016/S0140-6736(07)61418-4 PMID:17992726
4. Murray CJL, Lopez AD. The utility of DALYs for public health policy and research: a reply. Bull World Health Organ 1997;75:377–81. PMID:9342897
5. Rajaratnam JK, Tran LN, Lopez AD, Murray CJL. Measuring under-five mortality: validation of new low-cost methods. PLoS Med 2010;7:e1000253. doi:10.1371/journal.pmed.1000253 PMID:20405055
6. Obermeyer Z, Rajaratnam JK, Park CH, Gakidou E, Hogan MC, Lopez AD et al. Measuring adult mortality using sibling survival: a new analytical method and new results for 44 countries, 1974–2006. PLoS Med 2010;7:e1000260. doi:10.1371/journal.pmed.1000260 PMID:20405004
7. Murray CJL, Rajaratnam JK, Marcus J, Laakso T, Lopez AD. What can we conclude from death registration? Improved methods for evaluating completeness. PLoS Med 2010;7:e1000262. doi:10.1371/journal.pmed.1000262 PMID:20405002

Data sharing: reaching consensus
Jimmy Whitworthᵇ

Pisani & AbouZahr write passionately about the need to change the culture of data sharing in public health research.1 They explain why this is in everybody’s best interests and outline ways in which the main obstacles can be overcome. This is laudable and much appreciated. It is time for a change: the current situation is unacceptably inefficient in terms of scientific progress and value for money from research.

a School of Population Health, University of Queensland, Herston Road, Herston, Qld., 4006, Australia (e-mail: [email protected]).
b Wellcome Trust, Gibbs Building, 215 Euston Road, London, NW1 2BE, England (e-mail: [email protected]).

Bull World Health Organ 2010;88:467 | doi:10.2471/BLT.10.078956

The two authors challenge institutions, in particular research funders, to take charge of the agenda to make these changes happen. They call for leadership but, while funding agencies are clearly influential and can certainly facilitate changes in scientific behaviour and culture, they are unlikely to be able to effect all the changes called for by Pisani & AbouZahr. While funders might support and encourage, we are not in a position to dictate changes to professional structures, to create career paths for scientific disciplines at academic institutions, nor to determine scientific reward mechanisms.

What is required as a first step is the facilitation of dialogue and the building of consensus between all interested parties, including funders, researchers, institutions, journal editors, ethics committees, multilateral agencies and governments. No one agency has the mandate or the legitimacy to take this whole agenda forward unilaterally. A more sustainable and palatable pathway will be to build consensus and to create a broad coalition.

It is worth reflecting on why data sharing is not more commonly practiced among epidemiologists and public health researchers. Pisani & AbouZahr point out many of the constraints, such as the lack of appropriate incentives from employers such as research councils, foundations and universities, the short supply of data managers especially in low- and middle-income countries, and concerns over the control and ownership of data. There are also technical issues: data sets for cohort studies are more complicated than standard genetic data sets because of their longitudinal nature, and there are no off-the-shelf tools available for managing and curating standardized and interoperable longitudinal data sets.

Overcoming these constraints requires a broad consensus among stakeholders. Indeed Pisani & AbouZahr seem to acknowledge this. When they write that “we” need to develop a search portal, invest in training in data management, develop reliable citation standards, develop methods to track the value of data sharing, and so on, these are clearly tasks for the wider scientific community.

Of course, individual institutions – and funders – can take the initiative over certain aspects of the agenda and form alliances with those agencies that can help in other domains. Indeed, the Wellcome Trust has already led various initiatives in this field, including convening international meetings of public health researchers and funding agencies, and has raised these issues at meetings of public health policy-makers and international journal editors. The Trust is currently revising its grant conditions about data sharing, which will be strengthened and, importantly, will provide more guidance about the technicalities of how to share data more effectively. We are ready to take the lead in those areas, where it is appropriate for us to do so, and we are open to the formation of alliances with other agencies that can help to facilitate progress in other areas. ■

Competing interests: Jimmy Whitworth is employed by the Wellcome Trust, which commissioned Elizabeth Pisani to work on its data-sharing project.

References
1. Pisani E, AbouZahr C. Sharing health data: good intentions are not enough. Bull World Health Organ 2010;88:462–6.

Sharing health data: developing country perspectives
Viroj Tangcharoensathien,ᵃ Jirawan Boonpermᵇ & Pongpisut Jongudomsukᶜ

Not only is it difficult to change the “publish or perish” mindset among health researchers, there are other fundamental barriers in data sharing that Pisani & AbouZahr’s paper should have addressed.1 Sharing data is not only about the technical dimension such as data management, repositories and libraries; developing countries are concerned about factors that impede data sharing, in particular, fairness. Pisani & AbouZahr provide clear analyses on barriers but their proposed solutions will not be effective unless they address the fundamental problems.

From the perspective of developing countries, the goal of data sharing is beyond national interests and is for the benefit of all mankind. Without this explicit goal, data sharing more often helps scientists in developed countries get published. While these scientists may have higher analytical capacities, they have neither shared the “legwork” in collecting routine administrative data nor made intellectual contributions to designing and solving problems in conducting field work with scientists in developing countries.

Developing countries need to strengthen capacities in survey design, data management and analysis and policy use. There is clearly an unlevel playing field that impedes data sharing. Scientists from developed countries often take the following approach with researchers in developing countries: “Share your data with me, you do not have analytical capacities. I will analyse and publish papers for global public good.” Instead, their approach should be: “We can analyse the data together and learn from each other for the benefit of all people.” This approach would gradually create equal partnerships, a level playing field, goodwill and trust for collaborations beyond simply sharing data.2–4 International data sharing cannot be achieved through forced marriage, as shown by the defeat of the policy proposed by the Annals of Internal Medicine of a publicly accessible database as a condition for journal publication.5

The recent sharing of avian flu virus specimens by developing countries through the World Health Organization resulted in the production of avian influenza vaccines at a price of US$ 10–20 per dose. This is unaffordable in low-income countries where total health expenditure is less than US$ 30 per person. Should an avian flu pandemic occur, there would be huge death tolls in countries without access to vaccines; while rich countries’ populations would be fully protected, literally from any moral obligations to countries that shared their specimens. Such unilateral benefit inhibits data sharing.

a International Health Policy Program, Ministry of Public Health, 376 Mooban Panya, Patanakan Road, Bangkok, 10250, Thailand. b National Statistical Office, Laksi, Thailand. c Health Systems Research Institute, Bangkok, Thailand. Correspondence to Viroj Tangcharoensathien (e-mail: [email protected]).

Bull World Health Organ 2010;88:467–468 | doi:10.2471/BLT.10.079202

It is important to have evidence on the benefits that populations receive directly as a result of sharing, beyond publications by secondary users. Success in international data sharing may start with efforts at country level or through multi-country research partnerships. Undeniably, multi-country studies provide huge benefit in supporting evidence-based policy. Collaborative partnerships among a number of developed and developing countries, such as for maternal and perinatal health, are foundations for building long-term trust.6 In research partnerships, there is equitable access to and use of data sets, beyond the conventional practice of passive data sharing without partnership.

In Thailand, rules and procedures for data sharing were developed through a research funding agency and the National Statistical Office. Primary users were granted a reasonable-use period of two years after complete data collection prior to access by secondary users. Good practices are emerging. With the aim of capacity building and mutual benefit, the National Statistical Office grants approval to international secondary users to access nationally representative household data sets, on the condition that they develop partnerships with local scientists. Such engagement gradually builds trust and longer-term partnerships between scientists from developed and developing countries. ■

Competing interests: None declared.

References
1. Pisani E, AbouZahr C. Sharing health data: good intentions are not enough. Bull World Health Organ 2010;88:462–6.
2. Pitayarangsarit S, Tangcharoensathien V. Sustaining capacity in health policy and systems research in Thailand. Bull World Health Organ 2009;87:72–4. doi:10.2471/BLT.07.044479 PMID:19197407
3. Pitayarangsarit S, Tangcharoensathien V. Capacity development for health policy and systems research: experience and lessons from Thailand. In: Green A, Bennett S, eds. Sound choices: enhancing capacity for evidence-informed health policy. Geneva: World Health Organization; 2007.
4. Mayhew SH, Doherty J, Pitayarangsarit S. Developing health systems research capacities through north-south partnership: an evaluation of collaboration with South Africa and Thailand. Health Res Policy Syst 2008;6:8. doi:10.1186/1478-4505-6-8 PMID:18673541
5. Laine C, Berkwits M, Mulrow C, Shaeffer MG, Griswold M, Goodman S. Reproducible research: biomedical researchers’ willingness to share information to enable others to reproduce their results. In: Sixth International Congress on Peer Review and Biomedical Publication, Vancouver, 10–12 September 2009. Available from: http://www.ama-assn.org/public/peer/abstracts-0910.pdf [accessed 26 April 2010].
6. Lumbiganon P, Laopaiboon M, Gülmezoglu AM, Souza JP, Taneepanichskul S, Ruyan P et al.; World Health Organization Global Survey on Maternal and Perinatal Health Research Group. Method of delivery and pregnancy outcomes in Asia: the WHO global survey on maternal and perinatal health 2007–08. Lancet 2010;375:490–9. doi:10.1016/S0140-6736(09)61870-5 PMID:20071021

Bull World Health Organ 2010;88:468–469 | doi:10.2471/BLT.10.079129