THE DANISH NATIONAL RESEARCH FOUNDATION OPEN ACCESS TO DATA – IT’S NOT THAT SIMPLE OPEN ACCESS TO DATA – IT’S NOT THAT SIMPLE
More data have been created in the past two years Rather than making detailed studies of just a few than in all human history. stars, SAC can now characterize planetary systems around distant stars and probe the properties of Five years ago, a cardiovascular geneticist would thousands of stars. Very few scientific fields are compare 200 DNA samples from healthy and dis untouched by these new scales of quantitative data. eased individuals to address the possible inheritance of a disease. Today, she has 60,000 complete human To elucidate the advantages and challenges of open genomes available, free of charge, to address the data in science, the Danish National Research same question. A few years ago, classical geogra Foundation (DNRF) conducted a survey among its phers analyzed the problem of flooding. Today, researchers on open access to data. The results datamaticians can analyze the problem from 220 are presented here. The short message is that it billion readings in Denmark alone, enabling them to is a challenge to exploit the possibilities of open draw up an exact potential flood map of the country. data wisely. The space telescope Kepler provides the Stellar Astrophysics Centre (SAC) with hundreds of terabytes of data.
“This digital revolution is an exceptional opportunity to excel in research, but also a challenge to exploit wisely”
– Professor Liselotte Højgaard, Chair of the Board of the DNRF As stated by the Ministry of Higher Education and “ The data is very difficult to understand and Science : “Open Access to research data can be of access without prior knowledge. Therefore, great value for researchers, citizens and businesses my biggest concern is that it will be used for in the form of new knowledge and discoveries. In something that it is not suited for.” creased access to the underlying data can moreover – survey respondent contribute to efficiency of the research by reuse of data. It can also increase cross-disciplinary research.” “ The cost of annotating data so they can actually be used – as opposed to just do data-dumps. This What does “data” mean in this context? Which data is costly and currently the funding of this comes are relevant to share? Who owns the data? Why would out of research grants.” people voluntarily share their arduously collected and – survey respondent highly valuable raw material? And who should cover the expenses for maintaining the large data banks? Best practice: listen to the researchers Increased access to data should be pursued wisely. When the foundation asked survey respondents what Data need to be of high quality, quality assured, and the DNRF could do to help researchers make more preferably annotated to optimize reuse. We can opti data available, the answers revealed that the issue mize excellent research with increased access to is far from simple: high-quality research data by learning from the people who know what works and where the challenges and “ It depends upon the type of data. There is no barriers lie: the researchers and the data managers, one-size-fits-all here.” the hands-on people. – survey respondent
Professor Liselotte Højgaard, Chair of the Board of the DNRF
Professor Søren-Peter Olesen, Director of the DNRF
– IT’S NOT THAT SIMPLE 03 THE DNRF’S OPEN DATA SURVEY
The DNRF conducted a survey among researchers Motivational factors from all its current Centers of Excellence and among Motivation is a key parameter to develop a successful its Niels Bohr Professorships. Out of the 1,175 strategy for enabling increased access to high-quality researchers who were invited to participate, 474 open data. responded to the survey, a response rate of 40%. Among the Centers of Excellence leaders, the Generally, it is the DNRF’s experience that researchers response rate was 74%. The purpose of the survey are idealistic people who are concerned with making was to clarify how and how much the researchers their research beneficial to society as a whole. They are using open data and to illuminate the strengths voluntarily want to share their arduously collected data of open data and the barriers to its use. to maximize public research investments, increase research integrity, and support the generation of new On the opposite page, opportunities and challenges ideas and possible breakthroughs. are summarized. It is characteristic that the list of challenges is twice as long as the list of opportunities, ” We want the bowl of candy out in the open, although, at the same time, the DNRF researchers are but we don’t want people to steal from it.” generally very positive about sharing data. This report – Professor Bo Elberling, Center for Permafrost is not a thorough analysis, but a contribution to the (CENPERM) debate on open access to data with an emphasis on the voice of the researchers. However, researchers aim for quality and fairness. Those who collect the data should be credited and quality must be assured.
04 OPEN ACCESS TO DATA INCREASED ACCESS TO HIGH-QUALITY RESEARCH DATA
Opportunities Challenges
• Generate collaboration • How to finance database maintainance • Maximize resources • Lack of joint infrastructure and financing • Strengthen reputations, making it possible to across Denmark attract high-profile international researchers • Barriers across technology and departments • Offer consistency when everybody draws • Continuity in long-term maintenance conclusions from the same quality-tested data • Resources involved in making quality-tested • Allow for co-authorships data available • Increase research integrity • Resources involved in making sure that only • Promote efficiency by boosting the reuse of data quality-tested data are being made available • Leverage interdisciplinary research • Long-term storage of data: cutbacks at universities • Allow competition to be a positive driver have increased this problem • Resources involved in advertising that you have data available • Many institutions have no permanent set-up for depositing data sets • Belief that sharing data is bad for your competitive edge • The balance between the pressure to publish and get patents and wanting to make data available • The concern that data are made available often with vast delay • Data input available for 25 years, but with no user interface • Access to comparable data requires specialist knowledge, e.g., that of data managers • Data validity – can you trust them? • Concern that open data distort researchers’ CVs via publications: data providers get co-authorships on papers they have not actually worked on • Need to ensure that open data will not compromise ongoing but not-yet-completed projects • Barriers of ethical issues, sharing of private/ personal data and intellectual property rights (IPR), especially challenges during collaborations with industrial partners and universities in terms of IPR
– IT’S NOT THAT SIMPLE 05 WHAT IS “DATA”?
The DNRF supports working toward increased access From what the DNRF has learned about the challenges to data, but always with a view to a defined and obtain with existing scientific data infrastructures in Denmark, able value of the effort. there is a great deal of work to be done to fully reap the advantages of this pan-European initiative. Which data are relevant to share and how should they be shared? Who owns the data? “ If I run a Northern blot and thereby obtain data “ I am very interested in trying to use open data in on the expression of a few genes under a specific future projects. I am, however, concerned that this growth condition for a specific strain of bacteria, is format will continue a trend of depriving research that the kind of data that should be made available? ers of ownership of their work. It would not be right Or is it only large data collection efforts that I for employers to profit indefinitely from the work should be sharing?” (data) of short-term staff whose careers they do – survey respondent. not support.” – survey respondent. In working with strategies for increased access to high-quality research data, it is imperative to be com Those working toward increased access to data should pletely focused and specific about relevance. Billions keep in mind the bigger picture and the implications for of euros are being invested in open data, and open data researchers, e.g., career paths, issues of publication represent opportunities for vast earnings, too. Only practices, etc. The aim in all areas of this endeavor data of relevance and quality should be shared. must be to increase the opportunity to excel in research, to continuously make research better. It is a delicate FAIR research data – European Open balance to reassure the individual researchers who took Science Cloud (EOSC) the trouble to collect the data and at the same time The European Open Science Cloud is a pan-European strengthen open access for the benefit of all research initiative that “aims to give Europe a global lead in ers. This balance is dependent on the research area. scientific data infrastructures, to ensure that European scientists reap the full benefits of data-driven science. The DNRF agrees with this conclusion from Policy […] The European Open Science Cloud will start by Recommendations for Open Access to Research Data federating existing scientific data infrastructures, to (www.recodeproject.eu): day scattered across disciplines and Member States.” (Communication from the Commission to the European “The development of open access to research data Parliament, the Council, the European Economic and needs to be informed by the research practices and Social Committee and the Committee of the Regions. processes in the different disciplines and characterized European Cloud Initiative - Building a competitive data by a partnership approach among key stakeholders. and knowledge economy in Europe, p. 6) This will help ensure the engagement from the wide range of research communities and the embedding Participation in the EOSC requires FAIR (Findable, of open access within research practice and process.” Accessible, Interoperable and Reusable) data, i.e., that data and meta data are machine readable, meaning Thus, the following pages contain five examples of that computers must be capable of accessing a data how working with open data can leverage research and publication autonomously, unaided by human operators. what the challenges are in five different research areas.
06 OPEN ACCESS TO DATA OPEN ACCESS TO RESEARCH DATA AND DATA MANAGEMENT STRATEGIES IN DENMARK
In Denmark, a number of organizations have devel The Ministry of Higher Education and Science has oped initiatives to make data available for research. not yet formulated an official strategy for open access In 2015, the National Strategy for Data Management, to data, and this can – as indicated in the National commissioned by the Danish Rector’s College, DeIC Strategy for Data Management – prove to be more (Danish e-Infrastructure Cooperation) and Deff complex than, e.g., the development of the open access (Denmark’s Electronic Research Library), was pub policy from 2012: “The significant differences between lished. The vision was to ensure Denmark a better the subject area’s conditions and challenges in the area and more competitive research environment through of data will probably entail that any uniform and cross- the efficient collection, securing, dissemination, and disciplinary policy would be rather generic, and in re-use of relevant research data. any case it will probably be supplemented with more specific policies for the individual subject areas” The Ministry of Higher Education and Science’s web (National Strategy for Data Management, p. 11). site provides an overview of national organizations: The Ministry of Higher Education and Science supports • The Danish Data Catalogue established by the the European Union Council’s conclusions from 2016 Danish Agency for Digitisation provides an overview on “The transition towards an open science system” of and access to public data. and the Commission’s plans to establish the European • The Danish Data Archive (DDA) is part of the Danish Open Science Cloud (EOSC). Following this, the Danish National Archives and makes research data based Agency for Science and Higher Education is currently on questionnaires accessible to researchers and preparing an analysis of the costs and benefits and students. Grants from the Independent Research barriers and opportunities of implementing FAIR Fund Denmark to research projects within the research data in Denmark. health sciences and the social sciences usually have a general obligation to deliver research data This analysis is dedicated to the future scenario to the Danish Data Archive. of machine readable data and meta data. While this • Statens Serum Institut (SSI) is a public enterprise scenario will probably rarely be fully realized, there under the Danish Ministry of Health. SSI gathers are benefits to be reaped in pursuing it. and disseminates data about the population’s state of health and data regarding activity, economy and quality in the Danish health service. The gathered data are made accessible to researchers, but researchers must pay for access. • Statistics Denmark contains an extensive collection of register data, with data collected from the 1970s to the present. Through the Division of Research Services at Statistics Denmark, authorized research institutions can gain access to data needed to solve specific research and analytical tasks, but research ers must pay for access.
– IT’S NOT THAT SIMPLE 07 OPEN DATA – BRIDGING RESEARCH, MONITORING AND EDUCATION
The Center for Permafrost (CENPERM) is working Most of the processes that CENPERM is studying in with soil-plant and microbial interactions and the Greenland are relevant to upscale in space and time. associated ecosystem feedback processes to climate Since better site-specific data can be extrapolated changes across sites in Greenland. One reason for and made relevant for understanding ecosystem focusing on Greenland was the CENPERM-affiliated researchers’ previous research in Greenland but also their involvement in ecosystem monitoring since “To harvest the value of such a large international 1996. These monitoring data have been important study it is crucial that comparisons can be made for the science that has been completed so far in using similar set-ups, methods, and data format, CENPERM. Similarly, it is important for CENPERM and that the data are openly accessible.” to ensure that its data are available to the public in a useful and quality-tested format. – Professor Bo Elberling, CoE leader, CENPERM feedback mechanisms on a larger scale, more exciting Open data require resources, and funding has been results may be obtained, and a higher impact may be allocated for that in CENPERM. The center is using achieved when publishing. An example: If the center a format that has been developed for PROMICE quantifies an increasing release of carbon dioxide to (Programme for Monitoring of the Greenland Ice the atmosphere in a certain vegetation type in West Sheet) and used by others such as GEM (Greenland Greenland due to artificial warming, a global network Ecosystem Monitoring). Both programs provide impor using similar warming for 50 sites across the Arctic tant data sources and data deposits for CENPERM. is a key to understanding the variability between sites Making data open is also important for teaching. and scaling to the entire Arctic region. To harvest the Several theses based on open data have already been value of such a large international study, it is crucial completed at CENPERM, which has resulted in inter that comparisons can be made using similar set-ups, national collaboration that would not otherwise have methods, and data format, and that the data are been initiated. CENPERM expects to see its data used openly accessible. and tested, and the center welcomes future challeng es to its conclusions.
A B Monitoring data Research data Parameter
Time
Upscaling in space and with time is important for CENPERM. Open data is a Upscalingkey component in space in and order with to timeupscale. is important A: Beyond for Greenland CENPERM. we are Open collaborating data is a key componentwith institutions in order responsibleto upscale. for A: moreBeyond than Greenland 20 other arctic we are sites. collaborating Some of the with institutions responsiblework is made for more in a very than consistent 20 other arcticway, as sites. for open Some top of chambers the work for is artificial made in a very consistentwarming, way, known as for as openITEX chambers.top chambers B: CENPERM for artificial data warming, collection known started as in ITEX chambers. B: CENPERM2012, but assessingdata collection long term started climate in 2012, responses but assessing requires access long termto long climate responses requirestime accessseries as to made long available time series by DMI as madeand ASIAQ available in Greenland. by DMI and ASIAQ in Greenland.
– IT’S NOT THAT SIMPLE 09 ICOURTS’ DATABASE ON INTERNATIONAL COURT DECISIONS
Over the past several years, iCourts has developed manually collected all decisions and then entered the largest currently existing database of decisions them in the database in Copenhagen. of international courts. The center’s approach to data-gathering in this regard has generally been the An interesting feature of the database is that it organ following. First, it has obtained data from a set of izes and digitizes content from courts, even when the specific international court websites/databases, courts themselves are not doing it. For example, the which the center then organized in its own database. In many cases, the center has extracted additional data. The sources used for this purpose are the “The center’s ambition is to make this broader publicly available databases of international courts, database available and to provide a set of digital such as HUDOC (http://hudoc.echr.coe.int), EUR-Lex analysis tools for its future users, through the ( ), ICTY ( http://eur-lex.europa.eu http://www.icty.org/ sequence query-explore-download.” action/cases/), IACHR (http://www.corteidh.or.cr/ index.php/en), etc. In some instances, the center has – Professor Mikael Rask Madsen, CoE leader, iCourt
10 OPEN ACCESS TO DATA: Photo: Ditte Valente/EliteForsk iCourts’ database allows searches in the full text of The center’s ambition is to make this broader data rulings, even in cases where the corresponding court base available and to provide a set of digital analysis does not offer this functionality, for example, the In tools for its future users, through the sequence ternational Criminal Tribunal for the Former Yugoslavia query-explore-download. Indicatively, these tools or the Inter-American Court of Human Rights. This are: a) searches, as described above, b) self-service feature is of great importance to both the legal audi descriptive statistics, where users will be able to ence and the greater public, since it facilitates access query the database, create their own charts, and save to legal sources otherwise not available. them to their computers, and c) case-law networks, where users will be able to query the database, Typical problems that the center has faced in these explore the corresponding case-to-case network, endeavors are data inconsistencies and data entry and save it for their own use. mistakes, or data not being available in a format that can be processed by machines. Complicating the task further is the fact that the courts’ raw data are often organized in complex ways specific to the court in question or that each researcher has very specific interests. Finally, the center has faced the well-known challenge of continuously updating the datasets. This is particularly the case with regard to data that are manually coded on the basis of a qualitative reading of the cases.
The center has generally made the database available to iCourts’ researchers or affiliates through the plat form at www.icourts.dk or in specific forms requested by the researchers. The database is used both inter nally at iCourts and by iCourts’ collaborators at, for instance, the Technical University of Denmark (DTU), the University of Copenhagen (KU), the European Uni versity Institute (EUI), Duke University, Northwestern University, and France’s National Center for Scientific Research (CNRS). A fraction of the available datasets is currently made publicly available through icourts.dk. Importantly, iCourts’ goal is not to replicate a data repository like Harvard’s Dataverse, but rather to have a more interactive distribution of the datasets, whereby users are able to navigate and search through the data. Apart from the datasets, the center has made available a number of networks of citations to precedents,
related to papers published by iCourts’ researchers. Photo: Carolina Utoft
– IT’S NOT THAT SIMPLE 11 OPEN DATA – URBNET’S CHALLENGES AND POSSIBILITIES
The Centre for Urban Network Evolutions (UrbNet) with data from the natural sciences, which are is an interdisciplinary center with a strong base in the considered open data, and the center needs to com humanities. While open data and its challenges and bine such data with data from the humanities, which possibilities have been topics of discussion in the nat are not necessarily considered open data. Rather, ural sciences for quite a while, such discussions have these data are often selected datasets and often only more recently become central to the humanities and begun to impact the way projects are designed and research structured. At an early point in its exist “It has been possible to embed the center’s own ence, UrbNet has had to engage with the possibilities larger datasets into a more holistic regional and challenges posed by open data because the picture, which in turn heightens the quality center works across disciplinary borders and engag of the research UrbNet is able to produce.” es with high definition methods. It has had to grapple with the fact that, in some cases, the center works – Professor Rubina Raja, CoE leader, UrbNet. German Jerash Northwest Quarter Project -
12 OPEN ACCESS TO DATA: Photo: The Danish only partial datasets. One core challenge in such Furthermore, through mining datasets from other cases is how to compare unequal datasets across published projects and encouraging colleagues disciplines and make meaningful observations. working in related fields to participate and publish in volumes edited by UrbNet and related projects, it has The fact that open data are always data selected and also been possible to embed the center’s own larger presented by the publisher/author(s)/generator of the datasets into a more holistic regional picture, which data needs to be underlined. Therefore, open data do in turn heightens the quality of the research UrbNet not necessarily represent objective or “raw” data, es is able to produce. Such projects call for resources pecially not within the humanities. Interpretation may involving a large number of man-hours for manual already have been embedded, even unconsciously, work on databases and data interpretation in general. through the description of data presented in an open Therefore, such open data projects present challenges dataset within the humanities. For example, this is within the humanities where many projects do not the case with datasets of ceramics (one of the core have such resources available. material groups for establishing relative chronologies through typological developments within archaeology), where dating is relative and based exclusively on the excavator’s stratigraphy and his or her interpretation of this (if not combined with high precision dating methods). Another challenge with open data within archaeology and the empirical material stemming from archaeological excavations is presented by the sheer amount of material and the resources for processing that it takes to prepare the raw data for presentation. This does not lessen the importance of making data more broadly available. However, it does present challenges to the ways in which such data can be and are made available.
UrbNet has ongoing discussions about best practices within the field of open data, both as users and pro viders, within the humanities. Within the center, the researchers, among other projects, are working with the Carlsberg Foundation, which has funded a collec tive research project called “Ceramics in Context,” on developing a best practice scenario drawn from an ongoing fieldwork project in Jerash, Jordan. This project provides a full quantification approach, making Locally produced pottery from Gerasa/Jerash, Jordan. Every all data from a six-year excavation project available excavated sherd, more than a million, has been registered over five years of excavation and will be made available together online and in print. with the final publication. The Danish-German Jerash North west Quarter Project.
– IT’S NOT THAT SIMPLE 13 OPEN DATA AT THE STELLAR ASTROPHYSICIS CENTRE
The Stellar Astrophysics Centre’s (SAC) open data sharing of derived data products through custom- policies are becoming the norm within astrophysics, made websites. The picture on the right shows the supported by the key role played by SAC in making constellations Cygnus, Lyra and Aquila with a marking data available to the asteroseismic community. of the field observed by Kepler.
Hosting data for the international community Not only is SAC a heavy user of open data, it is also “The Stellar Astrophysics Centre’s (SAC) open data supplying data and facilitating the open sharing of policies are becoming the norm within astrophysics, data between international researchers. Beginning supported by the key role played by SAC in making with NASA’s Kepler mission, SAC set up the Kepler data available to the asteroseismic community.” Asteroseismic Science Operations Centre, where one of the prime objectives is distributing the original – Professor Jørgen Christensen-Dalsgaard, scientific data from the mission and facilitating the CoE leader, SAC.
14 OPEN ACCESS TO DATA: This work has continued with the Stellar Observations Long-term storage of scientific data Network Group and Transiting Exoplanet Survey In the context of the “Data Management in Practice” Satellite projects, where the centre is also creating project, SAC researchers have been deeply involved data archives to be shared with the community. In all in a project with the Danish Royal Library to investi cases, an interested user needs only to register as a gate how scientific data can be stored and archived user (which is free for all) and can download data for long-term preservation (>50 years) while still being within seconds. useful for active scientific research. An inter-depart mental collaboration has led to a prototype archive In addition to providing reliable and efficient data ac that could potentially be used for long-term archiving cess through the Kepler web interface, SAC research of digital scientific data. The analysis was published ers have also implemented an additional interface for in a joint paper (under the Digital Curation Centre). the Seismic Plus Portal, developed as part of the The political aspects of who would be responsible for EU-funded SpaceInn project, which uses the Virtual paying for the maintenance and running costs of such Observatory (VO) format, an open standard of sharing an archive are, however, still unclear. astronomical datasets. Meta data are made available through the VO Table Access Protocol using a dedi cated VO server at Aarhus University.
Access to data via general astronomical archive facilities SAC is using data from a series of available archives. The most used is the European Southern Observatory Archive, containing all raw and calibrated data from these telescopes, available via http://archive.eso.org/ cms/eso-data.html. Particularly important have been data from the Very Large Telescope and the La Silla facilities. The centre also uses archive data from the Nordic Optical Telescope via http://www.not.iac.es/ archive/, especially the FIES spectrograph. Finally, SAC uses data from NASA facilities (https://archive. stsci.edu/), which provide access to data from the Hubble Space Telescope, Kepler, and other NASA/ ESA space missions.
An image by Carter Roberts of the Eastbay Astronomical Society in Oakland, CA, showing the Milky Way region of the sky where the Kepler spacecraft/photometer will be pointing. Each rectangle indicates the specific region of the sky covered by each CCD element of the Kepler photometer. There are a total of 42 CCD elements in pairs, each pair comprising a square. Credit: Carter Roberts / Eastbay Astronomical Society.
– IT’S NOT THAT SIMPLE 15 CENTER FOR PERSONALIZED MEDICINE IN IMMUNE DEFICIENCY (PERSIMUNE)
Health data: How to ensure access without health. As a result, Denmark is in a strong position to compromising privacy productively contribute to the development of person Given the unique person identifier number, it is possi alized care, the new paradigm in modern medicine. ble to combine legacy and contemporary data gener ated by the health system as part of routine care ac tivities for the entire Danish population. Selection of “The creation of the data warehouse with analyzable the population is nil and the completeness of data is data elements is the result of a collective and exten high. These data can be combined with more complex sive combined effort by medical and programming data using modern -omics technologies to character experts over several years; maintenance of optimal ize the host genome and that of invading microorgan data quality requires continued investment in data isms, the profile of immune cells, and the circulating sources, since their quality and electronic formats proteins and metabolites, in order to create a systemic are continuously changing.” biological characterization of the host. In doing so, researchers are able to identify novel pathways for – Professor Jens Lundgren, CoE leader, PERSIMUNE.
16 OPEN ACCESS TO DATA: The combined dataset – stored in a data warehouse for relevant and legitimate interested groups. By – with access to comprehensive computer power preserving ownership and the continued involvement can be used for research purposes as well as for of those responsible for generating the data (clinicians real-time support for clinical decision-making as and researchers), PERSIMUNE has made it possible part of routine care. to export data for transcending research projects using a defined governance structure. The creation There are strict regulations for gaining access to data of the data warehouse with analyzable data elements within the data warehouse. These rules help to ensure is the result of a collective and extensive combined that an individual’s expectations that her/his health effort by medical and programming experts over data are kept secret and are accessible only to relevant several years; maintenance of optimal data quality health-care professionals involved in her/his care are requires continued investment in data sources, since met. For research purposes, data can be partly anony their quality and electronic formats are continuously mized, but they should still be kept in a secure and changing. Therefore, this effort is best centralized as controlled space governed by the institution respon Denmark engages in unfolding a national vision for sible for the data. implementing personalized medicine.
Hence, open access to health data is not possible. Rather, the institution responsible for the data warehouse must establish a governance structure that ensures access while not compromising privacy.
These considerations have been resolved for the formation and operations of the PERSIMUNE data warehouse at the Rigshospitalet. PERSIMUNE is able to collate data from a widely diverse set of data sources while still ensuring accessibility to the data
– IT’S NOT THAT SIMPLE 17 THE DNRF OPEN DATA SURVEY
The DNRF conducted a survey among researchers from all of its current Centers of Excellence and among its Niels Bohr Professorships. Out of the 1,175 researchers who were invited to participate, 474 responded to the survey, a response rate of 40%. All academic positions are represented, with PhD students and post-docs accounting for 61% of the respondents. Approximately 60% of the respondents have Denmark as their home country. The purpose of the survey was to clarify how and how much the researchers are using open data and to illuminate the strengths of open data and the barriers to its use.
18 OPEN ACCESS TO DATA:
FIGURE 1 Distribution in percent DISTRIBUTION OF SCIENTIFIC AREAS AMONG RESPONDENTS All areas are represented. Approximately 50% have backgrounds in the natural sciences and 30% in the life sciences. H S N L T S S S S
FIGURE 2 DISTRIBUTION OF OPENNESS OF DATA OUTSIDE OF RESEARCHERS’ OWN RESEARCH GROUP ACCORDING TO SCIENTIFIC AREA
Distribution in percent Humanities Social Sciences Natural Sciences Life Sciences Technical Sciences It is most common to make data available within the Natural Sciences and the Humanities. N O
FIGURE 3 DISTRIBUTION OF OPENNESS OF DATA OUTSIDE OF RESEARCHERS’ OWN RESEARCH GROUP ACCORDING TO ACADEMIC POSITION Distribution in percent PhD student Postdoc or assistant professor Associate professor Professor MSO Professor Professors are most likely and PhD students are less likely to make data available. N O
– IT’S NOT THAT SIMPLE 19
FIGURE 4 NUMBER OF DATA SETS MADE AVAILABLE BY THE RESEARCHERS AMONG THOSE WHO MAKE THEIR DATA AVAILABLE
0 1-5 6-20 20+