How and Why Data Repositories Are Changing Academia
Phill Jones, Digital Science
Against the Grain, Volume 28, Issue 1, Article 11 (2016)
Phill Jones, Digital Science
Mark Hahnel, Figshare

Part of the Library and Information Science Commons

Recommended Citation: Jones, Phill and Hahnel, Mark (2018) "How and Why Data Repositories are Changing Academia," Against the Grain: Vol. 28: Iss. 1, Article 11. DOI: https://doi.org/10.7771/2380-176X.7269

This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries.

Against the Grain / February 2016 <http://www.against-the-grain.com>

How and Why Data Repositories are Changing Academia
by Phill Jones (Head of Publisher Outreach, Digital Science) and Mark Hahnel (Founder, Figshare)

Academic and scholarly communication is unquestionably in the process of undergoing a revolution. It seems, however, that the nature of that revolution is still a somewhat open question. Libraries in particular are undergoing not so much a shift in focus but a diversification of roles. Where the library once consisted primarily of a physical building containing curated collections of books, journals and other resources, it is now a diverse set of services ranging from research assessment to technology support to the new frontier of data curation and dissemination.

Why Should Librarians Care about Data Sharing?

The role of the library as manager of collections of information for the use of patrons is still alive and well. Increasingly, however, libraries have been concerned with recording and curating the output of their institutions. This expansion of role has on some level been driven by a shift in the way that scholars are communicating their work and accounting for its value. Arguably, this trend began around 15 years ago with the rise of open access publishing, which itself was made possible by the shift to more scalable electronic journals. Many libraries at the time took an interest in the new publishing model by either setting up central funds for the payment of article processing charges or supporting and educating scholars in how and why to publish open access. Later, institutional repositories provided avenues for green open access, and library publishing operations began to develop during the first decade of the 2000s, culminating in the creation of the Library Publishing Coalition in 2012. Many library publishing operations, in contrast with traditional university presses, aim to support niche areas of scholarship of interest to their own faculty. However, early suggestions that institutional open access paper repositories may replace the role of traditional publishers have proven to be a bridge too far. One can postulate many reasons for this, but publisher brands and the need to publish in high impact factor journals seem the most likely. This is not the case for the emerging requirements of data dissemination. There are as yet no impact factors or prestige publication outputs. This means that libraries may have another opportunity to play a key role in communicating the academic content that comes out of their institutions.

As the open science movement has grown in momentum over the past decade and a half, scholars have sought new outlets for new types of scientific output. The blogosphere has been used to "publish" work almost in real time, resulting in some noteworthy cases. For instance, Rosie Redfield of the University of British Columbia documented on her blog her attempts to replicate NASA's claims of discovering arsenic-based life, ahead of publishing them in AAAS Science, which debunked the claim. However, this sort of blogging/publishing generally acts as a more rapid medium for hypothesis-driven scientific narratives, similar in concept to traditional articles, rather than a way to make data sets available.

For many people interested in data publishing, what's required is a new infrastructure for communicating data and other research outputs that is separate from hypothesis-driven narratives and judged on its own terms. The features of this infrastructure are not entirely clear, but we do know that it must be able to cope with large quantities of data. Some data will be in well-codified and well-documented formats, but much of it won't be. Data needs to be discoverable and at least somewhat interpretable, so that it is available for re-use and re-analysis when needed. Finally, there's a need to protect a researcher's ability to fully analyse their own data first through embargoes, and also to protect commercially or medically sensitive information.

Taking all this together, data publishing seems to be a fairly complicated issue, but one that the library is well-placed to tackle.

Why Researchers Care

There are a number of potential advantages to scholars of sharing their data. Probably the most compelling reason is the apparent citation advantage.1 Other reasons include requirements from funders, journals and institutions, as well as a personal desire to make science more open.

Many researchers believe that open data is necessary to make scholarship more effective. The academic system does work, but it can be an inefficient machine. The majority of inefficiencies lie in the inability of academics to directly build on the research that has gone before them, to better stand on the shoulders of giants. Increased transparency can also improve academia's ability to self-correct through openness to scrutiny and challenge.

Making data sharable and open has the added benefit of encouraging standards and codification, a vital step to making data machine readable. The power of computers means that data can be interrogated and cross-referenced in order to automatically look for correlations between research outputs. Of course, today's artificial intelligence won't enable computers to generate and confirm hypotheses the way a person can, hence the need for academics with subject-specific knowledge to build research programs based on machine-suggested relationships. Immediately, this provides many more promising avenues to explore across all fields of research, in a practice that pharmaceutical companies have been exploiting with computational chemistry for decades.

Barriers to Sharing

The reasons why many researchers choose not to share their data, or share it only upon request through closed systems like email, are less well explored than the benefits mentioned above. Last year, a survey of Wiley authors, which was reported on in the Scholarly Kitchen by Alice Meadows, found that just less than half of researchers choose not to share data.2 Wiley produced a survey infographic, linked from the Scholarly Kitchen article, which contains a long list of reasons why some researchers are reluctant to share. Broadly, there seem to be three overarching themes. The first issue is a fear that sharing data would have negative consequences, either because another researcher appropriates data and scoops the original experimenter, or because their work gets picked apart and unfairly discredited. The appropriate use of embargoes should mitigate many of those concerns. The second issue is lack of researcher understanding of how to share data. Answers like "My funder/institution does not require data sharing," or "I don't think it was my responsibility" aren't evidence of a positive decision not to share, but rather that some researchers are still not yet seriously considering it. It's easy to see how librarians and information professionals can help with that one. Finally, many of the responses speak to a lack of time and resources. This last issue is perhaps the toughest to tackle, so let's look at it in more depth.

Researchers are often juggling many disparate and seemingly unconnected responsibilities, from research to managing their labs and getting grants, to teaching, to university administrative tasks and committees. With such a diverse workload, it can be challenging to incorporate new workflows. For this reason, simplicity and intuitive workflows are increasingly important. You only have to look at the rising pressure that publishers are under to simplify their submission systems and eliminate author burden, or at the success of simplified search like Google, to see that researchers often value simplicity and intuitiveness over comprehensive functionality. Against that background, it's not surprising that many researchers are choosing to share data using supplementary materials services offered by publishers, despite the fact that in many cases those systems were not designed with data sharing in mind.3 If data sharing is to become the norm, it will be important to create systems that are not only robust and scalable, but also very simple and time effective to use.

Data as a First Class Research Object

The idea that datasets should be treated as an equal output to academic articles is a controversial one, but one that funders and advisory

Scholarly Publishers and Data

Over the past decade, some traditional publishers have worked with repositories to link raw digitised objects that underlie research to the hypothesis-driven narrative of the article. The goal is to standardize the approach to linking research data to publications, irrespective of the repository which hosts the data.

industrial scale efforts to assemble super-datasets like Zooniverse's Galaxy Zoo (http://data.galaxyzoo.org/) and the NIH's GenBank. There are a number of libraries and other groups that maintain lists of these types of databases, perhaps most notable are the Registry of Research Databases (www.