Out of Cite, out of Mind: the Current State of Practice, Policy, and Technology for the Citation of Data

Data Science Journal, Volume 12, 13 September 2013

OUT OF CITE, OUT OF MIND: THE CURRENT STATE OF PRACTICE, POLICY, AND TECHNOLOGY FOR THE CITATION OF DATA

CODATA-ICSTI Task Group on Data Citation Standards and Practices
Edited by Yvonne M. Socha

PREFACE

The growth in the capacity of the research community to collect and distribute data presents huge opportunities. It is already transforming old methods of scientific research and permitting the creation of new ones. However, exploiting these opportunities depends upon more than computing power, storage, and network connectivity. Our growing universe of online digital data holds two particular promises. First, data can be integrated into new forms of scholarly publishing, allowing peers to examine and review the conclusions or analyses drawn from experimental and observational data. Second, subsequent researchers can conduct new analyses of the same data, including combining them with other data sets and putting them to uses that the original producer or collector may not have anticipated. The use of published digital data, like the use of digitally published literature, depends upon the ability to identify, authenticate, locate, access, and interpret them. Data citations provide necessary support for these functions, as well as for others such as the attribution of credit and the establishment of provenance. References to data, however, present challenges not encountered in references to literature. For example, how can one specify a particular subset of data in the absence of familiar conventions such as page numbers or chapters?
The traditions and good practices for maintaining the scholarly record through proper references to a work are well established and understood for journal articles and other literature, but the practice of attributing credit through bibliographic references to data is not yet as broadly implemented. Different fields and disciplines have recognized the need for better data referencing and citation practices, and have invested effort in addressing it, at different rates. As competing conventions and practices emerge in separate communities, inconsistencies and incompatibilities can hinder the sharing and use of research data. Sharing experiences across communities may therefore be necessary, or at least helpful, both to reconcile these differences and to achieve the full potential of published data. Practical and consistent data citation standards and practices are thus important for providing the incentives, recognition, and rewards that foster scientific progress. New requirements from funding agencies for data management plans further emphasize the need to develop data citation standards and practices.

The CODATA-ICSTI Task Group on Data Citation Standards and Practices was first organized in 2010 jointly by the international and interdisciplinary Committee on Data for Science and Technology (CODATA) and the International Council for Scientific and Technical Information (ICSTI). Both CODATA and ICSTI adhere to the International Council for Science (ICSU), a nongovernmental umbrella scientific organization headquartered in Paris, France. Additional information about all three groups is available at www.codata.org, www.icsti.org, and www.icsu.org, respectively. Together with representatives from several other organizations, the CODATA-ICSTI Task Group examines a number of key issues related to data identification, attribution, citation, and linking.
Additionally, the Task Group helps coordinate international activities in this area and promotes common practices and standards in the scientific community. This report is part of that focused effort. To address these challenges, the Task Group's first major activity was to collaborate with the U.S. National Academy of Sciences' Board on Research Data and Information (BRDI) and the U.S. National Committee for CODATA on an international workshop held in August 2011 in Berkeley, California (http://sites.nationalacademies.org/PGA/brdi/PGA_064019). The workshop culminated in the report For Attribution—Developing Data Attribution and Citation Practices and Standards (National Academies Press, 2012), which is freely and openly available for download at http://www.nap.edu/catalog.php?record_id=13564. Since the 2011 workshop, the Task Group has undertaken a series of activities designed to build upon the international body of knowledge on data citation and attribution practices. The report presented here represents the next step identified by the Task Group: to document the current state of practice for data citation and attribution, noting emerging trends, successes, and challenges.

The principal methods the Task Group used in writing this paper were the following:

• Literature Search and Compilation of Bibliography
From its inception, the Task Group assembled a bibliography on the topic of data citation and attribution, and this activity continued until the report was completed. We drew upon references provided by speakers and participants at the workshop, conducted library and web searches, monitored listservs and blogs of organizations working in data publication and related topics, and received many submissions from Task Group members and their colleagues. The resulting bibliography is posted online and explained in Appendix B of this report.
• Stakeholder Interviews
Recognizing that different stakeholder communities might have different interests or concerns regarding data citation and attribution practices, the Task Group identified the communities likely to have the greatest potential impact upon the development of citation and attribution practices: managers at data repositories and academic libraries, scholarly journals, research institutions, and research funding organizations. Although individual researchers were also identified as an important stakeholder community, in the interest of efficiency the Task Group chose to focus its primary attention upon the institutional stakeholders with whom individual researchers necessarily interact. Members of the Task Group then conducted telephone interviews with selected representatives of those communities, asking questions tailored specifically to each community. The interviews made no attempt to achieve statistical validity; rather, they were designed to help the Task Group assess how far those communities had progressed in recognizing and addressing data citation issues, and how important they perceived those issues to be. The list of interviewees is presented in Appendix C.

• The Writing Process
Task Group members developed an outline based upon discussions conducted primarily by email and teleconference. The members then volunteered to focus on certain topics according to their respective interests and expertise. In addition to holding monthly teleconferences, the writing teams met in person several times for drafting sessions to elaborate upon and refine the chapters. Two-day writing sessions were held in Copenhagen in June 2012 and in Taipei in October 2012.
The Task Group engaged a technical writer with expertise in Library and Information Science, who had previously worked on compiling the bibliography, to refine the output of the various chapter teams into a document with a more consistent voice. The technical writer also met with several members of the writing team in November 2012 to further refine the draft based upon input developed at the Taipei writing sessions. The Task Group continued to circulate drafts of the revised chapters to the writing teams working on other chapters and to all Task Group members for internal review and comment. Finally, the Task Group identified external peer reviewers with the appropriate expertise to critique the paper. The writing team then responded to the reviewers' comments and made appropriate revisions to the manuscript prior to publication.

The Acknowledgement section that follows the body of the report contains the names of all the people involved in the production of this report, including the funders.

Keywords: Data citation, Data management, Data policy, Data publishing, Data centers, Data access, Reuse of data, Metadata, Digital preservation, Information standards, Information infrastructure, Information technologies, Internet, Libraries, Scientific organizations, STM publishers, Research funders, Information metrics

Table of Contents
Executive Summary
Chapter 1 THE IMPORTANCE OF DATA CITATION TO THE RESEARCH ENTERPRISE
1.1 Introduction
1.2 The role of data in the research lifecycle
1.3 Organization of this report
