Volume 4 Issue 3

Spring 2016

©2016 DataONE, 1312 Basehart SE, University of New Mexico, Albuquerque, NM 87106

CoverSTORY

DataONE and USGS: Making Open Data a Reality

The U.S. Geological Survey (USGS), a major government contributor to the DataONE project, is actively engaged in the open science data movement. The USGS believes on principle that science data should be released and available, as evidenced by a history of making water, hazard, and other data available in a timely manner since its inception. The USGS is striving now to ensure that all of its science data that supports scholarly publications is available in the public forum as the publication is released. The following press release, “USGS Increases Public Access to Scientific Research,” published February 8, 2016, demonstrates the USGS commitment to open data, and its leadership’s determination to make government open science data a reality:

Figure 1: USGS Science Data Catalog

“The U.S. Geological Survey is implementing new measures that will improve public access to USGS-funded science as detailed in its new public access plan. The plan enables the USGS to expand its current on-line gateways to provide free public access to scholarly research and supporting data produced in full or in part with USGS funding, no matter how it is published.

“The USGS plan ’Public Access to Results of Federally Funded Research at the U.S. Geological Survey: Scholarly Publications and Digital Data,’ stipulates that, beginning October 1, the USGS will require that any research it funds be released from the publisher and available free to the public no later than 12 months after initial publication. The USGS will also require that data used to support the findings be available free to the public when the associated study is published.

“The plan applies to research papers and data authored or co-authored by USGS, contract employees, award or grant recipients, partners and other entities. It includes materials published by any non-USGS entity, including scientific journals, professional society volumes, cooperating agency series, and university or commercial publishers.

“Exceptions are permitted only if the USGS agrees that a demonstrated circumstance restricts the data from public release, for example in rare cases where access must be restricted because of security, privacy, confidentiality, or other constraints.

“The plan responds to a February 2013 Office of Science and Technology Policy memorandum that directed federal agencies with annual research and development budgets above $100 million to increase public access to peer-reviewed scientific publications and digital data resulting from federally funded research. On January 8, OSTP approved the USGS plan.

“Specifically, this plan requires that an electronic copy of either the accepted manuscript or the final publication of record is available through the USGS Publications Warehouse. Digital data will be available in machine readable form from the USGS Science Data Catalog. The plan will require the inclusion of data management plans in all new research proposals and grants.

“Much of the plan refers to requirements or activities that already exist or are being implemented. The mandate to publish data and findings from USGS science activities dates to the Bureau’s creation by the signing of the Sundry Civil Bill on March 3, 1879, establishing the USGS. This bill also defined the requirement to report the results of investigations by the USGS to the public.

“The results of USGS research, generally released in the form of publications, maps, data, and models, are used by policymakers at all levels of government and by the private sector to support appropriate decisions about how to respond to natural hazards, manage natural resources, and to spur innovation and economic growth.

“This plan builds on existing USGS policy, which requires public access be provided for any scholarly publications and associated data that arise from research conducted directly by USGS or by others using USGS funding, is published by the USGS or externally by USGS scientists or USGS funded scientists. This existing policy requires that data must be made available at the time of publication to support scholarly conclusions.

“USGS already has the portals it needs to implement public access. USGS scholarly publications and associated data are discoverable online. Currently, citations for the more than 50,000 USGS series publications are available, and 10,000 of these are also available free to the public as downloadable digital files. In addition, more than 41,000 scholarly publications authored by the USGS but published externally are cataloged in the Publications Warehouse, and links to original published sources are provided.”1

To begin strategically developing a roadmap to ensure the USGS meets the requirements identified in the plan, the USGS recently held a meeting from March 1 to 3, 2016, one of the first of its kind, that brought together USGS scientists leading policy implementation, application development, and process improvement for USGS science data publishing and/or scholarly publications. The meeting was a landmark for USGS in that it provided the necessary energy and commitment required at all levels of the organization to make open science data a success; this success must begin with USGS scientists understanding how to properly release data that support scholarly conclusions. One significant result of the meeting was the initiation of three teams that will focus respectively on connectivity between systems, communication and training, and planning for the formalization/establishment of designated repository solutions in the USGS. From the momentum of this meeting, the USGS expects to meet the requirements of its public access plan by October 1, 2016, as stipulated by the White House Office of Science and Technology Policy (OSTP).

USGS and DataONE have been working together since the inception of DataONE to make science data open and available. The USGS has long contributed thousands of metadata records to DataONE’s open science platform. Additionally, there are many examples of DataONE influence in USGS’s approaches to open data. For example, DataONE’s leadership is critical in highlighting best practices for data through educational modules, incorporating usability analysis concepts in design, and establishing a working group model that brings together diverse thinking in an organization to solve complex problems. USGS has adopted these approaches. One specific example in the USGS that models the DataONE approach to community networks of people is the Community for Data Integration (CDI), an open community that employs working groups to further data and technical advances in the USGS.

The USGS is committed to the open data concept as it sees it as fundamental to the advancement of science. Making science data publicly accessible allows for more efficient and effective understanding of the Earth’s resources and processes. USGS is partnering with DataONE to continue making science data available, understandable, and, ultimately, re-usable in new science endeavors.

- Viv Hutchison
  Mike Frame
  USGS (Guest Editorial)

UpcomingEVENTS

Members of the DataONE Team will be at the following events. Full information on training activities can be found at bit.ly/D1Training and our calendar is available at bit.ly/D1Events.

Jul. 17-18: DataONE Users Group Meeting, Durham, NC
https://www.dataone.org/dataone-users-group

Jul. 18-22: Federation of Earth Science Information Partners, Durham, NC
http://commons.esipfed.org/2016SummerMeeting

Aug. 7-12: Ecological Society of America, Ft Lauderdale, FL
http://esa.org/ftlauderdale/

Sep. 11-16: RDA, Denver, CO
https://rd-alliance.org/plenary-meetings/next-plenary

Next DataONE Webinar: Tuesday April 12th
DataONE: Current services, new tools and future developments
Join us at 12 noon Eastern Time. Free to attend, register now at: https://www.dataone.org/webinars


MemberNodeDESCRIPTION

Each Member Node within the DataONE federation completes a description document summarizing the content, technical characteristics and policies of their resources. These documents can be found on the DataONE.org site at bit.ly/D1CMNs. In each newsletter issue we will highlight one of our current Member Nodes.

Minnesota Population Center (MPC) https://www.pop.umn.edu/

The Minnesota Population Center (MPC) is an interdisciplinary cooperative for demographic research at the University of Minnesota. In addition to the 100 faculty affiliates spread over 26 departments and nine colleges at the University, MPC also serves a broader audience of some 70,000 demographic researchers worldwide.

MPC disseminates integrated census and survey data from the U.S. and around the world. These resources describe the populations of more than 70 countries and are designed to be interoperable for spatiotemporal comparisons. MPC currently disseminates census microdata (data describing individuals and households) from 79 countries for the period 1960 to the present; for nine countries, census microdata from the nineteenth century are also available. It disseminates integrated versions of four high-value demographic and health surveys from the United States. It also disseminates small-area summary data from the United States for places such as tracts and counties, and boundary files describing these places within the U.S. for the period 1790 to present. In addition, MPC disseminates small-area summary data and environmental data, such as land cover, land use and climate information, for the whole globe. MPC data access systems are designed to allow users to easily locate the specific data they need and obtain customized data extracts that combine the user’s requested data in a coherent package. MPC data resources are described at www.popdata.org.

MPC disseminates five kinds of data: textual metadata, microdata (describing persons and households), aggregate summary data (describing places), GIS boundary files, and raster data. MPC plans to disseminate metadata and selected boundary and raster data through its DataONE Member Node. MPC’s DataONE Member Node currently focuses on content from Terra Populus (TerraPop), one of its many data projects.

Terra Populus integrates the world’s population and environmental data, including:
• Census and survey microdata describing the characteristics of individuals and their families and households
• Aggregate census and survey data, describing the characteristics of places, including aggregate population characteristics, land use, and land cover
• Data derived from remote-sensing describing land cover and other environmental characteristics
• Climate data describing temperature, precipitation, and other climate-related variables
• Vector-based geographic information system (GIS) data delineating administrative and census unit boundaries

The data in the TerraPop collection originate in three major data structures: raster, area-level, and microdata. TerraPop provides interoperability among these data structures through location-based transformations from one structure to another. The TerraPop extract builder (data.terrapop.org) allows users to select variables and datasets from any of the three data structures, and combine them into a single integrated dataset in the data format that best meets the needs of their analysis.
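To make the idea of a location-based transformation concrete, here is a minimal sketch in Python. It is not TerraPop code: the grids, area codes, variable name and both helper functions are hypothetical, and the summary used (a simple mean of the raster cells falling in each area) is only one of many possible choices. It shows raster values being summarized to area-level values, which are then attached to microdata records by area code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical 3x4 raster of a climate variable (one value per grid cell).
raster = [
    [10.0, 12.0, 20.0, 22.0],
    [11.0, 13.0, 21.0, 23.0],
    [30.0, 30.0, 31.0, 31.0],
]

# Hypothetical zone grid on the same cells: each entry names the
# administrative unit that the corresponding raster cell falls in.
zones = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["C", "C", "C", "C"],
]


def raster_to_area(raster, zones):
    """Raster -> area-level: mean of the raster cells inside each area."""
    cells_by_area = defaultdict(list)
    for value_row, zone_row in zip(raster, zones):
        for value, area in zip(value_row, zone_row):
            cells_by_area[area].append(value)
    return {area: mean(values) for area, values in cells_by_area.items()}


def attach_to_microdata(records, area_values, name):
    """Area-level -> microdata: copy each area's value onto its person records."""
    return [{**rec, name: area_values[rec["area"]]} for rec in records]


area_temp = raster_to_area(raster, zones)
people = [{"id": 1, "area": "A"}, {"id": 2, "area": "C"}]
integrated = attach_to_microdata(people, area_temp, "mean_temp")
```

In this toy case every person record ends up carrying the mean raster value of the area it belongs to, so all three structures can be analyzed together in one table.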


WorkingGroupFOCUS

Peer Review: Not just for papers

The DataONE Data Management Modules are one of the primary education products developed by the DataONE Community Engagement and Outreach (CEO) working group. Initially designed to facilitate data management training within the undergraduate and graduate curriculum, the downloadable PowerPoint modules were intended to enable faculty to quickly and easily augment their teaching materials. We have since learned that libraries and others are using the modules for self-learning and for training additional communities outside Earth and environmental sciences.

Figure 2: Data Management Modules

Over the last year the CEO working group has been reviewing and updating the data management modules, ensuring their currency and augmenting them with additional information. We have also created complementary one page synopses and ‘hands-on exercises’ to support learning.

One new project of the CEO working group is the completion of a peer review process for the data management modules currently available on the website. We love getting feedback on our materials via social media and email; however, to ensure the materials continue to meet the needs of our community, we felt that peer review (a standard for published articles) was appropriate. The goal of this peer review was threefold: first, to invite community participation; second, to strengthen our materials and learn of new approaches and practices; and finally, to discover new opportunities to work closely with the community. As part of this process, the CEO Working Group members created an assessment tool that distilled key criteria for reviewers to use in evaluating each of the modules for accuracy, current information, and overall quality. This external feedback will help us to improve upon the current modules, as well as consider possibilities for additional new educational materials and future directions.

We have identified a list of people with expertise in the relevant subject areas for each of the eleven modules and have contacted them to ask for their assistance with evaluating and improving the materials. We have had many positive responses to our request, and look forward to reporting and disseminating the results in more detail after the initial survey period has been completed.

We recognise that these individuals represent only a subset of the user community and would value input from the broader community. Have you previously used a data management module? Are you currently using our modules? Would you be willing to review a module? There is no requirement to review multiple modules, so if you have feedback on a particular topic, we’d be grateful for your input here.

Additionally, we are always looking for information and feedback about how our educational modules, screencasts, webinars, and other outreach materials are being used. As part of an environmental scan of links and citations, we are happy to have found many institutional research and library resource pages related to research data management that link to our materials, but we are always seeking to reach additional audiences and to improve through community feedback. If you use, link, or share DataONE educational materials, please let us know!

Figure 1: LibGuide at the University of Arkansas highlighting DataONE resources

- Heather Soyka, CEO Postdoctoral Associate
- Amber Budden, Director for Community Engagement and Outreach


Save the Date: DataONE Users Group Meeting

Please save July 17-18, 2016 for the open DataONE Users Group meeting to be co-located with the Summer ESIP Federation Meeting at the Durham Convention Center, Durham, North Carolina. The DataONE Users Group (DUG) meeting will be a 2-day event featuring plenary presentations, topical breakout sessions, and community-led discussions.

There is no registration fee to attend and participate in the DUG meeting.

Registration and hotel block will open in the spring, a few months before the meeting. Please visit https://www.dataone.org/dataone-users-group for updates and to join the DUG.

Meeting Theme and Objectives

The 2016 Meeting theme, “Expanding Data Networks,” will focus on the new challenges and efforts in making data accessible, discoverable, and deliverable while promoting open data policies, standards, and compliance with funders’ emerging data management requirements. A strong emphasis is on data synthesis and technological progress made in data network infrastructure. The scientific program of the 2016 meeting will invite talks and posters on the following topics:
• Leveraging research data-level metrics for large data repositories and data networks
• Integrating the needs and inputs of data users to advance and improve data discoverability
• Assessing the progress, impact, and success in promoting open data policies

DataONE encourages DataONE Member Nodes, data scientists, researchers, scientists, students and others to submit abstracts for posters and talks.

Abstract Submission for Posters and Talks

Please submit an abstract (250 words maximum) to [email protected] and indicate whether you prefer to present a talk or a poster. Talks will be approximately 10-20 minutes in duration, to be confirmed with development of the agenda. The poster session will be held the evening of Sunday July 17th during the reception event. Submissions will be reviewed by the DataONE Users Group Steering Committee. Accepted abstracts will be published on the DataONE website.

Important dates
Abstract Submission Deadline: April 29th 2016
Author Notification: May 15th 2016

DUG Steering Committee: Felimon Gayanilo (co-chair), Plato Smith (co-chair), Steven Aulenbach, Amber Budden, Debora Drucker, Rebecca Koskela, Myrica McCune, Laura Moyers, Shannon Rauch, Robert Sandusky, Stephanie Simms, Heather Soyka

TheDUGout

Dear DUG Members,

We are excited to promote the 2016 DataONE Users Group meeting, taking place on July 17-18, 2016 at the Durham Convention Center in Durham, NC. This meeting marks a change in the agenda development for DataONE Users Group meetings as it will be the first DUG to have community submitted talks. We will retain DataONE updates and the Member Node showcase; however, we are opening up the agenda to talks from the community.

The meeting theme is “Expanding Data Networks” and focuses on new challenges and efforts in making data accessible, discoverable, usable, and reproducible while promoting open data policies, standards, and compliance with funding agencies’ evolving data management and sharing requirements. The 2016 DataONE Users Group scientific program invites talks and posters including but not limited to the following topics:
• Leveraging research data-level metrics for large data repositories and data networks
• Integrating the needs and inputs of data users to advance and improve data discoverability
• Assessing the progress, impact, and success in promoting open data policies

Details on how and where to submit can be found on the previous page of this issue and also online at https://www.dataone.org/dataone-users-group/2016-meeting. The deadline for submissions is April 29th 2016. We encourage DataONE Member Nodes, data scientists, researchers, students, and others to submit abstracts for posters or talks.

In an effort to ensure the agenda is as community driven as possible, we are also accepting suggestions for roundtable and break-out sessions. Please take a moment of your time to contribute to our brief survey at https://www.surveymonkey.com/r/DUG2016.

- Felimon Gayanilo, DUG co-chair, Texas A&M University
- Plato Smith, DUG co-chair, University of Florida

CyberSPOT

The DataONE production environment has 31 participating Member Nodes providing access to more than 116,000 publicly readable, current version data sets comprising 209,000 metadata and 393,000 data objects. A total of 1,010,368 individual objects are resolvable and retrievable through DataONE and the participating Member Nodes.

Recent updates to the DataONE infrastructure include versions 2.1.1 and 2.1.2. These releases enable support for registration of data services operated by Member Nodes and for indexing content described by the NOAA variant of ISO-19139 metadata.

Figure 1: Counts of data/metadata/resource maps uploaded to DataONE since release in July 2012 (x-axis: Date Uploaded to DataONE; y-axis: Thousands)

Access to large data objects through the DataONE APIs currently requires download of the complete data object, a task that can be daunting or impractical for some data. In many cases, the Member Nodes providing access to such data also provide services that enable users to retrieve a smaller, more manageable portion of the data. Data service registration support by the DataONE Coordinating Nodes extends the existing infrastructure of metadata synchronization and indexing for data discovery to also include discovery of services offered by the Member Nodes. Examples of such services include the various Open Geospatial Consortium services (Web Feature Service, Web Map Service, Sensor Observation Service and so forth), OPeNDAP, and SPARQL endpoints. Member Nodes may also deploy custom services tailored to suit the specific requirements of the data and the community being served.

Services registered with DataONE Coordinating Nodes can be discovered through the search interfaces much like the process for discovering data. In some cases, the services are tightly bound to the data, meaning that the service only operates with a specific data set. In these cases, the service will be described and presented as part of the metadata describing the data set. Loosely bound services, on the other hand, can work with multiple data sets. In these cases, a data set identifier will often be a required parameter during service invocation. By leveraging the DataONE metadata synchronization and discovery infrastructure, service discovery and interactions will benefit from the same advances in tracking and semantic search as these capabilities become progressively more integrated with the production infrastructure.
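The difference between tightly and loosely bound services can be illustrated with a small Python sketch. This is not the DataONE API: the `ServiceRecord` class, its fields, the `dataset` query parameter and the example.org endpoints are all hypothetical, chosen only to show that a tightly bound service already knows its data set while a loosely bound one needs a data set identifier at invocation time.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass(frozen=True)
class ServiceRecord:
    """Hypothetical, simplified description of a registered data service."""
    title: str
    endpoint: str                        # base URL (made up for this sketch)
    coupling: str                        # "tight" or "loose"
    bound_dataset: Optional[str] = None  # set only for tightly bound services


def invocation_url(service, dataset_id=None, **params):
    """Build a request URL, enforcing the coupling rules described above."""
    if service.coupling == "tight":
        # The service operates on exactly one data set, so no identifier is
        # needed; reject an identifier that names a different data set.
        if dataset_id not in (None, service.bound_dataset):
            raise ValueError("service is bound to a different data set")
        query = dict(params)
    elif service.coupling == "loose":
        # The service can work with many data sets, so the caller must say
        # which one; "dataset" is an assumed parameter name.
        if dataset_id is None:
            raise ValueError("a data set identifier is required")
        query = {"dataset": dataset_id, **params}
    else:
        raise ValueError("coupling must be 'tight' or 'loose'")
    return service.endpoint + ("?" + urlencode(query) if query else "")


tight = ServiceRecord("Feature service for ds-1", "https://example.org/wfs/ds-1",
                      "tight", bound_dataset="ds-1")
loose = ServiceRecord("Generic subsetter", "https://example.org/subset", "loose")

tight_url = invocation_url(tight, format="csv")
loose_url = invocation_url(loose, dataset_id="ds-42", format="csv")
```

Calling `invocation_url(loose)` without an identifier raises `ValueError`, which mirrors the point that the identifier is often a required parameter for loosely bound services.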

OutreachUPDATE

The last three months have seen a lot of activity within the community engagement and outreach components of DataONE, both in current projects and with new collaborations. DataONE is pleased to be part of an ESIP driven project that was recently awarded funding from the USGS Community for Data Integration to develop a clearinghouse of Data Management Training resources. This project will consolidate existing training materials from DataONE, ESIP and others into a searchable resource, enabling scientists to search and browse an inventory of training resources related to data management in the Earth sciences.

DataONE is also collaborating with NCEAS and NCEI on the new NSF Arctic Data Center which was announced in March. Serving as the NSF Arctic research community’s primary repository for data preservation and data discovery, the Arctic Data Center is also set to provide extensive data-management and open-science training for Arctic researchers.

In house, DataONE is in the process of finalizing the Summer Internship Program application process. Successful interns will work across five projects between May and July 2016 and will be working in an open notebook environment. The five projects are:
1. Exploring the Impact of DataONE: Data Publication and Access Metrics
2. Semantic Entity Extraction and Linking for Annotation and Ontology Evolution
3. Developing a Survey Instrument for Evaluation of Teaching Materials
4. Emerging Research Communities: Fulfilling the Potential of Open Access Earth Science Data
5. Reproducibility of Script-Based Workflows: A Case Study and Demonstration

The community will be able to follow along with their progress at https://notebooks.dataone.org/ and we will provide an update here in the next issue.

Finally, we are approaching the end of our second season of the DataONE Webinar Series. To date, we have hosted 11 webinars from expert speakers on diverse topics. We are enjoying hosting this series and from the feedback we have received, believe these are a valuable contribution to the community. We are thrilled with our attendance levels exceeding 100 each month and with our 65% registration to attendance rate. There are two more webinars remaining in this series and we hope you are able to join us. If not, you can access recordings of previous webinars online at: https://www.dataone.org/webinars.

Figure: Webinar attendance 2015 - 2016

1312 Basehart SE
University of New Mexico
Albuquerque, NM 87106
Fax: 505.246.6007

DataONE is a collaboration among many partner organizations, and is funded by the US National Science Foundation (NSF) under a Cooperative Agreement.

Project Director: William Michener
[email protected]
505.814.7601

Executive Director: Rebecca Koskela
[email protected]
505.382.0890

Director of Community Engagement and Outreach: Amber Budden
[email protected]
505.205.7675

Director of Development and Operations: Dave Vieglais
[email protected]
