Calcofi Data Management, White Paper. Karen I
Total Page:16
File Type:pdf, Size:1020Kb
UC San Diego Scripps Institution of Oceanography Technical Report Title CalCOFI Data Management, White Paper Permalink https://escholarship.org/uc/item/9gr058dq Authors Stocks, Karen I Baker, Karen S Publication Date 2005-09-01 eScholarship.org Powered by the California Digital Library University of California CalCOFI Data Management White Paper Karen I. Stocks1 and Karen S. Baker2 1San Diego Supercomputer Center, UCSD 2 Scripps Institution of Oceanography, UCSD Scripps Institution of Oceanography Technical Report September 2005 Table of Contents Purpose 1 Acknowledgements 1 1. Introduction 3 2. Current Management of CalCOFI Data 6 2.1 Hydrographic "bottle" dataset 6 2.2 Ichthyoplankton 8 2.3 Zooplankton 9 3. Evaluation of CalCOFI Data Management, Recommendations, and Recent Activities 10 3.1 Database software and organization 10 3.2 Backup and Archive 10 3.3 Metadata and Data Documentation 10 3.4 Data acquisition and processing 13 4. Developing an Integrated Access System 15 4.1 Organizational Considerations 15 Coordination of Development and Collaborative Design 16 Community Communication 16 4.2 Technical Considerations 18 Common Database Elements for Integration 18 Integrated Ocean Observing System (IOOS) Compatibility 19 Centralized Versus Distributed Systems 20 Data Contribution, Use, and Acknowledgement Policies 20 Standards 21 Assessing Available Tools 21 Hardware and Software 23 4.3 A Prototype Online Access System 24 5. Expanding Beyond Focus Data 25 5.1 Data Types 25 Discrete Data Acquired on CalCOFI Cruises on CalCOFI Stations 25 Continuous Data Acquired on CalCOFI Cruises on CalCOFI Stations 26 Continuous Data Acquired on CalCOFI Cruises between CalCOFI Stations 26 Raster/Grid Data in the CalCOFI Region 27 5.2 Data Inventory 28 6. Summary of Recent Activities 29 7. Scaling up: Considerations for PaCOOS 31 8. Conclusions 32 Acknowledgements 32 ii Table of Contents, continued Tables, Figures and Appendices Table 1: Summary of Recommendations 2 Figure 1: CalCOFI Partnerships 4 Figure 2: CalCOFI Data Diagram 7 Appendix 1: Personnel list 33 Appendix 2: Meeting Summaries 35 Appendix 3: Data Inventory 38 Appendix 4: Data Inventory Template 44 Appendix 5: 2005/2006 Proposal 45 Appendix 6: Glossary of Acronyms 47 iii Purpose The California Cooperative Ocean Fisheries Investigations (CalCOFI) program is one of the longest-running, multidisciplinary ocean monitoring and observing programs in existence. For many years, the emphasis of data management within CalCOFI was to quality control and curate the individual datasets collected on CalCOFI cruises, and make them available to researchers and fisheries managers through printed reports and requests for data to the data curators. Today, a new goal is emerging of having CalCOFI datasets available online and, eventually, interoperable with other CalCOFI-related datasets and the larger, developing federation of the Ocean Observing System data. In this document we review the current state of data management within three of the primary CalCOFI datasets (hydrographic, ichthyoplankton, and zooplankton) and then make recommendations for moving towards the integrated online system that is envisioned, including expanding to other data types. Concrete recommendations for moving forward are summarized in Table 1 and explained in more detail throughout the text. Acknowledgements We give thanks to Rich Charter, Ralf Goericke, John Hunter, Mati Kahru, Mark Ohman, Jesse Powell, Elizabeth Venrick, Jerry Wanetick, and Jim Wilkinson for their contributions and to the CalCOFI community for supporting this work. Scripps Institution of Oceanography CalCOFI provided the support for this planning document. 1 Table 1: Summary of Recommendations – detail provided throughout the text Organization Cross Community Coordination Establish decision-making structures, committee Participate in IOOS data management working members, and responsibilities. In particular, we groups to expand IOOS technologies, such as recommend that the CalCOFI Committee create OPeNDAP, to better handle CalCOFI data an Information Management subcommittee types. composed of the data managers of each main Archive data with the National Ocean Data CalCOFI dataset to minimize divergent Center until IOOS archive centers develop. development. The decision-making structure should then develop: Review existing external tools, standards, protocols that can be useful for CalCOFI, • short-term development priorities (i.e. including transport specifications (OPeNDAP, central vs. distributed system, key DiGIR, OGC), metadata standards (FGDC, functionality, data priorities, who will host the EML, GM3/GETADE), species observation online access system, etc.), standards (OBIS, BioCASE), taxonomy • long-term strategies (open source vs. standards (NCBI), data formats (NetCDF, HDF, proprietary software, processes for XML, RDB, ascii), controlled vocabularies expansion, personnel training needs, etc.). (MRIB, MMI), catalogues (Metacat, Mcat, Thredds), analysis/visualization tools (LAS, Develop community communication mechanisms ACON, ODV, IDV), and the International Council through meetings, prototype assessments, etc. for the Exploration of the Sea (ICES) within the informatics community and between Oceanographic standards and Data informatics and oceanographic researchers. Management Software. User outreach mechanisms will be particularly important. Launch local design teams on OPeNDAP, metadata standards, and other interoperability Continue designing and maintaining the strategies such as dictionaries, thesauri, and personnel directory, and make the appropriate ontologies. Select pilot and prototype activities content available online. strategically. Formalize a data submission policy for main CalCOFI data types, a data use policy, and a data acknowledgement statement. Data System Design and Development Continue the ongoing task of the data inventory. Devote resources to redesigning the SIO collections zooplankton database, ideally before new survey lines begin operation. Local Elements and Work Flow Create FGDC metadata for the primary CalCOFI Continue developing the integrated online datasets, as is required for all federally funded access system being created by the SWFSC. data, and make this available in conjunction with all data being served online. • Conduct an evaluation, including user input, of the online access prototype as early in Expand metadata to hold information on development as possible. methods of collection, taxonomic resolution, precision information, and quality control Review and update the data input and procedures. processing procedures. Register standard metadata with the Global Devote modest additional funds for systems Change Master Directory and/or the NOAA administration at SIO. Coastal Data Development Center. Document data system components and data Focus in the short term on readying datasets management best practices. locally for flexible integration and data delivery in a variety of formats. 2 1. Introduction The purpose of this document is to overview the current state of data management within the California Cooperative Ocean Fisheries Investigations (CalCOFI) program and outline steps and recommendations for building towards an integrated, online information system for CalCOFI. It will also include consideration of how this effort could be scaled up to contribute to the emerging Pacific Coast Ocean Observing System (PaCOOS). CalCOFI is a long-term sampling program aimed at understanding the marine environment off southern California and improving living resource management in the region. It represents a consortium of the NOAA/NMFS Fisheries Resource Division of the Southwest Fisheries Science Center (SWFSC), the California Department of Fish and Game, and the Integrative Oceanographic Division (IOD) of Scripps Institution of Oceanography (SIO). A number of journal articles provide an introduction to the CalCOFI program (e.g. Ohman and Venrick, 2003; Bograd et al, 2003; Hewitt, 1988; CalCOFI, 1989, 1999) and demonstrate integrative uses of the data to address particular scientific questions (e.g. Deep Sea Research, 2003; McGowan et al, 1998; Roemmich and McGowan, 1995). The core of CalCOFI is the regular sampling: 4 times per year research vessels visit a standardized grid of 66 stations to measure physical, chemical, biological, and meteorological properties of the California Current ecosystem including censuses of organisms from phytoplankton to birds. Though the spatial and temporal scope has varied through time, this program has persisted since its inception in 1949. A program of this scope creates an enormous volume of heterogeneous data that can be valuable to a wide variety of oceanographic research and resource management applications. A recent interest in making the full range of data readily available online has led to an effort to understand holistically the current state of data management of CalCOFI data and lay a path towards building an integrated online access system. A second driver for recent CalCOFI data management interest comes from the emerging array of efforts with which to partner. Figure 1 shows some of the developing local, national, and international ocean data projects relevant to CalCOFI. In the United States, one of the larger efforts, the Integrated Ocean Observing System (IOOS),1 is in the early stages of development. The goal of IOOS is to create a unified and cost-effective approach for providing data needed to meet the suite of ocean-related challenges