Building the Online Data Observatory

Enabling Dynamic Interoperable Scientific Big Data Analytics at NKN

Luke Sheneman, Ph.D.
Technology and Data Services Manager
Northwest Knowledge Network (NKN)

Online Data Observatory

  • Data: The whole of your accessible data is greater than the sum of its parts.
  • Data Interoperability: Enable investigators to easily analyze large, heterogeneous datasets without struggling with file formats, unit conversion, manual subsetting, spatio-temporal scale harmonization, variable mapping
  • Leverage: New science with old data
  • Tools: Desktop, mobile, HPC, and web-enabled analytical and visualization tools dynamically connected to data

Data Observatory Components

Rich Metadata
  • Content
    • Discovery
    • Description
    • Variables
    • Space-time coverage
  • Syntax
    • Data structures
    • Data types
    • Data Format
  • Semantics
    • Ontologies
    • Linkages
    • Context

Data Resources
  • Internal Data Catalog
  • Remote Data Metadata Harvesting

Data Representation
  • Explicit Data Model
    • ODM/ODM2
    • Database Schemata
  • Self-Describing Formats
    • HDF
    • GRIB
    • NetCDF

Tools (tools access web APIs, not files)
  • Web tools
  • R, Matlab, SAS
  • GIS
  • Viz: VTK, IDL

Web Service APIs
  • Functions
    • Subsetting
    • Aggregation
    • Machine-to-Machine
    • Efficient
  • Examples
    • OPeNDAP
    • WaterOneFlow
    • GIS / OGC: WMS, WCS, WFS

A Big Data Example: Climate Science

Downscaled Climate Data
  • 100TB, replicated to INL
  • 4km Western US and CONUS
  • Multivariate Adaptive Constructed Analogs (MACA)
  • Historical and Projected 1950-2099
  • 20 Models X Several Variables

Metadata
  • Rich ISO 19115-2 Collection & Granule Metadata
  • Additional Embedded Metadata in NetCDF Files
  • Metadata Harvested and Exposed via NKN Catalog
  • Metadata also Exposed through THREDDS

Data Representation
  • Gridded NetCDF4 (HDF5)
  • Thousands of files (between 50MB and 5GB each)
  • Self-describing, machine-readable, embedded metadata

Web Service API
  • OPeNDAP via THREDDS
  • ncISO
  • OGC: WMS, WCS
  • Aggregation: Virtual datasets
  • Subsetting: Dynamic space-time

Tools
  • Instant connectivity for: ArcGIS, R, Python, MATLAB, more.
  • Interactive collaborative analysis: IPython Notebooks
  • Web interfaces: maca.northwestknowledge.net

NKN as Idaho’s Online Data Observatory

[Diagram: two Science DMZs linked at 10Gbps, each with 500TB storage, a metadata catalog, and web services. Other labels: University, INL, HPC, Investigators (R, Python, MATLAB, GIS).]

DataONE – Federation of Observatories

  • NKN as Tier-4 DataONE Member Node
  • DataONE Implemented as Web Service API
  • Investigator Toolkit:

Challenges and Opportunities

  • Data Interoperability Research
    • Observation and Scientific Data Models
      • e.g. ODM2
    • Increasingly Expressive Underlying File Formats
      • e.g. NetCDF/HDF and successors
    • More Efficient Web Services
    • Data and Model Integration
      • CSDMS, Statistical Issues, Spatio-Temporal Harmonization
  • Real-time, on-demand HPC powering NKN web analytic tools and services

Thank You

“The goal is to have a world in which all of the science literature is online, all of the science data is online, and they interoperate with each other. Lots of new tools are needed to make this happen.”
- Jim Gray, Microsoft Research
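
Example: dynamic space-time subsetting over OPeNDAP

The climate example lists "Subsetting: Dynamic space-time" through OPeNDAP on THREDDS. As a rough sketch of what that can look like from a client, the Python below builds a DAP2 constraint expression, which asks the server for an index hyperslab of one variable instead of a whole file. The dataset URL, the variable name `tasmax`, and the (time, lat, lon) dimension order are hypothetical placeholders, not taken from the NKN catalog; the helper names `IndexRange` and `subset_url` are also invented for this illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class IndexRange:
    """Inclusive index range along one dimension, with an optional stride."""
    start: int
    stop: int
    stride: int = 1

    def to_dap(self) -> str:
        # DAP2 hyperslab syntax is [start:stride:stop], with stop inclusive.
        if self.start < 0 or self.stop < self.start or self.stride < 1:
            raise ValueError(f"invalid range: {self}")
        return f"[{self.start}:{self.stride}:{self.stop}]"


def subset_url(base_url: str, variable: str,
               ranges: Sequence[IndexRange], fmt: str = "ascii") -> str:
    """Build an OPeNDAP (DAP2) request URL for a hyperslab of one variable.

    `fmt` is the response suffix, e.g. "ascii" for text or "dods" for binary.
    One IndexRange is expected per dimension, in the variable's own order.
    """
    hyperslab = "".join(r.to_dap() for r in ranges)
    return f"{base_url}.{fmt}?{variable}{hyperslab}"


if __name__ == "__main__":
    # Hypothetical dataset URL and variable; dimension order assumed (time, lat, lon).
    url = subset_url(
        "https://example.org/thredds/dodsC/demo/tasmax.nc",
        "tasmax",
        [IndexRange(0, 29), IndexRange(100, 120), IndexRange(200, 240, 2)],
    )
    print(url)
    # prints: https://example.org/thredds/dodsC/demo/tasmax.nc.ascii?tasmax[0:1:29][100:1:120][200:2:240]
```

The request names only the first 30 time steps and a small index window in space, so the server returns just that slice. Tools such as R, Python, and MATLAB clients typically generate equivalent requests behind the scenes when they open an OPeNDAP URL and index into a variable.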
