How Environmental Informatics Is Preparing Us for the Big Data Era

MURPA Seminar, University of Queensland, October 25, 2013
Peter Fox; [email protected], @taswegian, Tetherless World Constellation, Rensselaer Polytechnic Institute (*also Woods Hole Oceanographic Institution)

RPI Tetherless World Constellation (tw.rpi.edu)
• Government Data Future Web • Health care/Life Sciences • Web Science • Environmental Informatics • Policy • Social Hendler Xinformatics • Data Science • Semantic eScience • Data Frameworks Lots of RDF Fox Web Infrastructure Scaling and Distributed Query and Reasoning AI, Rule reasoning Semantic Foundations Visualizing SW ‘data’ • Knowledge Provenance Social SW, SMWiki • Inference, Trust Policy Lang Ontologies/Tools, … Luciano, Erickson + ~ 40 = Post-docs, Staff, Grad, UGrad McGuinness

The Planet is Under Pressure (photos: www.dawide.com)

Future Earth: Research for global sustainability

Climate of the last Millennium
Caspar Ammann, NCAR/CGD, The National Center for Atmospheric Research

In recognition and appreciation of the PCMDI / LLNL for its invaluable contribution to the CCSM3 development, production, and data analysis effort for the 2007 IPCC Fourth Assessment Report.

"The Norwegian Nobel Committee has decided that the Nobel Peace Prize for 2007 is to be shared, in two equal parts, between the Intergovernmental Panel on Climate Change (IPCC) and Albert Arnold (Al) Gore Jr. for their efforts to build up and disseminate greater knowledge about man-made climate change, and to lay the foundations for the measures that are needed to counteract such change."
A Vision Fulfilled

Pre-2000: Home Grown Data Systems
• Initially Cheap
• $$$ in long term
• Limited Scale for Many Projects

2000-Present: Community Data Portal (dataportal.ucar.edu)
• Modest Investment
• Agile and Right-sized
• Institutional Scale

2002-Present: Earth System Grid
• Large Investment
• Infrastructure for Large Projects
• Spans Institutions

ESG Goal: Providing climate scientists with virtual proximity to large simulation results needed for their research

Current ESG Sites
• Very large distributed data archives
  Easy federation of sites
  Across the US and around the world
• “Virtual Datasets” created through subsetting and aggregation
• Metadata-based search and discovery
• Web-based and analysis tool access
• Increased flexibility and robustness
• Server-side analysis
http://www-pcmdi.llnl.gov
Dean Williams, PCMDI, ~ 2008

Evolving for the future

ESG Data System Evolution (2008, Early 2009, 2011)
Central database Testbed data sharing Full data sharing (add to testbed…) Centralized curated data archive Federated metadata/portals Synchronized federation metadata, data Time aggregation Unified user interface Full suite of server-side Distribution by file transport Quick look server-side analysis analysis with CDAT No ESG analysis with CDAT Model/observation integration Shopping-cart-style web portal Location independence ESG embedded into desktop ESG connection to desktop analysis Distributed aggregation productivity tools with CDAT tools Manual data sharing/publishing GIS integration Model intercomparison metrics User support, life cycle maintenance
ESG Data Archive: Terabytes (CCSM, AR4) to Petabytes (CCSM, AR5, satellite, in situ, biogeochemistry, ecosystems)
Dean Williams, PCMDI

And beyond… Scaling a big infrastructure!

Climate data - worldwide

The Future for Earth Science: A Global Earth Observation System of Systems, ~ 2009, © GEO Secretariat

Millennium Ecosystem Assessment
http://images.firstcovers.com/covers/flash/w/wake_up-19483.jpg
Data has Lots of Audiences
More Strategic / Less Strategic
From “Why EPO?”, a NASA internal report on science education, 2005
Science too!

The Global Change Research Act and USGCRP
• USGCRP was mandated by Congress in the Global Change Research Act (GCRA) of 1990 (P.L. 101-606): “To provide for development and coordination of a comprehensive and integrated United States Research Program which will assist the Nation and the world to understand, assess, predict, and respond to human-induced and natural processes of global change.”

U.S. Global Change Research Program
The Program:
• Coordinates Federal research to better understand and prepare the nation for global change
• Prioritizes and supports cutting edge scientific work in global change
• Assesses the state of scientific knowledge and the Nation’s readiness to respond to global change
• Communicates research findings to inform, educate, and engage the global community

Global Change Information System (GCIS)
Vision: A unified web based source of authoritative, accessible, usable, and timely information about climate and global change for use by scientists, decision makers, and the public.

Global Change Research Act (1990), Section 106
…not less frequently than every 4 years, the Council… shall prepare… an assessment which:
• integrates, evaluates, and interprets the findings of the Program and discusses the scientific uncertainties associated with such findings;
• analyzes the effects of global change on the natural environment, agriculture, energy production and use, land and water resources, transportation, human health and welfare, human social systems, and biological diversity; and
• analyzes current trends in global change, both human-induced and natural, and projects major trends for the subsequent 25 to 100 years.

Previous National Climate Assessments
• Climate Change Impacts on the United States (2000)
• Global Climate Change Impacts in the United States (2009), http://nca2009.globalchange.gov
Target date for next NCA: 2013

NCA 2009: http://nca2009.globalchange.gov

Prototype Use Case
Name: Discover and visit data center website of dataset used to generate report figure.
Goal: The NCA Report reader sees a figure and wants to know where the data came from.
Summary: A reader of the NCA is browsing the content via the website. He/she sees a figure and wants to know where the data came from. A reference to the publication in which the figure originated appears in the figure caption. Selecting the link to the source publication displays a page of information about the publication including, if available, the publication DOI. The page also includes references to the datasets cited in the publication. Following each of the dataset reference links presents a page of information about the dataset, including links back to the agency/data center webpage describing the dataset in more detail and making the actual data available for order or download.
Actors: Primary Actor - reader of the NCA
Preconditions: Reader is viewing the NCA online report
Post Conditions: Reader visits the data center dataset website
Normal Flow:
1) System is presenting the NCA report to the reader in a web site. Presentation includes report figure with caption that includes reference to source publication.
2) Reader selects publication reference in figure caption.
3) System displays information about publication, including DOI (if available).
4) Publication information includes publication dataset citations.
5) Reader selects a dataset cited by the publication.
6) System displays information about dataset including links to agency / data center webpages where more information and (potentially) data download links are available.
7) Reader selects the data center link and is redirected to data center dataset webpage.
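
The normal flow above is a walk along linked records: figure, then source publication, then cited datasets, then data center page. The following is a minimal Python sketch of that walk, not the GCIS implementation. The class names, field names, the `trace_figure` helper, and the example.org URL are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Dataset:
    title: str
    data_center_url: str  # hypothetical landing page at the agency / data center


@dataclass
class Publication:
    title: str
    doi: Optional[str]  # the use case says the DOI is shown only "if available"
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Figure:
    caption: str
    source: Publication  # the caption references the source publication


def trace_figure(figure: Figure) -> dict:
    """Follow steps 2-6: figure -> publication -> cited datasets -> data center links."""
    pub = figure.source
    links: List[Tuple[str, str]] = [(d.title, d.data_center_url) for d in pub.datasets]
    return {"publication": pub.title, "doi": pub.doi, "data_center_links": links}


# Made-up example records (not real NCA content).
snow = Dataset("Example snow cover dataset", "https://datacenter.example.org/snow-cover")
pub = Publication("Example source publication", None, [snow])
fig = Figure("Example figure caption", pub)

result = trace_figure(fig)
for title, url in result["data_center_links"]:
    print(f"{title}: {url}")
```

Step 7, the redirect to the data center page, is left to the web layer; the sketch only shows that each hop is a lookup on an explicit link rather than a search.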
Assessment links to information

Traceable accounts… Magic here!

Under the hood – a graph

Key Message & A Traceable Account
Key Message vs. “General” Message

Computer science-y things
Diagram from W3C PROV group and Ivan Herman
Nodes: ENTITY, ACTIVITY, AGENT
Relations: wasDerivedFrom, wasInformedBy, used, wasGeneratedBy, startedAtTime, endedAtTime, wasAttributedTo, wasAssociatedWith, actedOnBehalf

Non-specialist Use Case
Name: Find Latest Datasets by Keyword
Goal: Search for datasets associated with the keyword “snow”, list search results by recentness of publication.
Summary: User story: I want to look for information concerning “snow.” I don’t know if it is a CLEAN word or a GCMD word, or even what GCMD or CLEAN is. How would I do it, and what would I see on my monitor during the process?
Assumptions: The reader is not assumed to have knowledge regarding the GCMD Keywords (or other) vocabulary.
Actors: Primary Actor - reader of the NCA
Preconditions: TBD
Post Conditions: Reader is presented with a list of datasets associated with the keyword “snow” sorted by dataset publication date.
Normal Flow: TBD
Notes: We are looking into two user interface options for dataset selection by keyword:
1) A free-text search where the user inputs “snow”.
2) A faceted browse interface with a vocabulary facet which presents the user with terms from a structured vocabulary. The user can manually select the term(s) which match or contain “snow”.
We intend to implement prototypes of both.

CLEAN Vocabulary

Large Marine Ecosystems
US National Ocean Policy: Designates 9 Large Marine Ecosystems within US to serve as unifying framework for integrated science, management, and governance

Understand Communities Of Stakeholders (Suzanne Lawrence)

So… you are wondering
• What have we learned?
• And how must we go forward?
• And not just for climate…

Data as a 1st class citizen
http://thomsonreuters.com/content/press_room/science/686112

Science ecosystems
Integrateability
Citability
Identity
Explanation
Justification
Verifiability
Proof
Trust
Accountability
‘Transparency’ -> Translucency
• These elements are what enable scientists to explore/confirm/deny their research ideas and collaborate!
• Abduction as well as induction
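
Verifiability and accountability of this kind rest on the provenance relations shown in the W3C PROV diagram earlier in the talk. The following is a small Python sketch, assuming provenance is stored as plain (subject, relation, object) triples. The relation names come from the slide; the node names (`figure`, `plotting_run`, and so on) and both helper functions are hypothetical, and a real system would more likely use an RDF store.

```python
from collections import defaultdict

# Hypothetical provenance records using relation names from the PROV diagram.
TRIPLES = [
    ("figure", "wasDerivedFrom", "dataset"),
    ("figure", "wasGeneratedBy", "plotting_run"),
    ("plotting_run", "used", "dataset"),
    ("plotting_run", "wasAssociatedWith", "analyst"),
    ("dataset", "wasAttributedTo", "data_center"),
    ("dataset", "wasDerivedFrom", "raw_observations"),
]


def build_index(triples):
    """Map (subject, relation) to the list of objects for quick lookup."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index


def upstream_entities(entity, index):
    """Entities this one depends on, via wasDerivedFrom or wasGeneratedBy -> used."""
    found = []
    stack = [entity]
    while stack:
        current = stack.pop()
        parents = list(index.get((current, "wasDerivedFrom"), []))
        for activity in index.get((current, "wasGeneratedBy"), []):
            parents.extend(index.get((activity, "used"), []))
        for parent in parents:
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


def responsible_agents(entity, index):
    """Agents the entity is attributed to, plus agents of its generating activities."""
    agents = list(index.get((entity, "wasAttributedTo"), []))
    for activity in index.get((entity, "wasGeneratedBy"), []):
        for agent in index.get((activity, "wasAssociatedWith"), []):
            if agent not in agents:
                agents.append(agent)
    return agents


INDEX = build_index(TRIPLES)
lineage = upstream_entities("figure", INDEX)
agents = responsible_agents("figure", INDEX)
print(lineage, agents)
```

Asking what a figure depends on and who was responsible for it then becomes a graph walk over recorded relations, which is one way the traceable account behind a key message could be assembled.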
