A Case Study of Data Integration for Aquatic Resources Using Semantic Web Technologies
Total Page:16
File Type:pdf, Size:1020Kb
A Case Study of Data Integration for Aquatic Resources Using Semantic Web Technologies By Janice Gordon, Nina Chkhenkeli, David Govoni, Frances Lightsom, Andrea Ostroff, Peter Schweitzer, Phethala Thongsavanh, Dalia Varanka, and Stephan Zednik Open-File Report 2015–1004 U.S. Department of the Interior U.S. Geological Survey U.S. Department of the Interior SALLY JEWELL, Secretary U.S. Geological Survey Suzette M. Kimball, Acting Director U.S. Geological Survey, Reston, Virginia: 2015 For more information on the USGS—the Federal source for science about the Earth, its natural and living resources, natural hazards, and the environment—visit http://www.usgs.gov or call 1–888–ASK–USGS For an overview of USGS information products, including maps, imagery, and publications, visit http://www.usgs.gov/pubprod To order this and other USGS information products, visit http://store.usgs.gov Suggested citation: Gordon, Janice, Chkhenkeli, Nina, Govoni, David, Lightsom, Frances, Ostroff, Andrea, Schweitzer, Peter, Thongsavanh, Phethala, Varanka, Dalia, and Zednik, Stephan, 2015, A case study of data integration for aquatic resources using semantic web technologies: U.S. Geological Survey Open-File Report 2015–1004, 55 p., http://dx.doi.org/10.3133/ofr20151004. ISSN 2331-1258 (online) Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. Although this information product, for the most part, is in the public domain, it also may contain copyrighted materials as noted in the text. Permission to reproduce copyrighted items must be secured from the copyright owner. ii Contents Abstract ...................................................................................................................................................................... 1 Introduction ................................................................................................................................................................. 1 Challenges of Scientific Data Integration ................................................................................................................ 1 The Potential of Semantic Web Technologies......................................................................................................... 2 Demonstration Project ............................................................................................................................................. 3 Purpose and Scope ............................................................................................................................................. 3 Use Case: Data Integration for Freshwater Fish Habitat Modeling ...................................................................... 3 Data Sources ....................................................................................................................................................... 4 Methods ...................................................................................................................................................................... 4 Application of the Methodology ................................................................................................................................... 6 Use Case Development and Iteration ..................................................................................................................... 6 Information Modeling ............................................................................................................................................... 7 Technical Approach ................................................................................................................................................ 9 Architecture Design Plan ..................................................................................................................................... 9 Data Preparation ............................................................................................................................................... 10 BioData .......................................................................................................................................................... 10 Multistate Aquatic Resource Information System .......................................................................................... 11 National Geochemical Survey ........................................................................................................................ 11 National Hydrography Data set ...................................................................................................................... 13 Prototype Development Approach ..................................................................................................................... 13 Results ...................................................................................................................................................................... 14 User Interface Design ........................................................................................................................................... 14 Prototype Architecture ........................................................................................................................................... 15 TDB Triple-store ................................................................................................................................................ 16 Fuseki SPARQL Endpoint ................................................................................................................................. 16 API Development .................................................................................................................................................. 17 Integrated Data .................................................................................................................................................. 18 Fish ................................................................................................................................................................ 20 Water ............................................................................................................................................................. 22 RDF Serialization in Turtle ......................................................................................................................... 23 Sediment........................................................................................................................................................ 24 RDF Serialization in Turtle ......................................................................................................................... 24 Geospatial Data Integration ............................................................................................................................... 24 Discussion and Conclusions ..................................................................................................................................... 25 Evaluation of the Prototype ................................................................................................................................... 25 Evaluation of the Use Case ............................................................................................................................... 26 Evaluation of the Methodology .............................................................................................................................. 27 Future Goals ......................................................................................................................................................... 27 Provenance and Data Quality ............................................................................................................................ 28 Technology Performance .................................................................................................................................. 28 Geospatial Semantic Integration ....................................................................................................................... 28 iii Use(s) of Linked Data ........................................................................................................................................ 29 Acknowledgments ..................................................................................................................................................... 29 References Cited ...................................................................................................................................................... 30 Glossary ................................................................................................................................................................... 32 Appendix 1: Semantic Web Technologies (Overview) .............................................................................................. 34 Data Interchange................................................................................................................................................... 34 Resource Description Framework ..................................................................................................................... 34 Resource Description Framework in Attributues (RDFa) ................................................................................... 34 Linked Data ......................................................................................................................................................