UC Santa Barbara
UC Santa Barbara Electronic Theses and Dissertations

Title: Spatial Discovery and the Research Library: Linking Research Datasets and Documents
Permalink: https://escholarship.org/uc/item/45d8m6k1
Author: Lafia, Sara Katherine
Publication Date: 2016
Peer reviewed | Thesis/dissertation

UNIVERSITY OF CALIFORNIA
Santa Barbara

Spatial Discovery and the Research Library: Linking Research Datasets and Documents

A Thesis submitted in partial satisfaction of the requirements for the degree Master of Arts in Geography

by

Sara Lafia

Committee:
Professor Werner Kuhn, Co-Chair
Professor Krzysztof Janowicz, Co-Chair
Dr. Katja Seltmann

March 2017

The thesis of Sara Lafia is approved.

____________________________________________
Katja Seltmann

____________________________________________
Krzysztof Janowicz, Committee Co-Chair

____________________________________________
Werner Kuhn, Committee Co-Chair

January 2017

ABSTRACT

Spatial Discovery and the Research Library: Linking Research Datasets and Documents

by

Sara Lafia

Academic libraries have always supported research across disciplines by integrating access to diverse contents and resources. They now have the opportunity to reinvent their role in facilitating interdisciplinary work by offering researchers new ways of sharing, curating, discovering, and linking research data. Spatial data and metadata support this process because location often integrates disciplinary perspectives, enabling researchers to make their own research data more discoverable, to discover data of other researchers, and to integrate data from multiple sources. The Center for Spatial Studies at the University of California, Santa Barbara (UCSB) and the UCSB Library are undertaking joint research to better enable the discovery of research data and publications.
The research addresses the question of how to spatially enable data discovery in a setting that allows for mapping and analysis in a GIS while connecting the data to publications about them. It suggests a framework for an integrated data discovery mechanism and shows how publications may be linked to associated datasets exposed either directly or through metadata on Esri’s Open Data platform. The results demonstrate a simple form of linking data to publications through spatially referenced metadata and persistent identifiers. This linking adds value to research products and increases their discoverability across disciplinary boundaries.

Current data publishing practices in academia result in datasets that are not easily discovered, are hard to integrate across domains, and are typically not linked to publications about them. For example, discovering that two datasets, such as archaeological observations and specimen data collections, share a spatial extent in Mesoamerica is not currently supported, nor is it easy to get from those datasets to relevant publications or other documents. In our previous work, we developed a basic linked metadata model relating spatially referenced datasets to documents. The research reported here applies the model to a collection of spatially referenced researcher datasets, capturing metadata and encoding them as linked open data. We use existing RDF vocabularies to triplify the metadata, to make them spatially explicit, and to link them thematically. Our latest research has produced a simple and extensible method for exposing metadata of research objects as a library service and for spatially integrating collections across repositories.

TABLE OF CONTENTS

Chapter 1. Spatial Discovery and the Research Library
Introduction
    Problem Statement
    Motivation
Background and Related Work
    Library Repositories
    Emerging Spatial Data Technologies
    State of the Art
Methods
    User Personas
    Experimental Design
Results
Discussion and Conclusion
    Limitations
    Next Steps
Acknowledgements
References

Chapter 2. Spatial Discovery of Linked Research Datasets and Documents
Introduction
Method
    Recruiting Campus Researchers
    Studying Existing Spatial Metadata Workflow
    Applying Workflow to Describe ArcGIS Online Datasets
    Extending Workflow to Describe Documents
    Identifying and Applying Appropriate Vocabularies
    Testing the Extended Production Workflow
    Eliciting Researcher Feedback
Results
    Sharing Research Objects
    Describing Research Objects
    Aggregating Research Objects
    Refining Research Object Metadata
    Triplifying Research Object Metadata
    Querying Research Object Metadata
Discussion
Conclusions
Supplementary Materials
Acknowledgements
Appendix
References

LIST OF FIGURES AND TABLES

Chapter 1. Spatial Discovery and the Research Library
Figure 1. Project vision for data discovery and publication integration
Figure 2. UCSB’s Open Data instance leverages ArcGIS Online
Table 1. Personas, domains, and datasets of researchers
Figure 3. Generic Dublin Core Metadata Initiative (DCMI) data model
Figure 4. Reconciled OpenRefine template and RDF skeleton
Figure 5. RDF triples for datasets and publications exported in Turtle syntax
Figure 6. A generic SPARQL query against the triples
Table 2. Example of a triple stored in the RDF framework

Chapter 2. Spatial Discovery of Linked Research Datasets and Documents
Table 1. Selected case study documents, datasets, repositories, and contributors
Figure 1. Transforming tabular relational database records into triples
Figure 2. Applying the Geolink ontology to ArcGIS Online dataset metadata
Figure 3. Metadata model adopts Dublin Core, SKOS, and Geolink
Figure 4. Instance of dataset metadata annotated with adopted vocabularies
Figure 5. Selected sample SPARQL queries run against Fuseki localhost.
Figure A1. Overview of ArcGIS Online Discovery group content
Figure A2. Resulting research object bounding boxes for place "California"
Figure A3. Exported metadata fields from ArcGIS Online Administrator
Figure A4. Hosted triples generated from