Preserving Geospatial Data

Technology Watch Report

Guy McGarva, EDINA, University of Edinburgh
Steve Morris, North Carolina State University (NCSU)
Greg Janée, University of California, Santa Barbara (UCSB)

DPC Technology Watch Series Report 09-01
May 2009
© 2009

Executive Summary

Geospatial data are becoming an increasingly important component of decision-making processes and planning efforts across a broad range of industries and information sectors. The amount and variety of data are increasing rapidly and, although much of this data is at risk of being lost or becoming unusable, there is growing recognition of the importance of being able to access historical geospatial data, now and in the future, in order to examine social, environmental and economic processes and the changes they undergo over time.

The geospatial domain is characterized by a broad range of information types, including geographic information systems data, remote sensing imagery, three-dimensional representations and other location-based information. The scope of this report is limited to two-dimensional geospatial data and data that would typically be considered comparable to paper maps or charts, including vector data, raster data and spatial databases.

There are a number of significant preservation issues that relate specifically to geospatial data, including the complexity and variety of data formats and structures; the abundance of content that exists in proprietary formats; the need to maintain the technical and social contexts in which the data exist; and the growing importance of web services and dynamic (and ephemeral) data. Standards for geospatial metadata have been defined at both the national and international levels, yet metadata often becomes dissociated from the data, or is incorrect, non-standard, or never created in the first place.
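The report does not prescribe tooling for this problem. As an illustration only, the following minimal Python sketch shows one way an archive might catch dissociated metadata at ingest while recording checksums so that a package can be verified later. The sidecar convention it checks (a metadata record named after the data file plus `.xml`, e.g. `roads.shp.xml`) and the set of data-file extensions are assumptions chosen for the example, not requirements taken from the report; `check_package` and `sha256_of` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# Illustrative assumptions, not taken from the report: these extensions mark
# "data" files, and each one should travel with a metadata record named
# "<data file name>.xml" (for example roads.shp -> roads.shp.xml).
DATA_SUFFIXES = {".shp", ".tif", ".gml"}


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_package(root: Path) -> tuple[dict[str, str], list[str]]:
    """Checksum every file under root and list data files with no metadata.

    Returns (manifest, missing_metadata): manifest maps each file's path,
    relative to root, to its SHA-256 digest (metadata files included);
    missing_metadata lists the data files whose assumed ".xml" sidecar
    record does not exist.
    """
    manifest: dict[str, str] = {}
    missing_metadata: list[str] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        manifest[relative] = sha256_of(path)
        if path.suffix.lower() in DATA_SUFFIXES:
            sidecar = path.with_name(path.name + ".xml")
            if not sidecar.is_file():
                missing_metadata.append(relative)
    return manifest, missing_metadata
```

For example, `check_package(Path("incoming/survey_2009"))` (a hypothetical directory) would return a fixity manifest for every file plus a list of data files that arrived without their metadata. Taking checksums at the same moment as the metadata check means later audits can detect both silent corruption and files that have been separated from their documentation; a production workflow would more likely rely on an established packaging specification than on an ad hoc script like this one.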
Additional considerations in preserving geospatial data include coordinate reference systems, cartographic representation, topology, project files and data packaging. Standards bodies are in place at the national and international levels to address general geospatial data standardization issues, yet working groups addressing preservation issues have only recently been formed.

A number of technologies and tools that are, or may be, relevant to geospatial data preservation efforts have emerged, although the nature of the problem is such that no single tool or technology will be relevant in all cases. A number of projects and activities have addressed various aspects of geospatial data preservation, creating an initial body of experience from which some preliminary recommendations can be made. While these recommendations provide a basic checklist of issues to consider when preserving geospatial data, it must be emphasized that collective experience in preserving such data is still at a very early stage and that further investigation is needed.

Keywords: Geographic Information Systems, geospatial data, preservation, spatial databases, geospatial formats, web mapping services

Contents

1 Introduction: why preserve geospatial data?
2 Background: key challenges with geospatial data
3 Geospatial Data Preservation Issues
  3.1 Generic Geospatial Data Issues
    3.1.1 Coordinate Reference Systems
    3.1.2 Cartographic Representation
    3.1.3 Topology
    3.1.4 Project Files
    3.1.5 Data Packaging
  3.2 Vector Data
    3.2.1 Commercial Vector Data Formats
    3.2.2 Open Vector Data Formats
  3.3 Raster Data
    3.3.1 Georeferencing and Rectification
    3.3.2 Compression
    3.3.3 Raster Formats
    3.3.4 Mosaicked Raster Data
    3.3.5 Stereo, Oblique and Ground-Level Imagery
    3.3.6 Raster Data Size
  3.4 Emerging Data Formats
    3.4.1 KML
    3.4.2 PDF and GeoPDF
  3.5 Spatial Databases
    3.5.1 ESRI Geodatabases
  3.6 Dynamic Geospatial Data
    3.6.1 Web Map Services (WMS)
    3.6.2 Web Feature Services (WFS)
    3.6.3 Other OGC Web Services
  3.7 Legal Issues
    3.7.1 UK Legal Landscape
    3.7.2 US Legal Landscape
    3.7.3 'Open' Geospatial Data
  3.8 Geospatial Metadata
    3.8.1 Metadata Standards
    3.8.2 Metadata Challenges for Archives
    3.8.3 Geospatial Metadata vs. Preservation Metadata
    3.8.4 Metadata Creation
4 Standards Bodies and Working Groups
  4.1 Open Geospatial Consortium (OGC)
    4.1.1 OGC Data Preservation Working Group
  4.2 U.S. Federal Geographic Data Committee (FGDC)
    4.2.1 FGDC Historical Data Working Group
5 Technology and Tools
  5.1 Digital Globe Tools
  5.2 Geospatial Format Registries and Validation Tools
  5.3 ESRI Geodatabase Archiving
  5.4 Digital Repository Software
6 Conclusions and Recommendations
7 Glossary of Acronyms
8 Selected References and Resources