Openstreetmap History for Intrinsic Quality Assessment: Is OSM Up-To-Date? Marco Minghini1,3* and Francesco Frassinelli2,3

Openstreetmap History for Intrinsic Quality Assessment: Is OSM Up-To-Date? Marco Minghini1,3* and Francesco Frassinelli2,3

Minghini and Frassinelli Open Geospatial Data, Software and Standards (2019) 4:9 Open Geospatial Data, https://doi.org/10.1186/s40965-019-0067-x Software and Standards SOFTWARE Open Access OpenStreetMap history for intrinsic quality assessment: Is OSM up-to-date? Marco Minghini1,3* and Francesco Frassinelli2,3 Abstract OpenStreetMap (OSM) is a well-known crowdsourcing project which aims to create a geospatial database of the whole world. Intrinsic approaches based on the analysis of the history of data, i.e. its evolution over time, have become an established way to assess OSM quality. After a comprehensive review of scientific as well as software applications focused on the visualization, analysis and processing of OSM history, the paper presents “Is OSM up-to-date?”, an open source web application addressing the need of OSM contributors, community leaders and researchers to quickly assess OSM intrinsic quality based on the object history for any specific region. The software, mainly written in Python, can be also run in the command line or inside a Docker container. The technical architecture, sample applications and future developments of the software are also presented in the paper. Keywords: Crowdsourcing, Data history, Data quality, Open source, OpenStreetMap, up-to-dateness Introduction of software tools, services and applications allows a wide OpenStreetMap (OSM) is the most successful crowd- number of developers, humanitarian operators, industry sourced geographic information project to date [1]. It was and governmental actors to exploit OSM data on a daily initiated in 2004 in response to the mainstream pres- basis and for a variety of purposes [6]. ence of legal or technical restrictions on available maps, Being a potentially useful source of geospatial infor- with the ultimate goal of creating and distributing free mation for many disciplines, OSM has attracted as well geographic data for the whole world [2]. As a matter of an increasing interest from the academic and scientific fact, the OSM geospatial database is distributed under the community [7]. Not surprisingly, the topic which so far open access Open Database License (ODbL) [3], allowing has been most investigated by researchers is OSM quality freely using, modifying and building upon the database assessment [8], since crowdsourced geographic informa- under non-restrictive conditions such as providing attri- tion suffers by definition from a general lack of qual- bution to the OSM contributors. The relatively easy way ity assurance [9]. OSM quality has been traditionally - even for people with no background in geography or assessed using the standard quality parameters available computer science - to create OSM data has attracted so far for geospatial datasets, e.g. positional accuracy, com- an ever increasing number of contributors to the project. pleteness, logical consistency, thematic accuracy, tempo- At the time of writing (June 2019), the project counts ral accuracy, lineage, up-to-dateness, and fitness-for-use about 5.5 million registered users [4], while the num- [9–12]. The latter suggests that quality should not be ber of contributors, i.e. the users who have made at least measured in absolute terms, since crowdsourced datasets one edit to the database, has exceeded the threshold of such as OSM may have different degrees of suitability for 1 million only in March 2018 [5]. In turn, a rich ecosystem specific purposes and users’ demands. In the case of OSM these quality parameters have *Correspondence: [email protected] been traditionally measured through extrinsic quality The views expressed are purely those of the authors and may not in any circumstances be regarded as stating an official position of the European approaches, i.e. by comparing OSM against external ref- Commission. erence datasets considered as the ground truth, such 1European Commission, Joint Research Centre (JRC), 21027 Ispra, Italy as those provided by national mapping agencies and 3Department of Civil and Environmental Engineering, Politecnico di Milano, 20133 Milano, Italy commercial mapping companies. The quality parameters Full list of author information is available at the end of the article © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Minghini and Frassinelli Open Geospatial Data, Software and Standards (2019) 4:9 Page 2 of 17 which have been most investigated through extrin- them, namely nodes, ways and relations, include a geo- sic approaches are positional accuracy, completeness metric component. A node, defined by a latitude and a and thematic accuracy, with focus mainly placed - in longitude, is used to represent standalone point features descending order - on OSM roads [13–17], buildings such as trees, traffic lights or points of interests, and con- [18–21], land use [22–24] and points of interest [25, 26]. sists of geographic coordinates (latitude and longitude). Overall, the available literature agrees that OSM quality A way is an ordered list of between 2 and 2000 node shows very heterogeneous patterns across space, ranging references, which can represent both linear features (poly- from areas where it favourably compares with authorita- lines) such as roads and rivers, and areal features (poly- tive datasets (typically the most urbanized areas) to areas gons) such as buildings and land use areas. The limit where data is either missing or of poor quality. of 2000 nodes per way was established in 2009 with However, OSM and authoritative datasets are extremely the changes from version 0.5 to 0.6 of the OSM API different by nature, e.g. in terms of their production and [28], therefore ways created earlier might reference any update processes which often lead OSM to be clearly number of nodes. A relation is a special structure used more detailed, accurate and complete than authoritative to represent polylines or polygons of more than 2000 datasets, thus violating the basic hypothesis of using the nodes and relationships between zero or more nodes, latter as ground truth [27]. In addition, in many parts ways and other relations [29, 30]. An example of relation of the world authoritative datasets are either missing or is a bus route, which links the ways of the roads trav- not suitable for comparison (e.g. because their scale is eled by the bus and the nodes of the bus stops. Another too course and not comparable to OSM). For these rea- common example is a polygon with holes, e.g. a build- sons, OSM intrinsic assessment methods have begun to ing with an inner courtyard, which links the outer way appear with the goal of determining OSM quality (mainly of the building outline and the inner way of the court- meant as fitness-for-use) by only looking at the temporal yard. Each single OSM node, way and relation has an evolution of OSM itself, i.e. without comparison against id number that uniquely identifies it. For example, the third-party datasets. Since each edit to the database is also node accessible at https://www.openstreetmap. stored together with the database itself, the whole history org/node/5207201516 (where 5207201516 is the of OSM is in fact available and provides an extremely rich node id) currently represents the Notting Hill subway sta- data source for a variety of purposes, including quality tion in London; similarly, the way accessible at https:// assessment. Accordingly, many software implementations www.openstreetmap.org/way/32965412,(where have been developed, which are based on the analysis of 32965412 is the way id) currently represents the OSM history. The software “Is OSM up-to-date?”,which is Statue of Liberty in New York; and the relation described in this paper, contributes to this flourishing set accessible at https://www.openstreetmap.org/ of software applications with the main goal of providing relation/7515426 (where 7515426 is the relation intrinsic quality measures for single OSM objects. id) currently represents the Louvre Museum in Paris. It is The remainder of the paper is organized as follows. worth clarifying that the ids of nodes, ways and relations “OSM data model and OSM history” section provides cannot be considered as permanent ids of real-world fea- some background information on the OSM data model and tures [31]. In other words, a real-world feature which is the OSM history. “Applications based on OSM history” currently represented by e.g. a node might in the future section revises the literature on OSM intrinsic assess- be represented by another node or by a way or a rela- ment through the analysis of both scientific applications tion. As an example, the Louvre Museum was in the and available software implementations based on OSM past represented by another relation with id 3262297 history. The latter allow to classify the multitude of exist- and might be represented by new relations in the future. ing applications and to better position the software “Is This fundamental difference between the permanent ids OSM up-to-date?”, whose rationale and architecture are of real-world features and the OSM ids of nodes, ways and described in detail in “Implementation: Is OSM up-to- relations should be properly considered when analyzing date?”section.“Application” section presents the two pos- OSM history data. sible applications of the software, i.e. via the web interface Tags are the fourth data type of the OSM data model and the command line interface. “Conclusions and future (see Fig. 1) and specify the attributes of real-world features development” section concludes the paper by summariz- represented by nodes, ways and relations.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    17 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us