
World Bank Knowledge for Change Program – Full Proposal Template

Basic Data:

Title: OpenStreetMap and the World Bank: teaching, learning, and improving

Linked Project ID: P161480
Product Line: RA

Applied Amount ($): 100K
Est. Project Period: 01/01/2020 – 30/06/2021
Team Leader(s): Benjamin P. Stewart
Managing Unit: DECAT

Contributing unit(s):

Funding Window: Innovation in Data Production, Analysis and Dissemination

Regions/Countries: World

General:

1. What is the Development Objective (or main objective) of this Grant?

Data are increasingly important to how World Bank staff plan, design, implement, and monitor projects. Despite this, WB staff are often poor stewards of global public repositories of information, and this needs to be addressed. Spatial data on roads, settlements, buildings, population density, etc., are fundamental to all development projects. The World Bank's Geospatial Operational Support Team (GOST) is mandated to improve the WB's efforts to systematically collect and maintain these data. GOST aims to do this in part by encouraging and supporting our clients in the design, collection, and maintenance of such data, which necessitates a globally accessible repository.

GOST therefore proposes adopting OpenStreetMap (OSM) as our official repository for road features, building footprints, settlements, and other features appropriate to OSM. We further propose investing in a sustainable means of improving OSM, leveraging OSM, and teaching OSM to World Bank staff and clients.

While OSM is a global public good in itself, there are inherent biases in the coverage, completeness, and accuracy of the repository, and we are not currently aware of their extent. We hypothesize that the OSM repository will be less complete in rural areas of high poverty (Anderson et al., 2019). This is important because OSM is used as a foundational dataset in numerous analytical frameworks; if it under-represents the poorest areas, these populations will be misrepresented. Additionally, female participation in contributions to OSM is known to be quite low (Schmidt and Klettner, 2013; Yang et al., 2019), which introduces bias into the repository. By redesigning how we deliver OSM capacity building, we hope to encourage more female participation and to identify areas of bias.

OSM represents an opportunity for the World Bank to be a leader in open data, transparency in analytics, and project monitoring and evaluation. This proposal aims to coordinate World Bank efforts around OSM, improve our understanding of OSM's inherent flaws, and centralize our capacity building efforts.

2. Summary description of Grant financed activities

This project will have three separate tasks focused on better understanding OSM and contributing to it more effectively:


1. Design a methodology that assesses the quality and completeness of OSM, and apply it to three countries, at least one of which will be classified as FCV (affected by Fragility, Conflict, and Violence).
   a. Compare these OSM completeness maps to a series of foundational World Bank datasets, notably the locations of World Bank Group projects and recent maps of poverty. Comparisons will be made using a suite of map comparison methodologies (Hagen-Zanker and Martens, 2008).
   b. Based on our knowledge of the completeness of OSM, identify how its incompleteness affects our catalog of OSM tools.
2. Catalog analytical tools designed to run against OSM.
   a. Because OSM is a consistent global repository of data, numerous tools have been developed against it that should be portable to various geographies.
      i. For example, GOSTNets, a tool developed by GOST to improve accessibility mapping using open data.
   b. The tool catalog will be used to identify gaps in the World Bank's analytical offerings.
3. Standardize, organize, and coordinate geospatial training for clients using OSM.
   a. GOST is already coordinating ongoing geospatial training in the Energy unit around the use of geospatial planning tools (P164156), a new GOST-led effort to improve geospatial information uptake in fragile countries (P170441), and World Bank support to censuses (P118858).

3. What are the main risks related to the Grant financed activity? Are there any potential conflicts of interest for the Bank? How will these risks/conflicts be monitored and managed?

The risks from this activity are minimal: the OpenStreetMap repository is open and collaborative, and the World Bank will continue to use and support OSM regardless of the outcome of this activity.

Looking at OSM from a larger perspective, there are potential risks from the platform, notably vandalism (Ballatore, 2014) and weak validation of new edits (Wroclawski, 2018). Both relate to the broader issue of validation and accuracy in an open data project. While this is a concern to be addressed within the activity itself (how do we validate the maps and ensure their accuracy?), it is not a fundamental concern for how the activity is implemented.

4. (Optional question) What can/has been done to find an alternative source of financing, i.e. instead of a Bank administered Grant?

The Geospatial Operational Support Team (GOST) have already committed resources to supporting this initiative (detailed in the associated budget document). Additionally, this initiative is meant to coordinate and integrate with several existing initiatives which GOST support:

a. Geospatial training through the Energy unit as part of the ongoing Geospatial Electrification Platform (GEP) – P164156

b. GOST are leading an effort to improve the use of geospatial tools in fragile countries – P170441

KCPIII Specific:

1. How does (do) the objective(s) of this proposal align with the World Bank Group’s twin goals? What are the key thematic research questions being addressed in this research?

The purpose of this project is multi-fold:

a. To improve our client countries' capacity to create, manage, and use geospatial information; ultimately, this will reduce costs related to data collection, management, and analysis. The World Bank has delivered OSM training as part of capacity building efforts in numerous projects and regions, but these efforts have been sporadic. By centralizing and coordinating our capacity building efforts, we hope to improve participation, reduce the costs of such training programs, and better tailor our training to client needs.

b. To improve the World Bank's understanding of OSM and its limitations. OSM is an exemplary open, big, geospatial dataset that has global coverage, leverages crowdsourcing, and often outperforms similar commercial datasets. Despite its widespread use, we rarely interrogate OSM's flaws and biases, even though an established literature highlights the heterogeneity of OSM edits (Anderson et al., 2019; Anwar, 2018). Accounting for the biases in the data should allow us to tailor our tools and analyses to better serve our clients.

c. To standardize delivery of information and tools related to OSM to both World Bank staff and our clients. There are numerous tools based on OSM designed to increase our knowledge of climate (Lao et al., 2018; World Bank, 2014), poverty (Oshri et al., 2018), and female participation in STEM (Schmidt and Klettner, 2013). These methodologies are all based on OSM and should therefore be transferable to other regions and projects. By standardizing how we educate people about OSM and how we use OSM in our projects, we will be able to open entire regions and sectors to the usefulness of OSM.

2. Describe analytic design & methodology. Elaborate on hypotheses, conceptual framework, data (survey design if applicable).

The first task is to develop methods to assess the completeness of OSM and to address two questions:
1. How does incomplete OSM data affect spatial analysis?
   a. We will look at the standard OSM tools we currently use and see how they are affected by the systematic removal of information from the dataset.
2. Where is OSM incomplete? Is there a pattern across countries in the completeness of OSM?
   a. The hypothesis is that OSM is not homogeneously incomplete but follows patterns of development, i.e., poor, rural areas will be less complete. However, disaster-prone areas may buck this trend, as the disaster response community often works very hard to update the OSM database in response to disasters.
   b. The analytical approach will vary depending on the results of the OSM completeness assessments. However, the basic process will be:
      i. Generate a map of OSM completeness at country scale.
      ii. Compare the OSM completeness map to maps of poverty, conflict, etc., using standard map-comparison methodologies (Hagen-Zanker and Martens, 2008).
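The systematic-removal experiment in question 1 could be prototyped along the following lines. This is a minimal sketch, not GOSTNets code: it assumes a road network stored as undirected edges weighted by travel time in minutes, randomly drops a share of edges to mimic missing OSM roads, and reports two illustrative metrics (the share of destinations that become unreachable, and the mean added travel time to those still reachable). The function names, the toy network, and the random-removal scheme are hypothetical choices; a real experiment might instead remove roads selectively, e.g. by road class or by region.

```python
import heapq
import random


def shortest_times(edges, source):
    """Dijkstra's algorithm over an undirected graph given as {(u, v): minutes}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nbr, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def degrade(edges, fraction, seed=0):
    """Hypothetical degradation: randomly drop a fraction of edges (seeded)."""
    rng = random.Random(seed)
    keys = sorted(edges)
    dropped = set(rng.sample(keys, round(fraction * len(keys))))
    return {k: w for k, w in edges.items() if k not in dropped}


def sensitivity(edges, source, targets, fraction, seed=0):
    """Compare travel times on the full network vs. a degraded copy.

    Assumes every target is reachable on the full network.
    Returns (share of targets unreachable, mean added minutes for the rest).
    """
    full = shortest_times(edges, source)
    partial = shortest_times(degrade(edges, fraction, seed), source)
    unreachable = [t for t in targets if t not in partial]
    delays = [partial[t] - full[t] for t in targets if t in partial]
    mean_delay = sum(delays) / len(delays) if delays else float("nan")
    return len(unreachable) / len(targets), mean_delay


# Toy network (hypothetical): node pairs mapped to travel time in minutes.
roads = {("A", "B"): 5, ("B", "C"): 5, ("A", "C"): 15, ("C", "D"): 10, ("B", "D"): 20}
print(sensitivity(roads, "A", ["B", "C", "D"], fraction=0.4))
```

Repeating the call over many seeds and removal fractions would give a simple curve of how a tool's outputs degrade as OSM becomes less complete.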

Most of the data will be derived from OSM directly, but we will also use several existing data sources within the World Bank:

1. Building footprints: The World Bank has purchased building footprints for several countries from vendors specializing in machine learning. These will serve as a comparator to the OSM database.
2. Poverty data: We will compare the OSM completeness maps to the best available poverty data from the World Bank.
3. Conflict data: We will map conflict using several reputable open repositories and compare these maps to our OSM completeness assessment.
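As a rough illustration of how the completeness map and the poverty comparison could fit together, the sketch below assumes that OSM and vendor building footprints have already been aggregated to counts per grid cell. It defines completeness as the OSM-to-reference ratio (capped at 1) and then computes Spearman's rank correlation with a per-cell poverty rate. The rank correlation is a deliberately simple stand-in for the fuller map-comparison methods of Hagen-Zanker and Martens (2008), and all cell names and numbers are hypothetical.

```python
def completeness(osm_counts, reference_counts):
    """Per-cell completeness: OSM buildings / reference buildings, capped at 1."""
    out = {}
    for cell, ref in reference_counts.items():
        if ref > 0:  # skip cells where the reference shows no buildings
            out[cell] = min(osm_counts.get(cell, 0) / ref, 1.0)
    return out


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (needs variation in both)."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical per-cell inputs.
osm = {"c1": 90, "c2": 40, "c3": 10, "c4": 5}
ref = {"c1": 100, "c2": 80, "c3": 50, "c4": 50}
poverty = {"c1": 0.1, "c2": 0.3, "c3": 0.5, "c4": 0.6}

comp = completeness(osm, ref)
cells = sorted(comp)
rho = spearman([comp[c] for c in cells], [poverty[c] for c in cells])
print(round(rho, 3))  # prints -1.0
```

A strongly negative rho would be consistent with the hypothesis that poorer cells are less completely mapped; because neighbouring cells are spatially autocorrelated, such a coefficient should not be read as a formal significance test.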

3. Provide a literature review & explain study's intellectual merit.

An annotated bibliography is attached at the end of this proposal and is referenced in several of the answers herein. As the bibliography shows, there is substantial interest in understanding the completeness of OSM. There are also numerous studies on poverty, gender, and climate that leverage OSM, but few of them address the completeness of, or bias in, the underlying data.

4. Describe Implementation arrangements. Identify timeline, key team members and their roles. If the partnership is involved, describe the partnership arrangements, and the respective responsibility of Bank units and partners.


Assess OSM completeness (complete by end of March 2020)
a. Benjamin Stewart (Project Lead)
b. Development Seed (firm) – will develop and implement methods for assessing OSM completeness.

Catalogue OSM tools (complete by end of June 2020)
a. Benjamin Stewart (Project Lead)
b. Andres Chamarro (GOST ETC) – will develop the catalog of tools and improve existing repositories of tools.

OSM training development and coordination (complete by end of June 2021)
a. Benjamin Stewart (Project Lead) – will tie this into the existing capacity building program in energy (P164156).
b. Katie McWilliams (GOST staff) – will develop the training program as a tie-in to her existing program (P170441).
c. Thomas Gertin (GOST STC) – will develop a platform for disseminating OSM training.
d. Maggie Cawley (Consultant, Executive Director OSM US) – has delivered OSM training for the Bank on multiple occasions.

5. Outline the expected outputs (working paper, publication, computational/analytical tools, datasets, etc.) and specify the expected date of delivery for each output.

Assess OSM completeness (complete by end of June 2020)
1. OSM completeness methodology paper and code.
2. OSM completeness datasets for at least three countries, one of which will be a fragile, conflict and violence affected state (FCV). While the focus is on FCV countries, we will include countries at various levels of development for comparison.

Catalogue OSM tools (complete by end of June 2020)
1. Catalog of tools for leveraging OSM, with a focus on WB staff and partners.
2. Output and dissemination tools will be determined later, depending on the size and complexity of the catalog.

OSM training development and coordination (complete by end of June 2021)
1. Guideline paper for World Bank staff and partner NGOs on how to deliver OSM training: when it is appropriate, how to deliver it, how much it costs, etc.
2. Depending on demand and funding, we aim to deliver one OSM training following the guideline paper.

6. Describe the beneficiary of the research, the relevance for policy in developing (or transition) countries and for WBG Operations. Outline dissemination plans, including plans to reach policy makers.

The beneficiaries of this activity fall into two categories: World Bank staff and client country counterparts.

For World Bank staff, we hope to provide tools and advice on the usefulness of OSM and how it can be applied to problems facing developing countries. We believe this will be of particular use to our colleagues in country offices, who often interface directly with our partners in government and need to be better equipped to discuss the merits of open data. The final reports and tools should make it easier for World Bank staff to leverage OSM data in their initiatives and to include capacity building around OSM in their projects.

For our counterparts in government, we hope to provide access to data and tools that will improve project design and implementation. Understanding the benefits and limitations of open data repositories is fundamental to understanding how best to invest in new data and implement tools that improve efficiency in service delivery.

7. Describe the capacity building components, including the collaboration with local partners, researchers from developing countries.

Capacity building is one of the fundamental pillars of this activity. The goal is to develop a plan for World Bank staff to include training on OSM (and open data) in their projects, and to deliver a single OSM training in conjunction with an ongoing project.

Concerning collaboration, the GOST team have previously worked with a team (HOT OSM) in Dar es Salaam, Tanzania, and we hope to include them in discussions, research, and implementation. They specialize in on-the-ground validation of features digitized in OSM, and their input on both implementing training programs and identifying features of interest will be invaluable.

8. Document evidence of the consultation process with relevant research and operations units. E.g. consultation conducted, comments received, & how comments were addressed. TTLs should also describe plans to maintain operational and research consultation.

The consultation process is essentially the culmination of years of GOST supporting operations using OSM. The GOST team are proposing this as a separate activity because we need to consolidate our efforts, develop a strategic capacity building program, and advance our understanding of the completeness and inherent biases of OSM. GOST has supported geospatial training that included or focused on OSM in several projects, including in Nepal (P150328), St. Lucia (P161942), St. Vincent (P163223), South Africa (P164156), and Italy (P164156).

The GOST team and the larger development community will continue to rely on OSM as an open repository of important geospatial information, regardless of this activity. The goal here is to develop a better understanding of the data, the tools, and the potential of the platform moving forward.

Disbursement Projection

From date         To date          Amount ($)
February 2020     June 2020        58,500
September 2020    December 2020    42,500
January 2021      July 2021        0

BUDGET – see attached spreadsheet


Annotated Bibliography

OSM Completeness

Anderson, J., Sarkar, D., & Palen, L. (2019). Corporate Editors in the Evolving Landscape of OpenStreetMap. ISPRS International Journal of Geo-Information, 8(5), 232. https://doi.org/10.3390/ijgi8050232

The authors examine the influence that corporate editors (Apple, Microsoft, etc.) are having on the geographic representation of OpenStreetMap. They analyze historical quarterly snapshots of OSM-QA-Tiles to show where corporate editors are mapping and what types of features they are prioritizing. Mapbox and Apple have the largest footprints, with edits on all six populated continents. Overall, the authors find that corporate editing is a global phenomenon, with specific regions of more interest to some companies than others. While corporations currently have a major impact on road networks, non-corporate mappers edit more buildings and points of interest.

Anwar, S. (2018, June 12). Map Completeness and OSM Analytics. Retrieved November 4, 2019, from Medium website: https://medium.com/devseed/map-completeness-and-osm-analytics-83d6e0f3d969

Development Seed, the Humanitarian OpenStreetMap Team, and Azavea developed a map completeness estimation model for building data in OpenStreetMap by correlating the building data with gridded population data (WorldPop). The machine learning model is trained on rural and urban areas identified as having good-quality residential building data in OSM. The model helps assess how complete OSM building coverage is relative to what population density would imply. This model has been featured in the HOT Analytics for Health campaign, a project focused on assessing data quality for health initiatives.
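The idea of estimating building completeness from population can be illustrated with a deliberately simple linear stand-in for the machine learning model described above. This sketch is not Development Seed's model: it assumes per-cell population and OSM building counts, fits an ordinary least squares line on cells assumed to be well mapped, and reports observed/expected buildings for every cell. The cell identifiers, the linear form, and the choice of training cells are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def estimate_completeness(cells, well_mapped):
    """Observed / expected OSM buildings per cell.

    cells: {cell_id: (population, osm_building_count)}
    well_mapped: ids of cells assumed to be fully mapped (the training set).
    """
    pops = [cells[c][0] for c in well_mapped]
    blds = [cells[c][1] for c in well_mapped]
    a, b = fit_line(pops, blds)
    result = {}
    for cell, (pop, observed) in cells.items():
        expected = a + b * pop
        if expected > 0:  # skip cells where the model expects no buildings
            result[cell] = observed / expected
    return result


# Hypothetical cells: two well-mapped urban cells and one rural cell.
cells = {"u1": (1000, 200), "u2": (2000, 400), "r1": (500, 20)}
print(estimate_completeness(cells, ["u1", "u2"]))
```

Values well below 1 flag cells where OSM holds far fewer buildings than the population would suggest; in practice the training cells would need to be chosen carefully, since any bias in them carries through to every estimate.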

Anwar, S. (2019, Nov 12). Further and Faster Together: the Future of OSM. Retrieved November 12, 2019, from Medium website: https://medium.com/devseed/further-and-faster-together-the-future-of-osm-bbcec6cb8f0d

In this blog post, the author highlights the increasing role of artificial intelligence in facilitating the validation of OSM data. He identifies four main trends in the use of OSM: (1) As more of the world is mapped in OSM, a higher percentage of the work will be verifying and improving the map. (2) The improving accuracy and utility of AI will empower mappers and validators to make the map better, faster. (3) Organized efforts and institutions will have a greater interest and role in systematically improving and maintaining the map. (4) Imperatives for accurate data will lead to bigger investments in verification and change detection.

Barron, C., Neis, P., & Zipf, A. (2013). A Comprehensive Framework for Intrinsic OpenStreetMap Quality Analysis. Transactions in GIS, 18(6), 877–895. https://doi.org/10.1111/tgis.12073

In most scientific studies, OSM completeness is assessed by comparing the data with commercial or administrative datasets. However, replicating this is not always feasible due to lack of data availability, licensing agreements, or procurement costs. The authors propose 25 methods and indicators to assess OSM quality based solely on the data’s history. This enables repeatable intrinsic OSM quality analyses for any part of the world.

Humanitarian OpenStreetMap Team. (2016). OSM Analytics Tool. Retrieved November 4, 2019, from https://osm-analytics.org/#/gaps

The Humanitarian OSM Team has built a set of analytical tools for analyzing gaps in OSM data: areas where infrastructure has not been mapped, and areas that have been mapped but lack useful attributes. By comparing OpenStreetMap feature densities with certain external datasets, potential gaps in OSM can be identified. The Gap Detection tab of osm-analytics shows building counts matched against the amount of built-up area (derived using remote sensing methods) to help find areas with potentially missing building data in OSM. Additionally, the team has developed a methodology to assess attribute completeness in OSM data, which is particularly important for the team's work on health (HOT Analytics for Health). For malaria campaigns, it is important to identify residential structures on the map, together with building use and roof and wall type. The proposed analysis uses OSMLint to find buildings that do not have "roof" or "wall" attributes. These numbers inform the state of completeness of OSM building data.
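The attribute-completeness check described above can be illustrated without OSMLint. The example below is a minimal sketch that assumes buildings are available as OSM-style tag dictionaries; the tag keys `roof:material` and `building:material` are assumptions standing in for the "roof" and "wall" attributes, and HOT's actual campaign may use different keys.

```python
def attribute_completeness(buildings, required_keys):
    """Share of buildings carrying every required tag, plus per-key gap counts.

    buildings: list of OSM-style tag dictionaries, e.g. {"building": "house"}.
    An empty tag value counts as missing.
    """
    total = len(buildings)
    missing = {key: 0 for key in required_keys}
    complete = 0
    for tags in buildings:
        absent = [k for k in required_keys if not tags.get(k)]
        for k in absent:
            missing[k] += 1
        if not absent:
            complete += 1
    share = complete / total if total else 0.0
    return share, missing


# Hypothetical buildings; the tag keys below are assumptions.
buildings = [
    {"building": "house", "roof:material": "metal", "building:material": "brick"},
    {"building": "house", "roof:material": "thatch"},
    {"building": "residential"},
]
share, missing = attribute_completeness(buildings, ["roof:material", "building:material"])
print(round(share, 2), missing)
```

Aggregating the same counts by grid cell or district would turn this into an attribute-completeness map comparable to the building-count gap maps.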


Humanitarian OpenStreetMap Team. (n.d.). Making a New MapCampaigner. Retrieved November 4, 2019, from https://www.hotosm.org/updates/making-a-new-mapcampaigner/

Planning and conducting field data collection projects is not an easy task. It requires organizations to assess, prepare, and then monitor using a wide variety of tools across each step. The MapCampaigner tool helps monitor the quality of data going into OSM.

OSM Gender

Schmidt, ., and Klettner, S. (2013). Gender and Experience-Related Motivators for Contributing to OSM. http://flrec.ifas.ufl.edu/geomatics/agile2013/presentations/ACTIVITY_WS_AGILE_2013_SESSION_1_Schmidt.pdf

Studies on the demographics of OSM contributors reveal that the typical OSM contributor is male, well-educated, and technology-savvy. This not only leads to fewer contributors, and thereby less collected data, but might also affect the quality of the data. This paper presents results from an online survey with 516 participants with different levels of experience with OSM. The authors focus on participants who are not currently contributing to OSM and explore which aspects would motivate them to contribute to OSM (again). Mapping for a dedicated purpose and less time-consuming mapping solutions are the most prevalent motivators; however, responses differ according to OSM experience and gender.

Yang, S., Jacquin, C., and Gonzalez, M. (2019, May 28). Geochicas: Helping Women Find their Place on the Map. Retrieved November 8, 2019, from The Blog website: https://blog.mapillary.com/update/2019/05/28/putting-women-on-the-map-with-geochicas.html

Of the more than 4,000,000 contributors to OSM, only 2-5% are women. This under-representation in the mapping community may create biases in the information that is mapped. Geochicas is a network for women in OpenStreetMap that addresses this issue. It has over 200 participants representing more than 22 countries worldwide. The initiative started during the State of the Map Latam conference in Sao Paulo, Brazil, in 2016 and focuses on increasing the participation of women in the geospatial and technology spaces. The network also analyzes how women are represented in such spaces and the role of women in the decision-making structures of the mapping community.

OSM Climate

Lao, J., Bocher, E., Petit, G., Palominos, S., Le Saux, E., & Masson, V. (2018). Is OpenStreetMap suitable for urban climate studies?

The authors assess the suitability of OpenStreetMap data for urban climate studies, which examine the impact of cities on regional climate. The paper describes a methodology and a set of tools to check the availability of OSM data for feeding the MApUCE geoprocessing chain, which creates a set of morphological, architectural, and socioeconomic indicators for urban climate studies. The authors propose an open source framework to query the OSM database on the fly from a country code, compute spatial and attribute metrics for the country, store the results in a multi-dimensional database, and visualise the results through a dashboard service that integrates chart and map representations at different scales (time, attributes, geography).

Rahman, K. M., Alam, T., & Chowdhury, M. (2012). Location based early disaster warning and evacuation system on mobile phones using OpenStreetMap. 2012 IEEE Conference on Open Systems. https://doi.org/10.1109/icos.2012.6417627

This article provides an example of using OpenStreetMap to design effective disaster management applications. The authors used OSM to create an early disaster warning and evacuation system for both sighted and blind users, implemented on Android phones because of the rapid growth of smartphone use in Bangladesh. The system comprises a third-party server, the Disaster Management Server (DMS), an Android device running the application, and the user. The local weather office updates the disaster data on the DMS. The user registers with the Android Cloud to Device Messaging (C2DM) server to receive automatic notifications of upcoming disasters; otherwise, the user receives manual notifications. The application communicates with the DMS to obtain updated data, sending the user's current position obtained by GPS or the network provider. The probable disaster-affected area is determined using a ray casting algorithm. When the application detects that the user is in a probable disaster zone, it issues visual and audio disaster warnings and evacuation guidelines, including the shortest path to a shelter or safe zone on the application's map.
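The ray casting step mentioned above is the standard even-odd point-in-polygon test. The following is a generic sketch of that algorithm, not the authors' implementation; it assumes the hazard zone is a simple polygon and treats longitude/latitude as planar x/y coordinates, which is a reasonable approximation only for small areas. The example zone and locations are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: cast a horizontal ray to the right of (x, y)
    and count how many polygon edges it crosses; an odd count means inside.

    polygon: list of (x, y) vertices; the closing edge back to the first
    vertex is implied.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical L-shaped hazard zone and two user positions.
zone = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
print(point_in_polygon(0.5, 2, zone), point_in_polygon(2, 2, zone))
```

The even-odd rule handles concave zones such as the L-shape above without special cases; points lying exactly on an edge may be classified either way, which is usually acceptable for warning purposes.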

World Bank. (2014). Open Data for Resilience Field Guide. Washington, DC: World Bank. License: Creative Commons Attribution CC BY 3.0.

The Open Data for Resilience Initiative (OpenDRI) is a growing partnership to improve governments' technical capabilities to collect and analyze data and to adjust project design ahead of the disaster cycle. This report highlights OpenStreetMap as a key data source, with potential to help in the following areas: (1) Impact Modeling: Using data from community mapping, hazard models, and open government data (demographics, infrastructure, etc.), impact modeling tools can visualize the likely impact of a hazard on schools, hospitals, major population centers, and other assets. (2) Damage Loss Calculation: Based on the attributes collected by community mapping (building type, use, footprint, materials, etc.), it is possible to calculate the approximate value of hundreds of thousands of individual structures in an urban area. In aggregate, these calculations provide a higher-resolution view of potential losses than coarser estimates. (3) Rubble Calculation: Similar to the damage data, community mapping gives planners a better understanding of the potential volume of rubble that may occur after an earthquake. USAID is using data from Nepal for this purpose.

OSM Poverty

Fisker, P., Malmgren-Hansen, D., and Sohnesen, T. (2019). Using Satellite Data to Support the Expansion of the Urban Safety Net in Mozambique. Presentation at the World Bank Group – October 1st, 2019.

The authors trained a Convolutional Neural Network to predict the average Poverty Score (PMT) in small cells based on building features derived from satellite imagery (type and quality), as well as road features (availability and density) based on OSM data. This work shows that OSM data has become an increasingly important data source in machine learning efforts to predict poverty and infrastructure quality.

Mahabir, R., Croitoru, A., Crooks, A., Agouris, P., & Stefanidis, A. (2018). A Critical Review of High and Very High-Resolution Remote Sensing Approaches for Detecting and Mapping Slums: Trends, Challenges and Emerging Opportunities. Urban Science, 2(1), 8. https://doi.org/10.3390/urbansci2010008

Slums are a global urban challenge, with less developed countries being particularly impacted. To adequately detect and map them, data is needed on their location, spatial extent and evolution. OpenStreetMap is an example of volunteered geographic information (VGI) and Map Kibera is a prototypical example of crowdsourcing slum data.

Oshri, B., Hu, A., Adelson, P., Chen, X., Dupas, P., Weinstein, J., … Ermon, S. (2018). Infrastructure Quality Assessment in Africa using Satellite Imagery and Deep Learning. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining - KDD ’18. https://doi.org/10.1145/3219819.3219924

The authors investigate different models and remote sensing data sources to predict infrastructure quality in Africa, using ground truth labels from the Afrobarometer Round 6 survey. They find that models using Landsat 8 imagery perform better than models that leverage OpenStreetMap or nighttime light intensity on the same tasks.

OSM Security Risk

Ballatore, A. (2014). Defacing the map: Cartographic vandalism in the digital commons. The Cartographic Journal, 51(3), 214–224.

This article addresses the emergent phenomenon of carto-vandalism, the intentional defacement of collaborative cartographic digital artifacts in the context of volunteered geographic information. Through a qualitative analysis of reported incidents in WikiMapia and OpenStreetMap, a typology of this kind of vandalism is outlined, including play, ideological, fantasy, artistic, and industrial carto-vandalism, as well as carto-spam. Two families of counter-strategies deployed in amateur mapping communities are discussed. First, contributors organize forms of policing based on volunteered community involvement, patrolling the maps and reporting incidents. Second, the detection of carto-vandalism can be supported by automated tools, based either on explicit rules or on machine learning.


Wroclawski, S. (2018, February 16). Why OpenStreetMap is in Serious Trouble. Retrieved November 12, 2019, from personal blog: https://blog.emacsen.net/blog/2018/02/16/osm-is-in-trouble/

In this blog, a former collaborator of OSM highlights potential risks associated with the open-source nature of the project. Most importantly, he notes that there is no robust framework to review or validate new features before they are added to the map.

Other References

Hagen-Zanker A., Martens P. (2008) Map Comparison Methods for Comprehensive Assessment of Geosimulation Models. In: Gervasi O., Murgante B., Laganà A., Taniar D., Mun Y., Gavrilova M.L. (eds) Computational Science and Its Applications – ICCSA 2008. ICCSA 2008. Lecture Notes in Computer Science, vol 5072. Springer, Berlin, Heidelberg
