INVESTIGATING USER CONTRIBUTION PATTERNS TO NOVEL VOLUNTEERED GEOGRAPHIC INFORMATION PLATFORMS

By

LEVENTE JUHÁSZ

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2018

© 2018 Levente Juhász

To all who believed in me

ACKNOWLEDGMENTS

First and foremost, my deepest gratitude goes to my advisor, Hartwig Hochmair, for guiding me through this challenging endeavor. His utmost patience, understanding, professionalism and the opportunities he provided were all essential for the successful completion of my program. I am truly grateful for this.

I also wholeheartedly thank all labmates, colleagues, fellow scientists, anonymous peer reviewers, committee members, students and everyone I interacted with in any form during the past few years. Their figures stand in front of me, as either inspiration or negative example. Both types were equally needed for my academic development and for becoming the person I am today.

Lastly, on the most personal level humanly possible, I thank my family and the few special friends I have. They provided me with their unconditional love, which one cannot take for granted. Without their love, this work would not have been possible.

Words cannot possibly express how much this means to me.


TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS

ABSTRACT

CHAPTER

1 INTRODUCTION
  Objectives
  Dissertation Outline

2 THE EMERGENCE OF NEW MAPPING PLATFORMS: THE CASE OF MAPILLARY
  Related Work
  Study Setup
    Overview of the Analysis
    Tile System
    Data Description and Extraction
      Mapillary
      Street View
      Reference data for determining completeness
    Completeness Computation for Mapillary and Street View
  Results
    Contribution Patterns of Mapillary Data
      Country and continent level
      Individual level
    Completeness of Street Level Photos
      Completeness of Mapillary data at the country level
      Completeness of Mapillary and Street View data at the city level
  Summary

3 CROSS-LINKAGE BETWEEN STREET LEVEL MAPILLARY PHOTOS AND OSM EDITS
  Literature Review
  Study Setup
    Extraction of Mapillary Related OSM Events
    Extraction of Mapillary and OSM Features from Across-Platform Users
  Results
    Contribution Patterns for Cross-Tagged OSM Features
      Cross-linkage between OSM event types and Mapillary
      Cross-linkage for OSM primary features
      OSM activity types associated with Mapillary
    Across-Platform User Contributions
  Summary

4 HOW DO VOLUNTEER MAPPERS USE STREET LEVEL MAPILLARY PHOTOS TO ENRICH OSM?
  Materials and Methods
    Data Sources
      OpenStreetMap
      Mapillary
    Data Preparation and Processing
      Workflow
      Extraction of editing sessions
      Extraction of candidate features
      Extraction of edits likely based on Mapillary
  Results
  Summary

5 OSM DATA IMPORT AS AN OUTREACH TOOL TO TRIGGER COMMUNITY GROWTH? A CASE STUDY IN MIAMI
  Materials and Methods
    Miami-Dade Large Building Import
    Outreach Techniques and Target Audiences
      Students
      Existing OSM community and local community
  Results
    Participation Numbers
      Students
      Community users
      New and existing users
    Mapping Behavior
      Temporal aspects
      Spatial aspects
      Other OSM activities
  Summary

6 CONCLUSIONS

LIST OF REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

2-1 Most complete administrative units in terms of Mapillary completeness

2-2 Completeness of Mapillary (Map) and Street View (SV) together with relative completeness difference (Rel. Diff.)

3-1 Weekly aggregated number of OSM users

3-2 Identified OSM events with reference to Mapillary by continent

3-3 Distribution of primary OSM feature categories cross-linked to Mapillary

3-4 Number of OSM features based on activity type

5-1 Descriptive statistics of imported buildings by user group

5-2 Description of events related to the import project

LIST OF FIGURES

Figure

2-1 Mapillary image and mapped GPS trace as seen on the website as of 2016

2-2 Geographic units (tiles) in the WMTS scheme illustrating tile coordinates (X, Y) and zoom levels

2-3 Removing segments longer than 1 km (dashed lines) from the Mapillary dataset and retaining only those that represent real coverage (green lines)

2-4 Original Street View tiles at zoom level 13 lacking details of the road network (a), and a regenerated tile that allows individual roads to be distinguished (b)

2-5 Keeping most information from tiles intersecting with country borders by using a refined scale. Within intersecting tiles (red outline), only a few finer resolution tiles with a spatial resolution of ~40 m were excluded from the completeness analysis

2-6 Number of users per continent contributing to Mapillary

2-7 Cumulative data growth of Mapillary. Total distance mapped per continent (a) and country (b), and average distance mapped per user in continents (c) and countries (d)

2-8 Mapillary sequences in Australia/Oceania (a) and in the Iberian Peninsula (b) by home country of users

2-9 Power law approximations of contribution patterns for individual users (a-d) and contribution span of users (e)

2-10 Convex hulls of contributions from individual users revealing important travel corridors for Mapillary users

2-11 Example contributions of different mapper types: Casual mapper recording a single hiking trail (a) and map lover systematically recording a neighborhood (b)

2-12 Completeness of street level Mapillary images on main roads

2-13 Spatial distribution of relative completeness difference between Mapillary and Google Street View for selected cities

3-1 Histogram of filtered changeset areas (a) and accurate spatial representation of changesets after removing changesets with large areas in Europe colored by user (b)

3-2 Number of weekly OSM events cross-tagged to Mapillary

3-3 Spatial distribution of identified OSM events with reference to Mapillary

3-4 Using street level imagery in OSM: Mapping runway features (a), indoor mapping (b), deriving descriptive information (c), and deriving new road pattern (d)

3-5 Ratio of mapped areas in Mapillary and in OSM (a), and spatial distribution of a user’s contributions (b)

3-6 Distribution of users by the size of area they mapped in OSM, Mapillary or both

4-1 Editing sessions (blue hatched polygons) aggregated from individual viewing extents (yellow rectangles)

4-2 Retained OSM edit based on Mapillary (yellow), excluded candidate features (red), and location of Mapillary photos (green dots). Example #1 was excluded because of the lack of Mapillary photos in proximity, while example #2 was excluded based on temporal constraints

4-3 Number of OSM-Mapillary editing sessions per week, grouped by editor software (a), and number of unique OSM feature edits based on Mapillary per week (b)

4-4 Weekly aggregated number of OSM users using Mapillary to improve OSM

4-5 Histogram of sign up dates of OSM users engaging in photo-mapping. The peak on the right implies that newly registered mappers also engage in photo-mapping

5-1 Import buildings (automatically uploaded: green, for manual review: red) in Miami-Dade County (a), excerpt from the import tutorial (b), and user interface of the Tasking Manager instance (c)

5-2 Spatial distribution of OSM changesets (transparent cyan rectangles) between March 2016 and August 2016 in Miami-Dade County (red outline) in South Florida (a), and a Maptime Miami held on September 26, 2016 (b)

5-3 Histograms of sign up dates for different user groups: students (a), top mappers (b), and community mappers (c). For students, assignment due dates (solid vertical lines) and first introduction to the project (dashed vertical line) are shown

5-4 Import activity over time for different user groups. Dashed vertical lines show community events, while solid vertical lines represent assignment deadlines for students

5-5 Number of times each individual task from the Tasking Manager was accessed, revealing that downtown areas and larger tasks are accessed more frequently

5-6 Finished tasks (green), tasks that have been worked on (red), and tasks that have not been worked on (blue)

5-7 Fitted power law functions on task lock (a) and user (b) distributions (log-log plots)

5-8 Spatial distribution of OSM changesets submitted by participating users between May 2016 and October 2017

LIST OF ABBREVIATIONS

AGILE Association of Geographic Information Laboratories in Europe

API Application Programming Interface

CC0 Creative Commons Zero License

CC-BY-SA Creative Commons Attribution ShareAlike License

ESRI Environmental Systems Research Institute, Inc.

GIScience Geographic Information Science

GPS Global Positioning System

HOT Humanitarian OpenStreetMap Team

JOSM Java OpenStreetMap Editor

JSON JavaScript Object Notation

ODbL Open Database License

OSM OpenStreetMap

QA Quality Assurance

TB Terabyte

TM Tasking Manager

URL Uniform Resource Locator

VGI Volunteered Geographic Information

WGS84 World Geodetic System 1984

WMTS Web Map Tile Service

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

INVESTIGATING USER CONTRIBUTION PATTERNS TO NOVEL VOLUNTEERED GEOGRAPHIC INFORMATION PLATFORMS

By

Levente Juhász

December 2018

Chair: Hartwig H. Hochmair

Major: Forest Resources and Conservation

Volunteered Geographic Information (VGI) refers to spatial data generated by ordinary citizens through their online activities. The Web 2.0 era has seen rapid growth both in the number of available VGI platforms and in the number of users who engage in such activities. VGI has already proven useful for a number of applications, ranging from the creation of traditional map data to the extraction of traffic signs from voluntarily shared photographs. The geographic information science (GIScience) community has also started to use VGI extensively to answer a variety of research questions, for example to better understand human mobility through geotagged messages, or to provide ground information for first responders during natural or man-made crises. However, as VGI is often generated by non-professionals, or for reasons other than scientific research, its quality can be questionable. This dissertation has a community focus, since the people who generate VGI are key to understanding the characteristics of the data they provide. Therefore, this work carries out a set of investigations to illustrate recent trends in contribution behavior, especially on new data platforms, which in turn will help understand data quality issues associated with VGI.

First, the emergence of new platforms is illustrated by analyzing and characterizing user contributions to Mapillary, a street level photograph application, at both the country and the individual level. A completeness assessment and a comparison to a commercial platform are also given to illustrate how effective contributors are in reaching the goals of a new mapping platform. In the next chapters, the data interplay between two platforms is discussed, using Mapillary and OpenStreetMap as examples. The cross-fertilization of data between VGI platforms has only been observed in recent years. This research describes how user generated data from one platform is being used to improve another, and analyzes the spatio-temporal characteristics of this behavior using various techniques.

Lastly, the final part of the dissertation conducts an experiment with VGI users and assesses the effectiveness of outreach techniques in triggering community growth. In addition, the behavior of different VGI user groups is also described with special emphasis on the motivation of users.


CHAPTER 1 INTRODUCTION

In recent years, online services and applications have become part of our everyday lives. For example, we use navigation apps to plan our routes to places we are unfamiliar with, photo sharing services to share vacation memories, and social media services to stay connected with people we know. During these activities, people inevitably generate massive amounts of data that can be analyzed and used to answer a number of questions. This is the so-called user generated content (UGC) that has been extensively studied in recent years. A large portion of this user generated data contains a geographic component, and is often referred to as Volunteered Geographic Information (VGI). However, VGI terminology is not standardized, and the term is often used as an umbrella for user generated geographic data. For example, VGI may be created explicitly for the purpose of producing geographic data (e.g. collectively edited maps), or be generated involuntarily by online users (e.g. geocoded social media posts) (See et al. 2016).

Much research attention has been devoted to different aspects of data quality since the term VGI was coined in 2007 (Goodchild 2007b). It was also not long until the GIScience community realized that understanding the behavior and motivation of users plays a major role in understanding data quality. Therefore, the research presented in this dissertation is related to the topics of spatial data quality and user participation in VGI as discussed by Elwood, Goodchild, and Sui (2012) and Haklay (2013). As mentioned above, different types of VGI exist, and the distinction between them should not be overlooked when utilizing VGI for research, since the nature of a data source defines the characteristics of the underlying data. For example, voluntarily edited maps (e.g. OpenStreetMap) or street level photographs (e.g. Mapillary) can be seen as voluntarily made contributions, as users work towards the common goal of generating geographic information. A very different approach is posting geocoded messages on social media (e.g. on ), where users may not know that their messages are tied to a geographic location. Nevertheless, these involuntarily generated messages can still be mined from online services and analyzed later. Keeping this distinction in mind, the term VGI platform in this work refers to a Web 2.0 application that explicitly asks for geocoded data contributions (e.g. roads in OpenStreetMap or street level photographs in Mapillary).

Both explicit VGI and other user generated geographic data sources can be massive in volume, are heterogeneous, and lack traditional quality assurance (QA) techniques (Goodchild and Li 2012). Instead, these platforms mostly use non-traditional QA techniques that are based on their unique characteristics, such as dynamics and crowd behavior (Rice et al. 2012). A general view is that the representation of different places in VGI sources is uneven (Graham and Zook 2013). For example, rural areas often contain fewer contributions (e.g. photos, map features) than urban areas in most VGI datasets, and VGI data quality varies between world regions (Hecht and Stephens 2014). A basic approach to evaluating VGI data quality is to compare these datasets to a reference dataset. Due to the limitations originating from the uneven distribution of geographic information found on the internet, completeness analysis is preferably conducted at local scales where data quality can be considered consistent (Haklay et al. 2010). Alternatively, intrinsic quality analysis methods are frequently used if reference data is not available (Barron, Neis, and Zipf 2014; Degrossi et al. 2018).


UGC is usually considered to be of high quality when it is created and reviewed by many users (Elwood, Goodchild, and Sui 2012). Localness and the ‘wisdom of crowds’ also play a significant role in the credibility of VGI (Flanagin and Metzger 2008). In computer science, Linus’ Law (named after Linus Torvalds, the creator of Linux and an open source advocate) refers to the open source movement and states that “given enough eyeballs, all bugs are shallow”. The same concept was found to be applicable to OpenStreetMap (Haklay et al. 2010). Even though the relationship between the number of contributors and data quality is not linear, it is suggested that the law might be valid for VGI in general. Others argue that the law is less effective for geographic facts, giving the example of long-standing false information in Wikimapia that the ‘many eyes’ approach was supposed to correct (Goodchild and Li 2012).

Understanding contribution patterns is a major challenge, and it is important in the context of spatial data quality of VGI (Budhathoki and Haythornthwaite 2013). Uneven access to internet technologies throughout the world also prevents certain groups of people from participating in VGI creation, which is known as the digital divide (Goodchild 2007a; Heipke 2010). Contribution inequality (Nielsen 2006) refers to the fact that the majority of activities in online communities can be attributed to only a small fraction of the user base. This is often referred to as the 90-9-1 rule, which can be used as a rule of thumb to describe users in collaborative projects, such as wikis. According to this rule, 90% of the community only view content and do not contribute data (lurkers), 9% edit and improve already existing content (contributors), while only 1% of the whole user base engages in the creation of new content (creators). The same contribution inequality can be observed in geospatial collaborative projects as well. It has already been found that, as with wiki projects, only a small fraction of registered users tend to actively contribute VGI (Neis and Zielstra 2014). Identifying contributor inequality in VGI can therefore help distinguish between users and contributors. In the context of this dissertation, this effect is estimated by power law relationships between the number of contributors and aggregated variables (e.g. number of days actively spent on mapping). Power law functions have long tails corresponding to the active, committed user base (1%). At the other end of the spectrum is the majority of users, who are greater in number (90%) but tend to generate little to no activity. The dissertation presents several instances where an observed distribution can be well approximated with power laws, which implies contributor inequality in the examined datasets. However, power laws have to be interpreted carefully because of the extreme values present in the data (i.e. few people responsible for the majority of the activity, and many people generating little activity). For this reason, median values are preferred over means to describe a population that shows power law characteristics.
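The preference for medians over means can be illustrated with a minimal sketch. The data below are hypothetical (the number of users with k active days is assumed to fall off as k^-2), and the helper names `top_share` and `loglog_slope` are introduced here for illustration only. The slope is a simple least-squares fit on log-log axes, which is one common way to approximate a power law exponent and not necessarily the procedure used in later chapters.

```python
import math
import statistics
from collections import Counter


def top_share(counts, fraction=0.01):
    """Share of all activity produced by the most active `fraction` of users."""
    ranked = sorted(counts, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


def loglog_slope(counts):
    """Least-squares slope of log(number of users) against log(activity level)."""
    freq = Counter(counts)  # activity level -> number of users at that level
    xs = [math.log(level) for level in freq]
    ys = [math.log(n) for n in freq.values()]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: the number of users with k active days decays as k**-2.
active_days = [k for k in range(1, 101) for _ in range(round(10000 / k**2))]

print("users:", len(active_days))
print("mean active days:", round(statistics.fmean(active_days), 2))
print("median active days:", statistics.median(active_days))
print("share of activity from top 1% of users:", round(top_share(active_days), 3))
print("log-log slope:", round(loglog_slope(active_days), 2))
```

In such a dataset the mean is pulled upward by a handful of very active users, while the median stays at the activity level of the typical user, which is why the median is the more representative summary.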

Objectives

The main objective of this dissertation is to advance our understanding of VGI contributor behavior, and to extend the literature by describing the previously unexplored phenomenon of cross-platform user behavior. To reach this goal, VGI data from OpenStreetMap and Mapillary are retrieved and analyzed, since in addition to the usefulness of these separate datasets, they provide a unique opportunity to study the interplay of the two VGI communities and their data. A variety of approaches are considered during the research process. The main goal of the research is accomplished by pursuing a set of distinct objectives that can be summarized as follows:


• Analyze the spatio-temporal user contribution patterns to a new VGI platform (Mapillary) during its early stages

• Determine the completeness of Mapillary images on different road categories, and also in relation to a commercial imagery provider, Google Street View

• Determine to what extent Mapillary imagery is used as a source for OSM mapping by the community

• Analyze and compare the spatio-temporal contribution patterns of different VGI user groups to OSM during a controlled experiment and assess motivational factors behind these contributions

Dissertation Outline

Chapter 1 gives a general introduction to the conducted research and places it in context. The general objective of this dissertation is also explained here. Chapters 2 and 5 are self-contained journal articles. Chapter 3 is a peer-reviewed conference full paper and Chapter 4 is a peer-reviewed conference short paper. Chapter 2 was published in Transactions in GIS (Juhász and Hochmair 2016b). Chapter 3 was presented at the annual AGILE (Association of Geographic Information Laboratories in Europe) conference in 2016 in Helsinki, Finland, published in the Lecture Notes in Geoinformation and Cartography series, and nominated for the Best Full Paper Award (Juhász and Hochmair 2016a). Chapter 4 was presented at the AGILE 2017 Conference in Wageningen, The Netherlands, and published in the conference proceedings (Juhász and Hochmair 2017a). Chapter 5 was published in the ISPRS International Journal of Geo-Information (Juhász and Hochmair 2018c).

The first objective of the dissertation focuses on the ever-changing landscape of VGI platforms. In Chapter 2, Mapillary data is introduced to the GIScience community with special emphasis on analyzing the spatio-temporal user contribution patterns. This chapter provides a better understanding of user contribution patterns in the early stages of a new platform. The second objective, namely to determine the completeness of Mapillary, is also addressed in Chapter 2. The analysis calculates completeness values at both local and global scales on reference roads (i.e. main roads, residential roads, and off-road pedestrian and cycle paths) to reveal what percentage of the road network is covered by voluntarily contributed street level photos, and also compares completeness to that of a commercial provider, namely Google Street View.

The third objective is addressed in Chapter 3. This chapter analyzes the interplay of two VGI data sources. More specifically, it looks at how the OpenStreetMap community uses Mapillary street level imagery to improve OSM. The analysis identifies what information is being derived from voluntarily collected images and inserted into OSM, as well as the areas where this phenomenon is taking place. In addition, it assesses the interplay of the two user communities. More specifically, it identifies users who contribute to both platforms and analyzes the co-occurrence of their contributions. The analysis presented in Chapter 3 relies on users indicating which OSM changes are based on Mapillary images, and these indications can be incomplete. For this reason, Chapter 4 presents another method that is able to identify map edits based on Mapillary images even when the source is not indicated by users. This chapter therefore extends the analysis in Chapter 3 by capturing cross-platform edits that were previously not identified. As opposed to the “cross-tagging” described in Chapter 3, this phenomenon is referred to as “cross-viewing”.

The last objective of the dissertation, to compare the spatial and temporal contribution patterns of distinct VGI user groups, is discussed in Chapter 5. For this goal, a data import task to OSM was implemented and different user groups (such as students, community members and new recruits) were asked to join the project. This chapter analyzes their spatio-temporal contribution patterns and compares data volume and editing intensity over time to reveal whether different outreach techniques have an effect on user group behavior. Motivational factors behind these contributions are also discussed in this chapter.

Finally, Chapter 6 concludes the work, summarizes the findings of this dissertation, and provides directions for future work.


CHAPTER 2 THE EMERGENCE OF NEW MAPPING PLATFORMS: THE CASE OF MAPILLARY

Geolocated street level photographs are an important data source for a variety of transportation analysis tasks, including the identification of road features, such as traffic signs (Gonzalez, Bergasa, and Yebes 2014), the evaluation of wheelchair accessibility of sidewalks (Hara et al. 2014), or the deployment of navigational tools for visually impaired citizens (Guy and Truong 2012). Mapillary is the first platform to provide detailed street photos based on crowdsourcing, adding to the list of Web 2.0 applications that administer and facilitate the collection of Volunteered Geographic Information (VGI) (Goodchild 2007b). Mapillary started its public service in early 2014 and is run by a company located in Malmö, Sweden, with a satellite office in Los Angeles, California. Imagery is provided at https://www.mapillary.com under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License. This means that everyone is free to use the data, even for commercial purposes.

Mapillary, like other VGI data sources, can be seen in different contexts (Haklay 2013). Its foremost aim is to produce geographic information, which necessitates consideration of spatial data quality (Goodchild 2007b). With data completeness being one of the major data quality elements, data collection and data quality are closely intertwined in Mapillary. In this context, this study analyzes data growth and data completeness on a worldwide scale as well as for selected local areas. In the context of user participation (Elwood, Goodchild, and Sui 2012), it assesses user loyalty in data contribution. We assume that Mapillary’s ShareAlike license attracts mostly contributors who support the idea of open data over proprietary data, even if proprietary data, or part of them, are provided free of charge, as is the case with Google Street View data (from here on simply called Street View). The fact that Mapillary imagery has already been collected on six continents (Australia and Oceania, North America, South America, Europe, Asia, and Africa) renders Mapillary data a useful supplemental data source to commercial products of street level photographs, such as Street View. For selected urban and rural areas, this study therefore also compares data completeness between Mapillary and Street View relative to reference road datasets along different road categories. Street View provides street level photographs as 360° panorama images on selected roads from all over the world. The images are primarily taken from cars that are equipped with professional cameras. However, for selected bicycle and hiking paths, Street View images are captured with specifically developed camera backpacks and tricycles with camera mounts.

Mapillary data contributors can upload photos using a smartphone application, or manually on the Mapillary website. A video upload functionality was added in April 2015, which automatically extracts geotagged photos from the video stream. Videos need to be accompanied by a GPS log for geocoding purposes. Since Mapillary facilitates data capture from any GPS enabled mobile device equipped with a camera, it increases the range of potential data contributors compared to the professional camera teams employed by Google for Street View data collection. This flexibility results in Mapillary imagery also being frequently collected from off-road paths, such as hiking trails.

Figure 2-1 shows how a captured street level photograph can be viewed on the Mapillary website. The blue lines on the map indicate all GPS sequences within the area along which photos were taken, and the green marker with the arrow along the red GPS sequence shows the image location together with the orientation of the camera. For devices without a built-in compass, the camera orientation is determined by the travel trajectory, using the current location and the location of the next photo of the sequence. The orientation can be manually edited later.
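Deriving an orientation from the travel trajectory can be sketched as follows. This is an illustrative approximation and not Mapillary's actual implementation: it assumes the orientation equals the initial great-circle bearing from the current photo to the next one, and that the last photo of a sequence reuses the bearing of the previous photo. The function names `bearing_deg` and `sequence_orientations` are hypothetical.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0


def sequence_orientations(points):
    """Camera orientation for each (lat, lon) photo location in a sequence.

    Each photo points towards the next photo. The last photo has no successor,
    so it reuses the previous bearing (an assumption made for this sketch).
    """
    if len(points) < 2:
        return [None] * len(points)
    bearings = [bearing_deg(*a, *b) for a, b in zip(points, points[1:])]
    bearings.append(bearings[-1])
    return bearings


# Hypothetical sequence: one degree north, then one degree east.
track = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(sequence_orientations(track))
```

A bearing of 0 corresponds to north and 90 to east, so a user walking north and then turning east produces orientations close to 0 and 90 degrees.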

Related Work

The availability of street level photographs can eliminate time- and resource-consuming in-person fieldwork. Identification of urban neighborhood features is commonly used in health research since the characteristics of the built environment influence health behavior. Clarke et al. (2010) gathered neighborhood measures (e.g. park, playground, graffiti presence) from Street View imagery to characterize communities. Rundle et al. (2011) and Griew et al. (2013) found that Street View can be used to audit neighborhood environments by matching results obtained from Street View imagery data to data collected on-site. However, on-site assessments should be conducted when characterizing micro-environments for special activities, such as cycling (Vanwolleghem et al. 2014). Several studies applied computer vision to Street View imagery, for example, to identify sidewalk curbs (Hara et al. 2014), or to determine the geographic location of photographs based on reference imagery extracted from Street View (Zamir and Shah 2010). Yin et al. (2015) provide technical details on downloading and assembling Street View images and their use for pedestrian detection using machine vision and learning technology. Guy and Truong (2012) developed a navigation prototype that relies on the manual identification of road intersection features from Street View images to augment visually impaired pedestrians’ sensory information with a richer depiction of the environment. Despite the wide use of Street View imagery in geospatial applications and research, the spatial coverage of Street View, i.e., data completeness and geographic extent, has so far not been discussed in the literature. An earlier version of the study presented in this paper revealed that Street View provides fairly complete coverage where service is offered in a city (Juhász and Hochmair 2015). For the 20 largest German cities, Street View covers 77% - 92% of the main roads, 58% - 84% of the residential roads, but less than 4% of off-road pedestrian and bicycle network segments. No statistical relationship could be identified between the presence of Street View data and the Mapillary completeness value for the analyzed 40 German cities. This implies that data collection in Mapillary is driven by factors other than the prior existence of proprietary data, such as contributors’ idealistic attitude to contribute collected data even in areas where proprietary data are provided free of charge. A related study compared the data quality of mapped roads between Google Maps (not Street View, though), Bing Maps and OpenStreetMap (OSM) in Ireland, where spatial coverage, currency and ground-truth positional accuracy were measured. The authors found that the performance of the three datasets varied between the five analyzed cities and that there was no clear best dataset (Cipeluch et al. 2010).

Part of the completeness analysis conducted in this paper relies on reference data, where OSM is used as one of the data sources. Although the data quality of OSM is not assured by standards or an authoritative agency, several studies report that OSM provides high road coverage especially in urban areas, rendering it a viable free alternative to commercial datasets (Girres and Touya 2010; Haklay 2010; Zielstra and

Zipf 2010). Other research studies assessed OSM data completeness on off-road paths, e.g. pedestrian segments (Zielstra and Hochmair 2011) and bicycle trails

(Hochmair, Zielstra, and Neis 2015). Results of these studies indicate that OSM


provides a robust reference dataset for determining Mapillary and Street View data completeness even for off-road data analysis.

Several aspects of the data analysis applied to the Mapillary dataset presented in this paper have previously been applied to other VGI data sets as well as non-spatial user generated content. For example, participation inequality is well-known in online social networks and in large-scale, multi-user communities (Nielsen 2006). It means that most of the data is contributed by only a small fraction of users. Participation inequality could be observed for Wikipedia (Javanmardi et al. 2009), OSM (Neis and Zipf 2012), dronestagram (Hochmair and Zielstra 2015), and Flickr and Twitter (Li, Goodchild, and

Xu 2013). Techniques to identify a contributor’s home region from data contribution patterns or social interactions with other users have been developed for OSM (Neis and

Zipf 2012; Zielstra et al. 2014), Twitter (Davis Jr et al. 2011), and Facebook (Backstrom, Sun, and Marlow 2010). The extent of human mobility patterns, measured as the radius of gyration, has been derived from Foursquare check-ins (Cheng et al. 2011), tweets

(Hawelka et al. 2014) and mobile phone records (González, Hidalgo, and Barabási

2008). The localness of user generated content, i.e. the share of content that is contributed from within 100 km of a contributor’s specified home location, was found to be higher in Flickr than in Wikipedia (Hecht and Gergle 2010).

Study Setup

Overview of the Analysis

The study analyzes Mapillary data that were collected within the first year since the inception of the Mapillary project. The first part of the analysis focuses on measuring the aggregated data volume over time, both for countries and continents, as well as determining contribution patterns of individual users. The second part of the analysis


examines Mapillary data completeness relative to reference road datasets from

OpenStreetMap and ESRI, both at the country and city level. It also compares Mapillary completeness to Street View completeness for selected cities in the United States and Europe.

Tile System

Whereas measuring Mapillary data contributions for the first part of the analysis relies on vector data operations, the second part of the analysis, which addresses data completeness, requires overlaying road data from several sources, i.e., Mapillary,

Google, OSM, and ESRI. For this purpose we used raster tiles of the Web Map Tile

Service (WMTS) (Masó, Pomakis, and Julià 2010) as a common spatial reference system. This so-called XYZ tile scheme, which was first used by Google, became a de facto standard in Web mapping and is used by numerous other map providers as well, including Bing Maps, Yahoo Maps, OSM and Mapbox. Tiles are provided as 256 x 256 pixel images. For the visualization of geographic data, the world is divided into tiles corresponding to zoom levels. In each zoom level, tiles are indexed by X (column) and Y (row) values starting from the top-left corner (Figure 2-2). Zoom level 0 covers the whole world in one tile. The total number of tiles at a zoom level z is $2^{2z}$. Logically, the system can be considered to be a hierarchy of folders and files, where each zoom level is a folder, each X coordinate is a subfolder and each Y coordinate is a raster image file.

Geographic coordinates can be converted to tile coordinates to determine in which tile a location on the Earth’s surface can be found using the following equation:

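$$X = \left\lfloor \frac{lon + 180}{360} \cdot 2^{z} \right\rfloor$$

$$Y = \left\lfloor \left(1 - \frac{1}{\pi} \cdot \ln\left(\tan\left(lat \cdot \frac{\pi}{180}\right) + \frac{1}{\cos\left(lat \cdot \frac{\pi}{180}\right)}\right)\right) \cdot 2^{z-1} \right\rfloor \tag{2-1}$$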


where X and Y are tile coordinates, z is the zoom level and lat, lon are geographic coordinates. Tile coordinates can also be converted to geographic coordinates indicating the top-left corner of a tile as follows:

$$lon = \frac{X}{2^{z}} \cdot 360 - 180$$

$$lat = \arctan\left(\sinh\left(\pi - \frac{Y}{2^{z}} \cdot 2\pi\right)\right) \cdot \frac{180}{\pi} \tag{2-2}$$

In this research the XYZ tile scheme at zoom level 13 is used as a reference grid system for comparison and completeness analyses. At this zoom level the ground pixel size corresponds to approximately 19 m and tiles cover an area of approximately 4.9 km x 4.9 km. This setting conceals GPS positioning errors occurring during data collection of Mapillary, and avoids double counting of sequences taken along the same road. As a preliminary step, all data used in the completeness calculation were converted into this schema to make them comparable.
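The two conversions can be illustrated with a minimal Python sketch. The function names, the spherical Earth radius of 6,378,137 m and the 256-pixel tile width used for the ground resolution are assumptions made for illustration; they are not taken from the scripts used in this study.

```python
import math


def lonlat_to_tile(lon, lat, z):
    """Tile (X, Y) containing a lon/lat position (degrees) at zoom level z."""
    n = 2 ** z
    lat_rad = math.radians(lat)
    x = int(math.floor((lon + 180.0) / 360.0 * n))
    # (1 - ln(tan(lat) + 1/cos(lat)) / pi) * 2^(z-1), written as "/ 2 * n"
    merc = math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad))
    y = int(math.floor((1.0 - merc / math.pi) / 2.0 * n))
    return x, y


def tile_to_lonlat(x, y, z):
    """Lon/lat (degrees) of the top-left corner of tile (x, y) at zoom z."""
    n = 2 ** z
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi - y / n * 2.0 * math.pi)))
    return lon, lat


def ground_resolution(z, lat=0.0, tile_px=256):
    """Approximate metres per pixel at a latitude (assumed spherical Earth)."""
    circumference = 2.0 * math.pi * 6378137.0
    return circumference * math.cos(math.radians(lat)) / (tile_px * 2 ** z)
```

At the equator, `ground_resolution(13)` evaluates to roughly 19.1 m per pixel, which for 256-pixel tiles gives a tile width of about 4.9 km, in line with the values stated above. Both figures shrink with the cosine of the latitude.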

Data Description and Extraction

Mapillary

Mapillary provides access to its data in several ways. The website visualizes the position of individual photos (as points) and sequences of photos (as lines). It allows viewing and downloading individual images associated with these points and lines

(compare Figure 2-1). It can also filter uploaded image content by date and season and is able to show traffic signs that were extracted through image processing.

Mapillary also provides a JSON API which allows a user to search for images and sequences. Data can be downloaded in JSON and GeoJSON formats. These objects contain information about the photos and sequences, as well as the URL of corresponding photos. Mapillary data can be embedded in other applications and web


mapping frameworks. For example, Mapillary photos and sequences have been included in the official OpenStreetMap iD editor since October 2014, making it easier for voluntary mappers to use visual Mapillary information for mapping purposes.

For this study, image sequences (as opposed to individual photos) were chosen for analysis since they provide a suitable representation of network segments that were mapped with Mapillary data. A sequence is a list of GPS nodes where each node represents the location of an image. Although sequences can be downloaded from the

API, the Mapillary developer team provided us with a database dump from February 3,

2015 that contained additional information, such as a unique user ID for each sequence, timestamps showing when sequences were taken and uploaded to the site, and the geometry as LineStrings. These data were stored in a PostgreSQL database with the PostGIS extension enabled.

To avoid blurry images, the Mapillary smartphone application prevents the device from taking a photo if the phone is shaking. The same occurs during a loss of the GPS signal. In both cases, a sequence is automatically closed, and a new one started when an image is taken again after the interruption. A sequence is also interrupted if the distance between the current and the previous photo exceeds a certain threshold.

Earlier versions of the application did not close a sequence in any of these cases, which led to long straight segments in the Mapillary dataset. Hence, for data analysis, straight segments of 1 km length or more were first removed, as shown by dashed lines in

Figure 2-3. This ensured that sequences represented the true coverage of the taken imagery. For the completeness analysis, raster tiles were generated at zoom level 13 with Mapnik.
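The removal of long straight segments can be sketched as follows. This is an illustration only: it assumes that a sequence is available as a list of (lon, lat) tuples, uses the haversine formula on a sphere as a distance approximation, and introduces the hypothetical helper names `haversine_m` and `split_at_long_segments`. The source does not state how the step was actually implemented.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def split_at_long_segments(nodes, max_len_m=1000.0):
    """Cut a sequence wherever consecutive nodes are max_len_m or more apart.

    Returns the remaining parts; parts with fewer than two nodes cannot
    form a line and are dropped.
    """
    parts, current = [], []
    for node in nodes:
        if current and haversine_m(current[-1], node) >= max_len_m:
            if len(current) >= 2:
                parts.append(current)
            current = []
        current.append(node)
    if len(current) >= 2:
        parts.append(current)
    return parts
```

Splitting rather than discarding the whole sequence keeps the portions on either side of a gap, which corresponds to retaining the green lines and removing only the dashed ones in Figure 2-3.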


Google Street View

To be able to compare Street View to Mapillary coverage and to calculate Street

View completeness, raster tiles of zoom level 13 are needed. Street View coverage tiles are highly generalized by default at zoom level 13 (Figure 2-4a) and can therefore not be used for comparison with other road networks. To make Street View tiles comparable to Mapillary tiles, Street View tiles were regenerated (Figure 2-4b). To do so, a vector version of Street View lines was extracted as follows. A self-developed client-side script downloaded Street View coverage tiles at zoom level 17 for our areas of interest in PNG format into the Web browser’s cache. Individual images were then extracted from the cache and stored within the XYZ tile scheme. The ground pixel resolution at this level is 1.2 m, which provides a detailed representation of Street View coverage. After geocoding each tile using Equation 2-2, PNG tiles were loaded into a

GRASS GIS workspace where they were vectorized and uploaded into a PostgreSQL table with line geometries. The workflow of extracting line features used the algorithms of r.patch, r.thin, r.to.vect and v.generalize. Finally, zoom level 13 raster tiles were rendered with Mapnik to match Mapillary tiles.

Reference data for determining completeness

To calculate completeness of Mapillary and Street View data, reference datasets are needed. This study evaluates completeness at local and global scales. For the worldwide assessment of Mapillary data completeness, we used the ESRI World Roads dataset, which is based on the DeLorme World Base Map. It can be assumed that this data product applies the same quality standard for the whole globe and that it therefore provides a more consistent reference dataset than OSM, which varies by region (Neis,

Zielstra, and Zipf 2013). The worldwide completeness analysis was conducted on main


roads only, and therefore ferries and local roads were excluded from the reference dataset before the analysis.

A more detailed completeness analysis in selected areas in the United States and Europe was conducted both for Mapillary and Street View. For this purpose an

OSM database dump from February 2015 was downloaded from Geofabrik

(http://download.geofabrik.de), which provides a comparable data quality for all selected areas. All roads were extracted with the Osmosis software tool using a highway=* filter and uploaded into a PostgreSQL database. Additional queries were designed to extract the following road categories:

• Main roads: connect settlements and cities

• Residential roads: minor, lower level roads with moderate traffic

• Pedestrian/Cycle paths: minor elements of the road network used by pedestrians and cyclists for daily routine or recreational purposes

Inaccessible roads, sidewalks, road crossings, tunnels and indoor features were excluded from the selection based on their tags. Visual inspection of the results showed a high number of pedestrian/cycle features close to higher road categories. We eliminated those pedestrian/cycle features that were within 25 m from the other two categories to analyze only off-road pedestrian/cycle features. Map tiles for zoom level

13 were generated from ESRI and OSM road datasets for the different geographic regions (global or local, respectively) and were subsequently used as reference datasets for the completeness analysis.

Completeness Computation for Mapillary and Street View

For the completeness analysis at the local scale we aimed to determine the percentage of reference roads, separated by road category, which were also mapped in


Mapillary and Street View, respectively. To compute completeness, a self-developed Python script compared the content of tiles of reference datasets with that of Mapillary and Street View tiles. More specifically, the script loaded each reference tile, identified pixels of reference roads and also counted the number of pixels overlapping with pixels in the Mapillary and Street View tiles. Results were loaded into a PostgreSQL database with polygon geometries.

The geographic extent of a tile at zoom level 13 is ~4.9 km x 4.9 km. For the worldwide completeness analysis of Mapillary data it was necessary to assign a mapped road segment to the correct country, which needed special data handling if a border between two countries was running across a tile. Affected tiles were subsequently divided into a refined grid system with a tile size of approximately 40 m x 40 m. Pixel count values of the original tiles (green polygons in Figure 2-5) and the refined tiles, considering only those that are completely contained within a country (yellow polygons in Figure 2-5), were assigned to the corresponding country.

Completeness can then be calculated for administrative units using reference data as follows:

$$C = \frac{\sum Px}{\sum Ref} \tag{2-3}$$

where C is the completeness of Mapillary or Street View, Px is a Mapillary or Street View pixel that overlaps a pixel of the reference road dataset, and Ref is a pixel from the reference road dataset.
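The pixel counting behind Equation 2-3 can be sketched in a few lines of Python. This is not the script used in the study: it assumes that each tile has already been read into a two-dimensional list of 0/1 values (1 = road pixel), and the function names are hypothetical.

```python
def count_pixels(reference_tile, mapped_tile):
    """Return (overlap, reference) pixel counts for one pair of tiles.

    Both tiles are 2D lists of 0/1 values with identical dimensions.
    """
    overlap = 0
    reference = 0
    for ref_row, map_row in zip(reference_tile, mapped_tile):
        for ref_px, map_px in zip(ref_row, map_row):
            if ref_px:
                reference += 1
                if map_px:
                    overlap += 1
    return overlap, reference


def completeness(tile_pairs):
    """Completeness C = sum(Px) / sum(Ref) over (reference, mapped) pairs.

    Returns None when the unit contains no reference road pixels.
    """
    total_overlap = 0
    total_reference = 0
    for reference_tile, mapped_tile in tile_pairs:
        overlap, reference = count_pixels(reference_tile, mapped_tile)
        total_overlap += overlap
        total_reference += reference
    if total_reference == 0:
        return None
    return total_overlap / total_reference
```

Summing the counts over all tiles of an administrative unit before dividing weights each tile by the amount of reference road it contains, which is what the two sums in Equation 2-3 express.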

In addition, the relative completeness difference was calculated for selected tiles as part of the local analysis to compare Mapillary completeness with that of Street View on specific road categories as follows:


$$d(SV_r, Map_r) = \begin{cases} 0, & \text{if } SV_r + Map_r = 0 \\ \dfrac{SV_r - Map_r}{SV_r + Map_r}, & \text{if } SV_r + Map_r > 0 \end{cases} \tag{2-4}$$

where d is the relative completeness difference, SVr is the count of Street View pixels and Mapr is the count of Mapillary pixels on road category r. Relative completeness difference values range between -1 and 1. A positive value means that a tile contains more roads mapped with Street View than with Mapillary, and a negative value means the opposite. A value of zero can either mean that Street View and Mapillary have identical coverage, or that they have no coverage at all. In the latter case, pixels were excluded from the corresponding map in Figure 2-13.
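As a small worked illustration of Equation 2-4, the measure can be written as a Python function. The function name is hypothetical and the inputs are assumed to be non-negative pixel counts for one road category.

```python
def relative_completeness_difference(sv, mapillary):
    """Relative completeness difference d for one road category.

    sv and mapillary are non-negative pixel counts; the result lies in
    [-1, 1] and is 0 when neither source has any coverage.
    """
    total = sv + mapillary
    if total == 0:
        return 0.0
    return (sv - mapillary) / total
```

A tile covered only by Street View yields 1, a tile covered only by Mapillary yields -1, and equal coverage yields 0. Because the no-coverage case also returns 0, it has to be tracked separately if it is to be excluded from a map.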

Results

Contribution Patterns of Mapillary Data

Country and continent level

Figure 2-6 shows for each month the number of users actively contributing to

Mapillary, separated by continent (a-f) and worldwide (g). Each bar chart indicates the proportion of new and returning users (those who mapped before). This split reveals that the majority of mappers contribute on a regular basis. Up to February 3, 2015, data were contributed by 1709 users worldwide (Figure 2-6g). The number of users is largest in Europe (1194 users, Figure 2-6a), followed by North America (303 users, Figure 2-6b). In these two continents, Mapillary users are more committed to Mapillary than in other continents, as can be seen by the higher portion of returning users each month.

Mappers covered GPS tracks of more than 209,000 km worldwide; note that this number also counts overlapping sequences that were taken on the same road. The growth pattern for the total mapped distance is similar to that of


user numbers. Again, Europe, followed by North America, are the most mapped continents (Figure 2-7a). Figure 2-7b shows the evolution of contributed data for the 5 most mapped countries. Mapillary data collection focused primarily on Germany,

Sweden and Poland in Europe and on the United States and Canada in North America.

In addition, Figure 2-7 shows the evolution of the average distance mapped per user for continents (Figure 2-7c) and the same countries as before (Figure 2-7d). Interestingly, the average contribution per user is higher in 17 other countries than in the most mapped country, Germany. The countries with the highest average distance mapped per user are Azerbaijan (561 km, 1 user), followed by Nicaragua (524 km, 6 users) and Thailand (451 km, 9 users). However, there are fewer than 10 users in 11 out of those 17 countries, which explains the small total distance mapped for some of the countries.

Mapillary data can be contributed by local users or visitors, e.g. when vacationing. To identify a user’s home country, we calculated a score based on a combination of days and distance mapped in each country. The marked peak for

Australia/Oceania in November 2014 for distance per user (Figure 2-7c) is caused by a user with home country Austria who mapped actively for one week in Australia (Figure

2-8a). Another example of extensive visitor mapping can be found in the Southwestern part of the Iberian Peninsula, where users from 8 different countries, most of them probably tourists, contributed data (Figure 2-8b).

Individual level

Different measures can be considered to describe how individual mappers contribute data to Mapillary. Days of active contributions is a measure of a user’s continued commitment to Mapillary data contribution. A related measure is the average


distance mapped per week. It is calculated by dividing the total length of sequences uploaded by the number of weeks that passed since the first contribution of a user.

The spread of locations mapped by a user can provide some insight into travel behavior. One basic measure is the number of countries mapped by an individual user.

A somewhat refined measure that uses coordinates of uploaded images and that was used in global mobility studies is the radius of gyration (González, Hidalgo, and

Barabási 2008; Hawelka et al. 2014). It measures how a user’s data contributions spread around the mass center. In other words, it is a measure of the user’s spatial extent in which he or she contributes to Mapillary. It is calculated as the root mean squared distance between each photo location and the center of mass:

$$r = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left| p_i - \bar{p} \right|^2} \tag{2-5}$$

where n is the number of photo locations under consideration, $p_i$ is the location of photo i, and $\bar{p}$ is the center of mass, calculated as the average of all photo locations.

Map projections preserve distances between two given points on a map only in specific situations but generally distort them. In consequence, trigonometric distance calculations based on plane geometry and projected coordinates can bias the radius of gyration (especially on a worldwide scale). Therefore, we calculate the distance between pi and 푝̅ as the minimum geodesic distance on the WGS84 ellipsoid.
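The calculation in Equation 2-5 can be sketched as follows. Note the simplifications: the sketch uses the haversine distance on a sphere as a stand-in for the geodesic distance on the WGS84 ellipsoid used in this study, and it takes the center of mass as the plain arithmetic mean of longitudes and latitudes, which is not robust for users whose photos straddle the antimeridian. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_of_gyration_km(points):
    """Root mean squared distance of photo locations from their mean position."""
    n = len(points)
    if n == 0:
        raise ValueError("at least one photo location is required")
    center = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    mean_sq = sum(haversine_km(p, center) ** 2 for p in points) / n
    return math.sqrt(mean_sq)
```

For two photos taken on the equator at longitudes -1° and 1°, the center of mass is (0°, 0°) and the function returns about 111 km, the length of one degree of longitude at the equator.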

For analyzing these variables, all users that registered at least 100 days before the end of the data collection (February 3, 2015) were selected. From this group, each user’s contribution within the first 100 days since the registration was analyzed. In this way, contribution patterns are not distorted by newly registered users who have not yet


had a chance to contribute for as many days as other users. Average weekly distance was analyzed for the first 14 weeks of contribution, i.e. 98 days.

Low radius of gyration values indicate local contributions, whereas high values indicate long-distance travel for mapping purposes. The radius of gyration is not limited to contributions of individual users but can be aggregated for several users, e.g. for those residing in a specific country or continent. Analyzing those countries that were the home country to more than 5 users showed a mean value of 180 km (std. dev. = 241 km). In these countries, the radius of gyration was largest for Argentina (~ 610 km), followed by Ireland (~ 470 km), Austria (~ 360 km), Norway (~ 350 km), and Canada (~

290 km). Whereas we hypothesized that country size, economic development, and island/mainland topology of a country would affect the size of the radius of gyration, multiple regression did not show any of these predictors to be significant.

The distributions of previously discussed contribution related variables can be well approximated by the power law function $P(\delta) = \delta^{-\beta}$, where $\delta$ is the contribution variable of interest and $\beta$ is the exponent. Log-log plots in Figure 2-9 relate the proportion of users to the average weekly mapped distance (Figure 2-9a, $\beta$ = 0.96), the number of active mapping days (Figure 2-9b, $\beta$ = 1.46), the radius of gyration (Figure 2-9c, $\beta$ = 0.73), and the proportion of users contributing on a specific day (first, second, etc.) since their initial contribution (Figure 2-9d, $\beta$ = 0.58). Most contributors made their first uploads quickly after creating the account (median: 1.4 days). A power law relationship can be identified for the number of countries contributors mapped in ($\beta$ = 3.20, R-squared: 0.91), where the maximum number of countries mapped per user was 10 (not shown in Figure 2-9). These patterns reveal participation inequality. For


example, 1% of the users contributed data on more than 50 days within their first 100 days on Mapillary, compared to an average of 6.1 days. Further, 1% of the selected users have a radius of gyration greater than 3,500 km, compared to an average radius of gyration of 130 km.
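One common way to estimate the exponent of a power law of the form $P(\delta) \propto \delta^{-\beta}$ is an ordinary least-squares fit of a straight line in log-log space. The sketch below illustrates that approach; whether the exponents reported above were obtained in exactly this way is an assumption, and the function name is hypothetical.

```python
import math


def fit_power_law(deltas, proportions):
    """Least-squares fit of log10(P) = c - beta * log10(delta).

    deltas and proportions must be positive and of equal length (at least
    two distinct delta values). Returns (beta, r_squared).
    """
    xs = [math.log10(d) for d in deltas]
    ys = [math.log10(p) for p in proportions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, r_squared
```

A steeper slope (larger exponent) means that the share of users falls off more quickly as the contribution variable grows, i.e. that large values are concentrated among fewer users.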

The exponent value for the radius of gyration was found to be somewhat larger when derived from Foursquare check-ins (Cheng et al. 2011) (훽 = 0.99) or georeferenced tweets (Hawelka et al. 2014) (훽 = 1.25). One possible explanation for this difference is that Mapillary contributors have to travel larger distances once their home region is mapped, whereas tweets and check-ins can be continuously posted from the same local surroundings.

Figure 2-9e shows the percentage of mappers by contribution span, i.e. how long (up to 24 weeks) after their initial contribution they continued to contribute. For this plot only users that had their first contribution at least 24 weeks before the end of the data collection were considered. The largest group contributes only up to three weeks after their initial contribution. The group most committed to the service (about 12%) is the one with a contribution span of 22-24 weeks and an average of 30 days of active mapping.

Figure 2-10 shows an overlay of convex hulls generated from each user’s

Mapillary data contributions. The map reveals main travel corridors between different geographic regions as part of the data collection process. The strongest ties can be identified between Europe and the United States, where 31 mappers contributed data in both continents.

In the data collection efforts of individual mappers, two mapping strategies can be observed, closely resembling two mapper types previously identified by Heipke


(2010). The first mapper type is a casual mapper, characterized as someone willing to spend only a low effort on mapping. Such a mapper would typically avoid extra trips just for the purpose of mapping but be more likely to contribute photos along paths that are primarily traveled for some other reason, e.g. recreation along a hiking track (Figure 2-11a). The second type of mapper, the so-called map lover, is someone who is used to working with correct maps throughout his/her professional life and who is more likely to collect data in a systematic manner. Figure 2-11b shows an example of a systematically mapped region where different colors indicate different Mapillary sequences uploaded by the same user.

Besides contributing new data, users can also review photos and sequences uploaded by others as part of the quality improvement process. This includes correcting orientation angles, editing blurs and street signs, and hiding private photos or photos of poor quality. These changes need to be manually approved by Mapillary. There even exists a group of users who focus more on data edits than on photo uploads. Although data edits are permitted, Mapillary does not provide a public editing history of images at this point. In contrast, in other VGI sources, such as OSM, the object editing history is frequently used for quality assessment of mapped features (Neis and Zielstra 2014; Rehrl et al. 2013).

Completeness of Street Level Photos

Completeness of Mapillary data at the country level

Completeness is computed by comparing the amount of mapped road data in

Mapillary to a reference dataset. Figure 2-12 shows for each country the completeness of Mapillary on main roads, where ESRI main roads were used as the reference dataset. 16 out of the 20 most completely mapped countries are in Europe. However,


the highest completeness values can be found for three individual administrative units outside Europe (Table 2-1). These are Barbados, Hong Kong, and Nicaragua. Whereas Barbados is a small country where mapping 11 km of main roads already leads to a completeness value of over 50% (first row), the high completeness rates in Hong Kong and Nicaragua can be clearly attributed to power users, where two mappers contributed 110 km (Hong Kong), and 6 mappers contributed 512 km (Nicaragua), respectively.

Completeness of Mapillary and Street View data at the city level

For 11 selected urban, mixed, and rural areas in the United States and northern

Europe we calculated the completeness of Mapillary and Street View as well as the relative completeness difference between Mapillary and Street View. All measures were computed separately for OSM main roads, residential roads, and off-road pedestrian/bicycle features (Table 2-2). At the city level Street View provides more complete imagery for all study areas on all road categories, except for off-road features in Malmö, Sweden. For the selected areas Street View imagery provides, on average, high coverage of main roads (95.8%), followed by fairly complete coverage of residential roads (78.5%) and much lower coverage of off-road segments (7.6%). The two study sites with rural characteristics show for all road types much lower completeness values than the mixed and urban neighborhoods, where completeness values of mixed areas lie between those of urban and rural neighborhoods. Hence, the

11 analyzed study regions suggest a decline in Street View coverage from urban to rural areas.

In Mapillary, similar completeness patterns can be observed. That is, main roads are most completely mapped (28.0%), followed by residential roads (3.8%) and off-road segments (2.3%). Completeness ranges between 10.1% (Los Angeles) and 58.0%


(Malmö) for main roads, between 1.3% (Fort Lauderdale, Minneapolis) and 20.4%

(Malmö) for residential roads, and between 0.0% (Fort Lauderdale) and 10.3% (Malmö) for off-road segments. Malmö, as the home town of the Mapillary project, provides significantly better coverage than other cities. Its completeness values for residential roads and off-road segments are more than double those of the second-best mapped city in the corresponding categories. This data abundance results in relative completeness difference values that are smaller than for other cities, meaning that Street View is less dominant over Mapillary in this city than in others. The relative completeness difference for off-road pedestrian/cycling features is even negative with a value of -0.14, showing that in Malmö more pedestrian/cycling features are mapped in Mapillary than in Street View.

Figure 2-13 visualizes patterns of relative completeness differences for different road categories for selected areas in Table 2-2. Purple tiles without borders show areas with more roads mapped in Street View, whereas orange tiles with borders indicate areas with higher Mapillary coverage. For rural areas in Washington state (Figure 2-13a, relative completeness difference on main roads: 0.22), most of the main roads were covered by both types of imagery (white tiles). Street View imagery is available on more main roads than Mapillary (purple tiles), but some main roads are only covered in Mapillary (orange tiles). In the second rural study area, for residential roads (Figure 2-13b, relative completeness difference on residential roads: 0.85), several road segments in the center are covered only by Mapillary but not by Street View. Figure 2-13c shows a case where a single spot provides better Mapillary than Street View coverage. The relative completeness difference for the Vatnsendi neighborhood in the southeast of Reykjavik is -0.1 as opposed to the observed value of 0.91 for the whole


city. This is a result of the previously identified map lover user who systematically mapped the whole neighborhood in October 2014 (compare also Figure 2-11b).

Numerous orange tiles in Figure 2-13d for Malmö demonstrate the better coverage of Mapillary over Street View for off-road segments in the city as a whole.

Figure 2-13e illustrates for San Francisco that even in urban areas where Street View captures a high number of off-road features, Mapillary can provide better coverage in selected districts.

Summary

This paper analyzed how users have contributed street level photographs to

Mapillary throughout the first year since the inception of this project. Assessment of data quality is especially crucial for VGI data sources since these data are not subject to official quality standards posed by regulatory agencies. Mapillary data, like most other

VGI data, do not come with traditional measures of accuracy in their metadata. The paper assessed the completeness of Mapillary on main roads for countries from all over the world, and compared for selected study regions completeness values of Mapillary to those of Street View for three road categories. For all 11 analyzed local areas Street

View delivers better coverage than Mapillary, with the one exception of off-road features in Malmö. However, neither Street View nor, especially, Mapillary imagery is evenly distributed within study sites, so that regional pockets exist where Mapillary coverage outperforms that of Street View in the various analyzed road categories. These pockets could either indicate a heightened interest from users in mapping a specific area, or a gap in Street View coverage. The presented analysis method does not, however, allow these two cases to be distinguished.


The Mapillary service experiences a high data growth rate. Although one cannot predict future growth rates, the analyses identified a strongly committed group of users that contribute data on a regular basis after joining the Mapillary project. The distributions of contribution-related measures (days of active mapping, mapped kilometers per week, etc.) closely follow a power law function, reflecting an unequal distribution of contributions among users. This kind of participation inequality matches contribution patterns observed for other VGI data sources, and can be explained by different motivations for different kinds of contributors.

For example, Budhathoki and Haythornthwaite (2013) distinguish between lightweight

(casual) and heavyweight (serious) mappers in OSM. More specifically, a casual mapper’s motivation is more oriented towards general principles of free availability of mapping data, whereas serious mappers are primarily motivated by gaining knowledge and advancing their career. Analysis of the radius of gyration reveals that Mapillary is not limited to local activities, but that some users travel and map locations far apart. In fact, 18% of all contributors have a radius of gyration greater than 100 km.

Availability of Mapillary imagery through the ShareAlike license can facilitate various research efforts that require such imagery (e.g. wheelchair routing), as well as the development of location based services (e.g. virtual tours of hiking trails). Extraction of traffic signs through computer vision algorithms can become helpful for managing transportation utility inventory. Despite these applications, due to the crowdsourced nature of Mapillary, usability for this kind of purpose is limited to focus areas only.

For future analysis, we plan to analyze how OSM contributors use Mapillary imagery as a data source. Mapillary photos are already included in the official OSM editor so that OSM mappers have access to street level photos. Tags in the source field


of OSM features indicate that OSM mappers already use Mapillary imagery to add map features, such as bus stops, which can be visually identified on Mapillary images. It will also be of interest to analyze whether the same group of volunteer mappers contributes to OSM and Mapillary or if a new crowd of voluntary mappers is reached via Mapillary.


Figure 2-1. Mapillary image and mapped GPS trace as seen on the website as of 2016

Figure 2-2. Geographic units (tiles) in the WMTS scheme illustrating tile coordinates (X, Y) and zoom levels


Figure 2-3. Removing segments longer than 1km (dashed lines) from the Mapillary dataset and retaining only those that represent real coverage (green lines)


Figure 2-4. Original Street View tiles at zoom level 13 lacking details of the road network (a), and a regenerated tile allowing to distinguish between individual roads (b)


Figure 2-5. Keeping most information from tiles intersecting with country borders by using a refined scale. Within intersecting tiles (red outline), only a few finer resolution tiles with a spatial resolution of ~40m were excluded from the completeness analysis

Figure 2-6. Number of users per continent contributing to Mapillary


Figure 2-7. Cumulative data growth of Mapillary. Total distance mapped per continent (a) and country (b), and average distance mapped per user in continents (c) and countries (d)


Figure 2-8. Mapillary sequences in Australia/Oceania (a) and in the Iberian Peninsula (b) by home country of users


Figure 2-9. Power law approximations of contribution patterns for individual users (a-d) and contribution span of users (e)


Figure 2-10. Convex hulls of contributions from individual users revealing important travel corridors for Mapillary users


Figure 2-11. Example contributions of different mapper types: Casual mapper recording a single hiking trail (a) and map lover systematically recording a neighborhood (b)


Figure 2-12. Completeness of street level Mapillary images on main roads

Figure 2-13. Spatial distribution of relative completeness difference between Mapillary and Google Street View for selected cities

Table 2-1. Most complete administrative units in terms of Mapillary completeness

Administrative unit   Length of main     Mapped main road   Completeness   Number of
                      road system [km]   distance [km]      [%]            users
Barbados              21                 11                 52.6           2
Hong Kong             213                110                51.5           2
Nicaragua             1,365              512                37.5           6
Slovenia              1,449              456                31.5           9
Netherlands           4,770              1,426              29.9           49
Germany               42,893             11,452             26.7           351
Croatia               4,363              874                20.0           15
Sweden                18,124             3,534              19.5           301
Austria               5,382              1,001              18.6           49
Switzerland           3,554              608                17.1           49

Table 2-2. Completeness of Mapillary (Map) and Street View (SV) together with relative completeness difference (Rel. Diff.)

Study site            Area     OSM main                  OSM residential           OSM off-road
                      [km²]    SV     Map    Rel. Diff.  SV     Map    Rel. Diff.  SV     Map    Rel. Diff.
Urban
Los Angeles, CA       885      99.3   10.1   0.81        93.5   1.6    0.97        7.3    4.1    0.27
Ft. Lauderdale, FL    279      98.9   24.1   0.61        88.4   1.3    0.97        8.3    0.0    1.00
Downtown Miami, FL    97       99.6   40.0   0.42        91.6   6.8    0.86        7.0    2.6    0.45
New York City, NY     190      99.0   19.1   0.68        95.5   6.6    0.87        40.4   2.4    0.89
San Francisco, CA     447      99.5   37.1   0.46        92.6   4.8    0.90        15.0   1.9    0.78
Avg.                           99.2   21.8               93.0   3.2                15.9   3.2
Rural
Bellingham, WA        2,569    75.3   47.9   0.22        26.2   6.9    0.58        4.0    0.6    0.75
Eldersburg, MD        2,451    87.4   41.0   0.36        53.7   4.2    0.85        1.0    0.5    0.34
Avg.                           83.2   43.7               41.8   5.6                2.1    0.5
Mixed
Malmö, Sweden         409      97.1   58.0   0.25        75.0   20.4   0.57        7.7    10.3   -0.14
Minneapolis, MN       4,126    97.6   20.8   0.65        81.6   1.3    0.97        11.5   0.6    0.90
Reykjavik, Iceland    905      96.1   25.3   0.58        82.1   3.8    0.91        5.4    0.7    0.76
Washington, DC        912      98.9   26.7   0.58        89.8   4.0    0.92        7.7    1.4    0.69
Avg.                           98.0   25.8               84.6   3.4                7.3    2.6
Total average:                 95.8   28.0               78.5   3.8                7.6    2.3

CHAPTER 3 CROSS-LINKAGE BETWEEN STREET LEVEL MAPILLARY PHOTOS AND OSM EDITS

In recent years, technological developments in computer, sensor, and communication technology, together with an increase in citizens’ interest in sharing spatial information, led to a significant growth of crowd-sourced geographic information, often referred to as Volunteered Geographic Information (VGI) (Goodchild 2007b), which became accessible on Web 2.0 platforms and social media. Contribution patterns for individual VGI applications, such as OpenStreetMap (OSM), photo sharing services, or drone imagery portals, have already been extensively analyzed in the geoscience literature (Neis and Zielstra 2014; Hollenstein and Purves 2010; Hochmair and Zielstra 2015). However, it is less understood if and how users participate in several crowd sourcing platforms, whether an individual contributor’s activities in different VGI platforms are spatially co-located or spatially distinct, and how data are cross-linked.

Mapillary provides a crowd-sourced alternative to Google Street View for street-level photographs. Since its public launch in February 2014, members of this project have so far provided more than 45 million street level photographs along a total of 1,250,000 km of roads and off-road paths. Besides being a crowd-sourced and therefore free alternative to Google Street View, Mapillary also has the advantage that its users can take photographs with a smartphone and upload them with an app, without the need for professional camera equipment. This makes Mapillary particularly suitable for image collection on off-road paths, such as hiking or cycling trails. Since street level photographs provide supplemental information to other free alternative data sources, such as aerial photographs, satellite imagery, or census data, they are beginning to be used in other VGI platforms. For example, source tags of OSM edits indicate that Mapillary imagery is already used to edit OSM features. Mapillary image content can be used to identify features or feature attributes that cannot be seen on the aerial imagery but are visible on ground level photos only, such as names of bus stops or buildings. Therefore, Mapillary provides “local” knowledge for OSM remote mappers who do not map in the field. Mapillary imagery is included as a layer option in two of the common OSM editors (iD and JOSM), making it easier for mappers to use street level photos for data editing in OSM.

The overall objective of this study is to determine to what extent Mapillary imagery is currently used for OSM feature editing. More specifically, it aims to determine how the use of street level photos for this purpose varies between different parts of the world, which OSM features are primarily mapped and edited based on Mapillary imagery, how spatially distinct OSM and Mapillary contributions for an individual mapper are, and how cross-linkage to Mapillary is provided in OSM, i.e. which tags are used to reference this connection. This topic is relevant for VGI research since we assume that cross-linkage between different VGI data sources can improve data quality, both by increasing completeness of linked data sources and through quality control and review of the original data source, such as the location of an image in the linked platform.

The remainder of this chapter is structured as follows: The next section provides a review of related literature, which is followed by a description of the study setup. After this, results of the analysis are presented, followed by conclusions.

Literature Review

Understanding user contribution patterns is important in the context of spatial data quality of VGI (Coleman, Georgiadou, and Labonte 2009b; Budhathoki and Haythornthwaite 2013). Therefore, numerous studies examined data growth patterns for various crowdsourced geographic data platforms and identified mapper types and mapping communities. This literature review focuses on community and cross-linkage analysis in previous studies, where OSM is one of the most frequently analyzed VGI platforms. One study analyzed community development in OSM across 12 selected urban areas of the world and found that for cities with lower OSM community member numbers a significant percentage of OSM data contributions (up to 50%) came from mappers whose main activity area was more than 1000 km away from these particular urban areas (Neis, Zielstra, and Zipf 2013). The interaction between users in OSM was measured for seven selected cities in Europe, the United States, and Australia, by analysis of co-editing patterns in OSM (Mooney and Corcoran 2014). Results showed that high frequency contributors, so called senior mappers, perform large amounts of mapping work on their own but do interact, i.e. edit and update contributions from lower frequency contributors as well. A related study analyzing the OSM editing history for London revealed that there was limited collaboration amongst contributors with a large percentage of objects (35%) being edited only once or twice (Mooney and Corcoran 2012b). An earlier AGILE workshop focused on the activities and interactions which occur during VGI collection, management and dissemination on various VGI platforms (Mooney, Rehrl, and Hochmair 2013), including the semantic aspect of the integration of VGI datasets with authoritative spatial datasets. Community analysis among contributors was also conducted for other crowd-sourcing and social media platforms. For example, a field experiment on the online encyclopedia Wikipedia showed that informal rewards (e.g. a thumbs-up) increase the incentive to continue contributing only among already highly-productive editors, but lower the retention of less-active contributors (Restivo and van de Rijt 2014). Another study on Wikipedia identified collaboration patterns that are preferable or detrimental for article quality, respectively (Liu and Ram 2011). For example, articles with contribution patterns where all-round editors played a dominant role were often of high quality. Analysis of the network of Twitter users and their followers shows that, although users can connect to people all over the world, the majority of ties from US based users are domestic (Stephens and Poorthuis 2015). That is, the Twitter network in the US is spatially constrained and bound by national borders and population density. The connection between the Twitter online network and the underlying real world geography is also discernible from the fact that 39% of Twitter ties are shorter than 100 km, i.e. roughly the size of a metropolitan area, and that the number of airline flights emerges as a better predictor of non-local Twitter ties than spatial proximity (Takhteyev, Gruzd, and Wellman 2012).

Although contribution patterns and specifically contributor communities within various VGI and social media platforms have been recently discussed, as described above, comparison of contribution patterns of individual users across platforms and data-linkage across platforms has so far not been analyzed in great detail in the literature. Some studies do compare the density and spatial footprints of data contributions between different VGI and social media platforms, such as between Flickr and Twitter (Li, Goodchild, and Xu 2013), or between several photo sharing services (Antoniou, Morley, and Haklay 2010). However, these studies do not analyze individual contributor behavior or discuss how contributions of one data source are related to contributions from another. Recent trends show that linkage of geographic data across different VGI and social media platforms is a real phenomenon. For example, FourSquare/Swarm users use an OSM background layer to add new check-in places. Therefore, OSM positional accuracy directly affects the positional accuracy of FourSquare/Swarm venues. Flickr, a prominent photo-sharing service, has about 30,000 photos tagged with OSM objects. These so-called machine tags potentially allow machine algorithms to automatically extract descriptive information from OSM for Flickr photos. Mapillary uses OSM for reverse geocoding as well. That is, for each photo and photo sequence, the name of the corresponding road is determined by the OSM Nominatim geocoder tool, which provides descriptive information of the image locations. In turn, Mapillary photos can be used as a source to derive information for OSM mapping purposes.

It has been shown that OSM positional accuracy is better where high-resolution imagery is available (Haklay 2010). Also, data imports to OSM have clear benefits for areas with a smaller contributor base (Zielstra, Hochmair, and Neis 2013). Since the Mapillary licensing policy allows OSM contributors to derive information from its imagery, it is a valuable source of geographic information for OSM and other VGI platforms. Within its first year, Mapillary reached significant coverage in some selected cities, and even outperformed Google Street View in terms of completeness in some cases (i.e., in some rural areas and on some off-road segments) as of early 2015 (Juhász and Hochmair 2016b). This explains why a growing number of OSM users utilize Mapillary data for OSM data editing, which will be more closely analyzed in the remainder of the chapter.

Study Setup

The study is split into two parts. The first part uses worldwide data and is conducted at the aggregated level to get insight into how OSM contributors cross-tag feature edits and changesets to express the Mapillary source. It also analyzes which OSM primary features (e.g. highways or amenities) are mostly associated with Mapillary, and whether Mapillary information is used to create new features or to edit attributes of existing features. The second part of the study reviews in more detail the mapping behavior of individual users. More specifically, it analyzes, for users that contribute to both platforms, to what extent the areas they map in OSM and Mapillary overlap, and whether one of the two VGI platforms is preferred over the other as a mapping platform. This second part of the study is conducted for Europe.

Following the study design, the data retrieval process is also separated into two parts. The first part extracts from an 11-week period all OSM feature editing events and changesets worldwide that are associated with Mapillary according to their tags. The second part identifies, based on user names, users who contributed to both platforms in Europe. For these users, geometries of OSM changesets and edited or added OSM features, as well as Mapillary photo locations are extracted for subsequent comparison.

Extraction of Mapillary Related OSM Events

For the extraction of Mapillary related OSM events we used OSM diff files. These files contain all changes made to the OSM database over some time period and can be downloaded at different time granularities from the OSM website in the compressed OsmChange format. In addition to the daily summary of changes in OSM diffs, we also considered OSM changesets. All these data were extracted between August 31, 2015 and November 15, 2015, covering an 11-week period of OSM edits.

We define an OSM event as an insertion or modification of an OSM feature that has an explicit reference to Mapillary. Such references are usually source tags, descriptions, comments or URLs. We consider an OSM event Mapillary related if the expression “Mapillary” (or “mapillary”) can be found in a reference. A software tool, which we developed in Python and Bash, starts with downloading a diff file or changeset. After decompressing the file, the tool converts it into the OPL¹ format with the Osmosis tool. The resulting text files contain OSM edits in rows. Therefore, it is possible to search and filter edits with UNIX grep commands. If a line (event) is associated with Mapillary (i.e. “Mapillary” or “mapillary” keywords can be found in tag names or values), it is inserted into a spatially enabled PostgreSQL table. Node and changeset geometries can be reconstructed from the object itself. However, way geometries were extracted from the Overpass API. As a result of this process, separate tables for OSM nodes, OSM ways and OSM changesets were available for analysis. Each table contains the unique OSM ID of the object (node, way, or changeset), username and ID of the OSM user that made the edit, tags in hstore format, and timestamps of the event.
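The keyword filter described above can be illustrated with a short Python sketch. It is not the tool used in the study. It assumes OPL text in which fields are separated by single spaces and the tag field is the one starting with `T`; the function names and the sample lines are hypothetical and invented for illustration.

```python
def tags_field(opl_line):
    """Return the raw tag field of one OPL line (without the leading 'T')."""
    for field in opl_line.rstrip("\n").split(" "):
        if field.startswith("T"):
            return field[1:]
    return ""


def is_mapillary_related(opl_line):
    """True if 'Mapillary' or 'mapillary' occurs in a tag key or value."""
    tags = tags_field(opl_line)
    return "Mapillary" in tags or "mapillary" in tags


# Hypothetical sample lines, invented for illustration only.
sample_lines = [
    "n1 v1 dV c10 t2015-09-01T10:00:00Z i7 ualice Thighway=bus_stop,source=Mapillary x13.4 y52.5",
    "n2 v3 dV c11 t2015-09-02T11:00:00Z i8 umapillary_fan Tamenity=bench x13.5 y52.6",
    "w3 v2 dV c12 t2015-09-03T12:00:00Z i9 ubob Tsurface=asphalt,mapillary=abc123 Nn1,n2",
]

# Keep only the object identifiers of lines whose tags mention the keyword.
related = [line.split(" ")[0] for line in sample_lines if is_mapillary_related(line)]
print(related)
```

Restricting the search to the tag field means that the second sample line, where the keyword occurs only in the username, is not counted. A plain grep over the whole line would include it.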

Extraction of Mapillary and OSM Features from Across-Platform Users

For comparing the spatial editing and contribution behavior between Mapillary and OSM users, users from both platforms were extracted using string matching of usernames. To extract usernames, we used a Mapillary database dump of photo sequences that is a suitable representation of the spatial coverage of photo mapping (Juhász and Hochmair 2016b). To reduce the chance of extracting two different users who by coincidence share the same user name, only usernames that are longer than 7 characters were considered for this task. Next, it was checked whether the username from the Mapillary database dump exists also in the OSM database using the main API. Since this is not the intended use of the API, we limited our search to 100 matches.

1 OPL file manual - http://docs.osmcode.org/opl-file-format-manual/
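The username screening described above can be sketched as follows. This is a minimal illustration under stated assumptions: the OSM lookup is represented by a caller-supplied function (`exists_in_osm`, hypothetical), the limit of 100 is interpreted as stopping after 100 matched names, and all usernames are invented.

```python
def candidate_usernames(mapillary_usernames, min_length=8):
    """Keep distinct usernames longer than 7 characters, in first-seen order."""
    seen = set()
    candidates = []
    for name in mapillary_usernames:
        if len(name) >= min_length and name not in seen:
            seen.add(name)
            candidates.append(name)
    return candidates


def match_usernames(mapillary_usernames, exists_in_osm, max_matches=100):
    """Return up to max_matches usernames that also exist in OSM.

    exists_in_osm is a hypothetical callable that answers whether a
    username is registered in OSM; the real lookup is not shown here.
    """
    matches = []
    for name in candidate_usernames(mapillary_usernames):
        if len(matches) >= max_matches:
            break
        if exists_in_osm(name):
            matches.append(name)
    return matches


# Stand-in for the OSM lookup: a fixed set of invented names.
osm_names = {"trailmapper42", "bob", "citywalker_hh"}
found = match_usernames(
    ["bob", "trailmapper42", "shortie", "citywalker_hh", "trailmapper42"],
    exists_in_osm=lambda name: name in osm_names,
)
print(found)
```

Short names such as "bob" are skipped even though they exist on both platforms, because short usernames are more likely to belong to different people.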

Then we reconstructed the OSM editing history of these OSM users using their changesets. A changeset contains the map edits and their bounding area that are submitted by a user to the OSM database, which is typically done on a regular basis to avoid losing completed edits. We limited OSM contributions to the time period after a user signed up to Mapillary, ensuring that both data sources cover the same time range. Since changesets occasionally cover large areas, concealing details about a user’s primary regions of edits, we excluded changesets larger than 225 km². Using an exploratory approach we found that eliminating the upper tail of the area distribution (Figure 3-1a) results in a fairly accurate spatial representation of a user’s OSM editing history. The retained changesets are shown in Figure 3-1b.
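The area threshold can be illustrated with a small Python sketch. The text does not state how changeset areas were computed, so the spherical formula below is an assumption, and the function names and bounding boxes are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation (assumption)
MAX_AREA_KM2 = 225.0         # threshold stated in the text


def bbox_area_km2(min_lon, min_lat, max_lon, max_lat):
    """Area of a longitude/latitude bounding box on a sphere, in square km."""
    d_lon = math.radians(max_lon - min_lon)
    band = math.sin(math.radians(max_lat)) - math.sin(math.radians(min_lat))
    return EARTH_RADIUS_KM ** 2 * d_lon * band


def keep_changeset(bbox):
    """True if the changeset bounding box does not exceed the area threshold."""
    return bbox_area_km2(*bbox) <= MAX_AREA_KM2


# Invented bounding boxes: (min_lon, min_lat, max_lon, max_lat)
small_box = (13.30, 52.45, 13.40, 52.55)  # roughly 7 km by 11 km
large_box = (10.00, 50.00, 12.00, 52.00)  # spans two degrees in each direction
print(keep_changeset(small_box), keep_changeset(large_box))
```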

To spatially match OSM and Mapillary contributions, a 10 km by 10 km grid was created for Europe, limited to the region within the dashed boundaries shown in Figure 3-1b. For each cell, OSM and Mapillary edits were extracted for all users that were active in that cell. Results were stored in a PostgreSQL table with unique cell IDs, allowing user contributions from both sources to be spatially matched. Based upon examination of OSM and Mapillary contributions (areas, descriptions and timestamps) we identified one username which clearly did not refer to the same individual (e.g. editing OSM based on local survey while uploading Mapillary photos from a distant country at the same time). This user was removed from the dataset. The final dataset, after limiting OSM contributions to after the Mapillary signup date and the geographic area to Europe, contained 83 individual users who uploaded photos to Mapillary, edited OSM data, and were most likely the same person.
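The assignment of contributions to grid cells can be sketched as follows. The sketch assumes that coordinates are already given in metres in a projected coordinate system; the projection used in the study is not stated here. The function names and sample coordinates are hypothetical.

```python
import math

CELL_SIZE_M = 10_000  # 10 km cells, as described in the text


def cell_id(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return (column, row) of the grid cell containing a projected point."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def cells_by_user(records):
    """Group (username, x_m, y_m) records into a set of cells per user."""
    cells = {}
    for username, x_m, y_m in records:
        cells.setdefault(username, set()).add(cell_id(x_m, y_m))
    return cells


# Invented coordinates in metres, for illustration only.
osm_points = [
    ("user_a", 4_321_000, 3_210_500),
    ("user_a", 4_329_999, 3_215_000),  # same cell as the first point
    ("user_a", 4_330_000, 3_215_000),  # first point of the neighbouring cell
]
user_cells = cells_by_user(osm_points)
print(user_cells)
```

Running the same grouping on OSM and on Mapillary records gives two sets of cells per user, which can then be compared cell by cell.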

Results

Contribution Patterns for Cross-Tagged OSM Features

Cross-linkage between OSM event types and Mapillary

In a first step it was analyzed how and to what extent the OSM community uses Mapillary as a source of information. The analysis was conducted for tags in Mapillary related OSM events, i.e. node edits, way edits, and changesets that explicitly mentioned “Mapillary” or “mapillary”. For OSM nodes, 1930 events were identified, consisting of new insertions or edits. These events occurred in connection with 1660 unique OSM nodes and were carried out by 68 unique users. For OSM ways, we found 1694 events relating to 1330 unique features that were edited by 96 individuals. Furthermore, the “Mapillary” or “mapillary” keywords appear in 5110 changesets submitted by 209 mappers. The weekly aggregated number of events is shown in Figure 3-2. The number of users editing nodes or ways, or submitting changesets (smaller than 225 km²) with reference to Mapillary, together with the number of unique users per week, are summarized in Table 3-1. The table also shows the total number of OSM users who submitted any changes in each week. Among this group, the percentage of OSM users who submitted changes based on Mapillary images is shown in the last column. Values between approximately 0.5% and 0.6% indicate that the subcommunity that uses Mapillary images for OSM data contribution is still a small fraction.

To avoid storing redundant information, OSM users oftentimes attach source information to the changeset rather than to each individual feature. This approach is also recommended when editing multiple features in a mapping campaign. This explains the higher number of committed changesets and the higher number of changeset users with a reference to Mapillary compared to users associated with feature edits. However, it should be noted that not all edits in such tagged changesets are necessarily based on Mapillary alone, although the “Mapillary” or “mapillary” terms appear in the tag. For example, one changeset had a source tag value “bing”, referring to the available Bing imagery, accompanied by several comments, including “Added crossing from Mapillary and bing” or “Sidewalk + surfaces etc from bing, mapillary and local knowledge”. Analysis of changeset source tags revealed that 29% of identified changesets rely solely on Mapillary, local knowledge and surveys, without indicating any other available sources, such as Bing or Mapbox imagery, in OSM source tags. We also checked whether Mapillary images overlapped with cross-tagged OSM changesets and found that only 5% of these changesets were more than 50 m away from the nearest available Mapillary imagery. Of the changesets not located in the proximity of Mapillary imagery, 84% were created with the JOSM editor, which does not reset the source tag when submitting a new changeset. Therefore, these occurrences may be the result of this editor feature, and not of deliberately provided source information by the user. At least one changeset discussion confirms this².

2 OSM changeset – www.openstreetmap.org/changeset/35291204

The spatial distribution of events (individual nodes, ways and changesets combined) is shown in the world map in Figure 3-3. Table 3-2 summarizes relative frequencies of event counts by continent together with user numbers. The map shows that in the regions with the most Mapillary contributions, i.e. in Europe and the United States, Mapillary is frequently used as a data source for OSM edits as well. This is also confirmed by user numbers in Table 3-2, which are higher for Europe and the United States (as part of North and Central America) than for other continents. Table 3-2 reveals that over 61 percent of all node edits and over 44 percent of all way edits in OSM during the analyzed 11-week period occurred in Asia, which is surprisingly high given that the share of mapped tracks in Mapillary in Asia from all world contributions is only 4 percent as of the beginning of 2015 (Juhász and Hochmair 2016b). However, the user numbers for node and way edits in Asia are still much lower than those for Europe, which means that this pattern stems from a relatively small group of OSM mappers that apply a source tag to edited individual OSM features rather than to changesets. Contributions to North and Central America show that only very few mappers tag individually edited features (nodes, ways), but primarily tag changesets. OSM events cross-linked to Mapillary occur in all continental regions listed in Table 3-2. It should also be noted that the sum of users over aggregated continent data is greater than the number of users extracted from all nodes or changesets, which implies cross-continent mapping activities.

An analysis of tag distribution for OSM nodes and ways referencing Mapillary shows that the 10 most common tags, including the “source” tag, represent 60.4% of all tag occurrences. These can be described as power tags (Peters and Stock 2010; Vandecasteele and Devillers 2015), i.e. tags used frequently by many users. The most common tag was “source”, which was attached to 1507 nodes and 1285 ways.

Cross-linkage for OSM primary features

The next step of the analysis examined the distribution of cross-linkages to Mapillary for OSM primary feature categories. For ways and nodes, features from 21 out of the 26 primary feature categories showed a reference to Mapillary in our dataset. Missing primary features are aerialway, boundary, craft, military, and office. Table 3-3 shows the most frequently used OSM primary features that were cross-linked to Mapillary. The tags of these OSM features show a clearly different frequency distribution than that of the complete set of OSM features, which was extracted from OSM Taginfo³. As an example, for node events OSM amenity features are frequently derived from Mapillary (44%) as opposed to only around 5% of amenity features that are present in the entire OSM dataset. For way events, highway, leisure and barrier OSM features referenced to Mapillary occur at a higher relative frequency than is the case for the corresponding primary features in the entire OSM dataset.

In addition to primary features, 64 OSM features with a key “traffic_sign” that are cross-tagged with Mapillary (3.86% of nodes) were also found. This de facto tag is also related to transportation and often used outside the “highway=*” tagging scheme. With Mapillary extracting and displaying traffic signs on their website, it is convenient to map traffic signs in OSM.

3 OSM Taginfo - http://taginfo.openstreetmap.org

Surprisingly, some aeroway features, which are used to map air travel related features, appeared in the OSM event list. Although this is outside the focus of typical street level imagery, the flexibility of Mapillary allows users to take and upload photos from virtually anywhere. As a result of this, some airport taxiways have been mapped at London Heathrow Airport based on the imagery (Figure 3-4a). Another innovative use of Mapillary that can be seen in the analyzed dataset is indoor mapping. Since it is not possible to obtain GPS coordinates inside a building, postprocessing of images allows users to geolocate their imagery and upload it to Mapillary. The presence of an additional “indoor” OSM tag and negative “layer” and “level” values indicate object positioning through Mapillary indoor-imagery (Figure 3-4b). In fact, 191 nodes and 161 ways were tagged as indoor or below surface features. For better integration of Mapillary images into the OSM tagging scheme, a new key called “mapillary” has also been introduced to the OSM community, which allows mappers to reference the corresponding Mapillary image in the OSM feature key. A new initiative, OneLevelUp, already renders this information on a web map. Users also tend to use namespaces, indicating from which direction a Mapillary photo shows the object in question (e.g. “mapillary:NE”). In addition, street level imagery provides the ability to capture descriptive information of features, such as the name of a business (Figure 3-4c), the surface type of a road or the material of street furniture. Furthermore, the crowdsourced nature of Mapillary and the ability to capture the rapidly changing world is sometimes a helpful source to obtain an update on geometry information, such as on a modified road layout (Figure 3-4d). Interestingly, OSM features highlighted in Figure 3-4d do not have a source tag indicating Mapillary, but the following note assigned to them:

PLEASE DO NOT EDIT if you don't live here. Roads have been completely reconfigured. High-zoom-level imagery is out-of-date (low zoom level imagery is correct). Consult Mapillary.com sequences for this area to see correct road configuration.

OSM activity types associated with Mapillary

In another step the version numbers for edits of individually edited features in OSM that were cross-linked to Mapillary were extracted. This provides information about whether features were newly created (version number 1) or modified (version number > 1). A summary of these activity types is provided in Table 3-4. The large number of edits with a Mapillary reference (last three columns) suggests that street level imagery is used not only to create new features but also to edit existing ones (e.g. to add descriptive information). The table distinguishes between edits applied to nodes that were created during the 11-week analysis period based on Mapillary (left part), and edits applied to nodes that were created before that period or without reference to Mapillary (right part).
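The distinction between created and modified features follows directly from the version number and can be sketched in a few lines of Python. The function name and the list of version numbers are hypothetical and do not reproduce the values in Table 3-4.

```python
from collections import Counter


def activity_type(version):
    """Classify an OSM event by its version number."""
    if version < 1:
        raise ValueError("OSM version numbers start at 1")
    return "created" if version == 1 else "modified"


# Invented version numbers of cross-linked events, for illustration only.
versions = [1, 1, 2, 5, 1, 3]
counts = Counter(activity_type(v) for v in versions)
print(counts["created"], counts["modified"])
```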

Across-Platform User Contributions

For analysis of individual mapping behavior across the two VGI platforms we extracted Mapillary and OSM contributions of 83 individual users identified earlier. This analysis was conducted for Europe (see Figure 3-1b). To analyze whether mapped areas of edits are co-located or spatially distinct, for each user, the percentage of 10 km by 10 km tiles mapped only in OSM, mapped only in Mapillary, or mapped in both platforms was computed. Results showed that 93% of users mapped at least some areas in both platforms, resulting in an overlap (Figure 3-5a). Even though the sampling of users analyzed for this part of the study started with extracting users from Mapillary, the diagram shows that the majority of users focus more on OSM (blue area) than on Mapillary (green area) in their data collection efforts. For five users, the exact same tiles are mapped both in OSM and Mapillary. Figure 3-5b highlights the spatial differences for a selected user in Northeastern Germany, showing that urban areas tend to be mapped both in OSM and in Mapillary, whereas rural areas are predominantly mapped in OSM only. The latter may change once urban areas become more completely mapped and thus saturated in Mapillary, so that mappers need to divert more towards rural areas for additional Mapillary contributions. Areas of Mapillary-only contributions can be found along selected major roads (e.g. highway bypass of Berlin). This bypass was already mapped in OSM, but provided a novel contribution option to Mapillary. Mapillary requires users to be physically present at mapping locations, while editing OSM remotely is a common practice. This might be a reason behind OSM contributions being more spatially spread for this user.
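The per-user tile percentages described above amount to simple set operations on grid cells. The following Python sketch is an illustration only; it assumes each platform's contributions have already been reduced to a set of cell identifiers, and the function name and sample cells are hypothetical.

```python
def tile_shares(osm_cells, mapillary_cells):
    """Percentage of a user's cells mapped only in OSM, only in Mapillary, or in both."""
    all_cells = osm_cells | mapillary_cells
    if not all_cells:
        raise ValueError("user has no mapped cells")
    total = len(all_cells)
    return {
        "osm_only": 100.0 * len(osm_cells - mapillary_cells) / total,
        "mapillary_only": 100.0 * len(mapillary_cells - osm_cells) / total,
        "both": 100.0 * len(osm_cells & mapillary_cells) / total,
    }


# Invented cell identifiers (column, row) for a single user.
osm = {(1, 1), (1, 2), (2, 2), (3, 2)}
mapillary = {(2, 2), (3, 2), (4, 4)}
shares = tile_shares(osm, mapillary)
print(shares)
```

The three percentages always add up to 100, since every cell of the combined area falls into exactly one of the three classes.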

Curves in Figure 3-6 show which percentage of users mapped at least a given percentage of the total mapped area (constructed from OSM and Mapillary tiles combined) in OSM, Mapillary, or in both. For example, the leftmost values mean that 100% of the users mapped (at least) 0% of the area in OSM, Mapillary or both. Moving further to the right, one can see that 75% of the users mapped OSM in at least 50% of their combined areas, that 48% of users mapped Mapillary in at least 50% of their areas, but that only 10% of users mapped at least 50% of overlapping areas, i.e. in OSM and Mapillary. Furthermore, the diagram shows that 2% of users have a 100% overlap in mapped areas.
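A curve of this kind can be computed as the share of users whose mapped percentage reaches a given threshold. The sketch below is a hedged illustration with invented per-user values; the function name is hypothetical and the numbers do not reproduce Figure 3-6.

```python
def share_of_users_at_least(user_shares, threshold):
    """Percentage of users whose mapped share (in %) is at least the threshold."""
    if not user_shares:
        raise ValueError("no users given")
    qualifying = sum(1 for share in user_shares if share >= threshold)
    return 100.0 * qualifying / len(user_shares)


# Invented overlap percentages for five users.
overlap_shares = [0.0, 10.0, 25.0, 50.0, 100.0]
curve = [(t, share_of_users_at_least(overlap_shares, t)) for t in (0, 25, 50, 100)]
print(curve)
```

At a threshold of 0% the curve always starts at 100% of users, which corresponds to the leftmost values described above.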

Summary

The first part of the study analyzed how Mapillary street level photographs are incorporated and cross-linked in OSM by matching the Mapillary keyword to tags in OSM edits and changesets. Results showed that even during a short period of time (August 31 – November 15, 2015), Mapillary images have been used to edit OSM features. It was found that overall Mapillary is most frequently associated with changesets rather than with individually edited features, although the share of OSM events (nodes, ways, changesets) that are cross-tagged with Mapillary varies between the continents. The predominant tagging of changesets might be the result of batch changes in order to avoid the tagging of redundant information with individual features.

The geographic focus of Mapillary related OSM events corresponds to the core areas of Mapillary contributions, which are Europe and the United States. However, due to some local mapping activities, peaks in Japan and in Southeast Asia could also be identified.

The frequency distribution of cross-linked OSM primary features with a reference to Mapillary is significantly different from that of the entire OSM dataset. The percentage of cross-linked features compared to the entire OSM dataset is higher for transportation (highway, public transport, traffic sign) and leisure (natural, amenity, tourism). This finding is in line with common activities associated with Mapillary, which are recording photos while commuting, traveling, and outdoor and leisure activities, such as hiking. The crowdsourced nature of Mapillary allows users to map OSM features in places where they are currently less frequently found, including airport taxiways or indoor objects.

Cross-linking the two data sources can also help to improve data quality. An example was given where a changed road network pattern was reflected in Mapillary photographs, which were then used to update OSM road geometries. Furthermore, the Mapillary images provide a potential data source for adding OSM feature attribute information (e.g. surface type, name of business) without the need to conduct a field survey.

The second part of the study extracted areas of mapping activities from individual users who contributed both to OSM and Mapillary. The analysis revealed that an individual mapper is more likely to edit larger areas in OSM than in Mapillary. Despite this, 93% of users in our sample mapped at least some areas that overlapped between OSM and Mapillary. The overlapping areas tend to be located in places where a user conducts frequent edits, for example in urban areas the user is familiar with.


(a) (b)

Figure 3-1. Histogram of filtered changeset areas (a) and accurate spatial representation of changesets after removing changesets with large areas in Europe colored by user (b)

Figure 3-2. Number of weekly OSM events cross-tagged to Mapillary


Figure 3-3. Spatial distribution of identified OSM events with reference to Mapillary

(a) (b)

(c) (d)

Figure 3-4. Using street level imagery in OSM: Mapping runway features (a), indoor mapping (b), deriving descriptive information (c), and deriving new road pattern (d)


(a) (b)

Figure 3-5. Ratio of mapped areas in Mapillary and in OSM (a), and spatial distribution of a user’s contributions (b)

Figure 3-6. Distribution of users by the size of area they mapped in OSM, Mapillary or both


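Table 3-1. Weekly aggregated number of OSM users. The Node, Way, Changeset and Unique columns count users associated with Mapillary; the number in parentheses after each week is the calendar week.

| Week | Node | Way | Changeset | Unique | All OSM users | % OSM users (Mapillary) |
|---|---|---|---|---|---|---|
| Aug 31 – Sept 6 (36) | 10 | 12 | 55 | 67 | 11,701 | 0.63 |
| Sept 7 – Sept 13 (37) | 12 | 14 | 52 | 63 | 10,918 | 0.58 |
| Sept 14 – Sept 20 (38) | 13 | 11 | 47 | 56 | 10,476 | 0.53 |
| Sept 21 – Sept 27 (39) | 10 | 14 | 40 | 55 | 10,298 | 0.53 |
| Sept 28 – Oct 4 (40) | 9 | 9 | 38 | 49 | 10,108 | 0.48 |
| Oct 5 – Oct 11 (41) | 8 | 17 | 56 | 66 | 10,606 | 0.62 |
| Oct 12 – Oct 18 (42) | 9 | 13 | 48 | 59 | 10,270 | 0.57 |
| Oct 19 – Oct 25 (43) | 18 | 19 | 51 | 68 | 10,607 | 0.64 |
| Oct 26 – Nov 1 (44) | 17 | 20 | 52 | 69 | 10,872 | 0.63 |
| Nov 2 – Nov 8 (45) | 14 | 13 | 47 | 58 | 11,185 | 0.52 |
| Nov 9 – Nov 15 (46) | 7 | 15 | 41 | 54 | 11,305 | 0.48 |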

Table 3-2. Identified OSM events with reference to Mapillary by continent

| Continent | Nodes: Event (%) | Nodes: Users | Ways: Event (%) | Ways: Users | Changesets: Event (%) | Changesets: Users |
|---|---|---|---|---|---|---|
| Africa | 0.05 | 1 | 0.00 | 0 | 0.08 | 1 |
| Asia | 61.63 | 11 | 44.29 | 16 | 8.59 | 19 |
| Europe | 37.28 | 48 | 50.43 | 66 | 45.08 | 139 |
| North and Central America | 0.73 | 7 | 4.91 | 11 | 43.51 | 38 |
| Australia and Oceania | 0.16 | 2 | 0.25 | 1 | 0.50 | 5 |
| South America | 0.16 | 1 | 0.12 | 2 | 2.24 | 15 |
| Total | 100 | 70 | 100 | 96 | 100 | 217 |


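Table 3-3. Distribution of primary OSM feature categories cross-linked to Mapillary

| Type | Category | OSM features referencing Mapillary (#) | OSM features referencing Mapillary (%) | All OSM features (%) |
|---|---|---|---|---|
| Nodes | amenity | 736 | 44.34 | 5.06 |
| Nodes | natural | 199 | 11.99 | 6.34 |
| Nodes | highway | 174 | 10.48 | 6.20 |
| Nodes | tourism | 102 | 6.14 | 0.81 |
| Nodes | barrier | 83 | 5.00 | 1.45 |
| Nodes | leisure | 74 | 4.46 | 0.31 |
| Nodes | public transport | 47 | 2.83 | 0.74 |
| Ways | highway | 620 | 46.62 | 28.07 |
| Ways | leisure | 239 | 17.97 | 0.82 |
| Ways | barrier | 194 | 14.59 | 1.27 |
| Ways | landuse | 76 | 5.71 | 4.42 |
| Ways | amenity | 50 | 3.76 | 1.06 |
| Ways | emergency | 49 | 3.68 | 0.02 |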

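Table 3-4. Number of OSM features based on activity type. The first four value columns refer to features created during data collection with a Mapillary reference; the last column refers to edited features (created earlier or created without a Mapillary reference).

| Type | Created: Total | Created: Not edited further | Created: Edited once | Created: Edited more than once | Edited (created earlier or without Mapillary reference) |
|---|---|---|---|---|---|
| Nodes | 692 | 596 | 65 | 31 | 968 |
| Ways | 681 | 593 | 74 | 14 | 649 |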


CHAPTER 4 HOW DO VOLUNTEER MAPPERS USE STREET LEVEL MAPILLARY PHOTOS TO ENRICH OSM?

Volunteered Geographic Information (VGI) (Goodchild 2007b) has been recognized as a valuable resource for the GIScience community to complement data from traditional sources, such as the census or aerial photographs. To assess the quality of heterogeneous VGI sources, studying contributor behaviour is essential

(Elwood, Goodchild, and Sui 2012; Budhathoki and Haythornthwaite 2013). Bégin,

Devillers, and Roche (2013) incorporated editing sessions (OSM changesets) in their analysis to better understand characteristics and quality of collected VGI. Results show that new changesets of a contributor usually spatially extend or overlap earlier changesets and add lower priority features or new attributes.

OSM is arguably the most widely studied VGI platform. It was shown that OSM positional accuracy is better when more mappers edit the same area (Haklay et al.

2010) and that users are more likely to edit a greater variety of features in their home region than in external regions (Zielstra et al. 2014). Whereas VGI mappers rely primarily on their local knowledge for data contribution and editing, incorporating other data sources can improve data quality. Examples include tracing of features from high resolution aerial imagery (Haklay 2010) or importing governmental data (Zielstra,

Hochmair, and Neis 2013).

Mapillary’s crowdsourced street level imagery is a unique addition to the list of available VGI sources. With the introduction of Mapillary in 2014 and the open license that it provides, OSM contributors can now use Mapillary image content to derive information that is not visible on aerial imagery (e.g. the type of a traffic sign) or to map features that would require in person exploration through field surveys (Juhász and


Hochmair 2016b). Evidence of OSM contributors that use Mapillary imagery to derive feature information was found by analyzing OSM edits that reference Mapillary in their tags (typically the source tags), which is referred to as cross-tagging (Juhász and

Hochmair 2016a). However, as tagging in OSM is inconsistent and contributors often follow tagging suggestions only poorly (Davidovic et al. 2016), any crowd-sourced data sets that were used for OSM edits (e.g. Mapillary, Flickr) may not be completely referenced in OSM tag content, calling for alternative methods to identify data use across different VGI platforms in data editing sessions.

This chapter analyzes the viewing extents of the Mapillary image layer during

OSM editing sessions with the iD and JOSM editors to estimate to what extent Mapillary images were likely used as a source for OSM edits.

Materials and Methods

This section provides an overview of the data sources and the data processing methods used in this research. The goal of the data extraction was to identify individual

OSM feature edits around the world that were likely based on Mapillary photos. Such edits would have to be made in the geographic area where and around the time when the user viewed the Mapillary image layer in one of the OSM editors.

Data Sources

OpenStreetMap

Since this chapter studies the editing behavior of volunteer mappers, a full OSM history dump1 was used which includes all historical edits ever made to the database.

Due to the large data volume, the pbf file was first split into world regions and then

1 http://planet.openstreetmap.org/pbf/full-history/


imported to a spatially enabled PostgreSQL database, using the osm-history-splitter and osm-history-importer tools.

Mapillary

Recent versions of the iD and JOSM editors are capable of loading Mapillary images into their editing environments. These requests that originate from the editors eventually leave footprints on Mapillary servers, which can be expressed as a geographic area corresponding to the viewing extent of the editor. Mapillary provided us with a data dump of all viewing requests of the Mapillary layer together with their spatial extents and time stamps. For this analysis we used worldwide Mapillary viewing extent data that was collected between June 2015 and February 2016. In addition to this, another Mapillary data dump with individual photo locations was used to exclude OSM edits that are far from street level imagery and therefore probably not based on

Mapillary imagery. Mapillary viewing and photo location data was stored in the same

PostgreSQL database.

Data Preparation and Processing

Workflow

The size of a full OSM history dump of over 50 GB in the pbf format as well as millions of Mapillary viewing extents made it necessary to split the database into smaller tables corresponding to world regions. Also, custom indexes were constructed to speed up data extraction with SQL queries. The final database contained more than 5 billion rows (OSM edits and Mapillary viewing extents) and occupied approximately 1.7 TB of disk space.

Based on this customized database structure, a two-tiered data extraction approach was applied. The first step involved the extraction of OSM candidate features


through a coarse spatio-temporal match between image viewing windows and OSM edits. This reduced the database size for the second step, which involved a more refined spatio-temporal overlay.

Extraction of editing sessions

We use the term “editing session” to describe an uninterrupted time period within which Mapillary images are being loaded into the OSM editor from the same machine as part of the layer viewing request. Since the server-logged viewing data does not contain a unique OSM user identifier, we used the IP addresses associated with each request to aggregate the viewing data over a set time period. More specifically, for each

IP address, a timeline was constructed that shows time stamps of user activities on the

Mapillary layer, such as changing the viewing extent. An arbitrary one-hour threshold of idle time (i.e. no image requests) was used to construct separate editing sessions from the timeline. This one-hour threshold corresponds to the time period after which the

OSM API closes changesets2 if no more edits are made. The example in Figure 4-1 illustrates how numerous individual Mapillary layer viewing extents (yellow rectangles) were aggregated into two distinct editing sessions (blue polygons with hatched areas).

These editing sessions have start and end timestamps and a geographic area that can be disjoint (although this is not the case in the examples shown). This aggregation reduces data volume without losing information about the editing activity.
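The aggregation of viewing requests into editing sessions can be illustrated with a minimal Python sketch. It is not the code used in this study, which relied on a PostgreSQL database. The function name `build_sessions`, the tuple layout of the requests and the choice to start a new session only when the gap strictly exceeds one hour are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Idle time that separates two sessions (one hour, as described in the text).
IDLE_THRESHOLD = timedelta(hours=1)


def build_sessions(requests, idle=IDLE_THRESHOLD):
    """Group (ip, timestamp, bbox) viewing requests into editing sessions.

    bbox is a hypothetical (min_lon, min_lat, max_lon, max_lat) tuple.
    Returns a list of dicts holding the IP address, the start and end
    time stamps, and all viewing extents that belong to the session.
    """
    by_ip = defaultdict(list)
    for ip, ts, bbox in requests:
        by_ip[ip].append((ts, bbox))

    sessions = []
    for ip, events in by_ip.items():
        events.sort(key=lambda e: e[0])  # build the timeline per IP address
        current = None
        for ts, bbox in events:
            # Assumption: a gap strictly longer than the threshold starts a new session.
            if current is None or ts - current["end"] > idle:
                current = {"ip": ip, "start": ts, "end": ts, "extents": [bbox]}
                sessions.append(current)
            else:
                current["end"] = ts
                current["extents"].append(bbox)
    return sessions


# Made-up example: two requests 20 minutes apart, then one after a long pause.
example_requests = [
    ("10.0.0.1", datetime(2015, 9, 1, 10, 0), (4.30, 50.80, 4.40, 50.90)),
    ("10.0.0.1", datetime(2015, 9, 1, 10, 20), (4.35, 50.82, 4.45, 50.92)),
    ("10.0.0.1", datetime(2015, 9, 1, 12, 0), (4.30, 50.80, 4.40, 50.90)),
]
example_sessions = build_sessions(example_requests)
```

In this made-up example the first two requests fall into one session and the third request opens a second session, because it arrives more than one hour after the previous request.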

Extraction of candidate features

Candidate features are OSM editing events (i.e., creation, modification) in the spatial and temporal proximity of Mapillary editing sessions. Since topological

2 https://wiki.openstreetmap.org/wiki/API_v0.6


operations and comparison of event timestamps are resource intensive, identification of a coarse set of candidate features uses the spatial and temporal index constructed in the PostgreSQL database. That is, instead of checking the specific spatial relations between OSM editing events and Mapillary editing sessions, the database was instructed to utilize related spatial indexes to determine a potential spatial overlap between candidate features and Mapillary editing sessions. Similarly, instead of comparing specific timestamps (with the precision of milliseconds), an index built on the day of editing events was used to identify OSM editing events and editing sessions taking place on the same day. The extraction of candidate features therefore uses a coarse comparison of spatial extents and event times, which results in an overestimation of candidate features compared to the number of actual potential OSM edits based on Mapillary images.
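The logic of this coarse match can be sketched in a few lines of Python. The study itself used database indexes rather than in-memory code, so the sketch below only mirrors the two tests (bounding box overlap and same-day comparison). The dictionary keys and the function names are hypothetical.

```python
from datetime import datetime


def bboxes_intersect(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def is_candidate(edit, session):
    """Coarse test: the edit falls on a day covered by the session and the
    bounding boxes intersect. Exact geometries and time stamps are ignored."""
    same_day = (
        session["start"].date() <= edit["timestamp"].date() <= session["end"].date()
    )
    return same_day and bboxes_intersect(edit["bbox"], session["bbox"])


# Made-up session and edit, using hypothetical field names.
example_session = {
    "start": datetime(2015, 9, 1, 10, 0),
    "end": datetime(2015, 9, 1, 10, 30),
    "bbox": (4.30, 50.80, 4.40, 50.90),
}
example_edit = {
    "timestamp": datetime(2015, 9, 1, 23, 0),
    "bbox": (4.35, 50.85, 4.36, 50.86),
}
example_match = is_candidate(example_edit, example_session)
```

The example edit is accepted as a candidate although it was made more than twelve hours after the session ended, which illustrates why this step overestimates the number of relevant edits.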

Extraction of edits likely based on Mapillary

Next, a more refined filtering method was applied on candidate features for data extraction. Within this step, only those OSM editing events were retained for further analysis that were conducted after the start time of a Mapillary editing session and that were completed within one hour of the session end time. Since submission of an OSM changeset is not automated (i.e. users need to send their changes in a separate step), this threshold is to allow some time in case users turned off the Mapillary layer before submitting their OSM changesets. In addition, candidate features further than 25m from the actual location of Mapillary photos were excluded. Figure 4-2 highlights the results of this filtering process. It shows an OSM edit that is considered to be Mapillary related3

3 http://www.openstreetmap.org/way/356400079/history


(yellow line) as well as other OSM candidate features (red lines) along with the location of Mapillary street level photos (green dots). In this example, the retained edit denotes a new highway exit added to OSM. The remaining candidate features overlapping with this session shown in red were excluded from the result set because they were either further than 25m from the imagery (see #1 in Figure 4-2) or they were added to OSM at a time that did not align with the editing session (see #2 in Figure 4-2).
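The two refined constraints (time window and 25 m proximity to photos) can be expressed as a short Python sketch. It makes several simplifying assumptions that are not part of the original workflow: each edit is reduced to a single representative point, distances are computed with the haversine formula on a spherical Earth, and all names are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6371000.0          # mean Earth radius, spherical approximation
GRACE_PERIOD = timedelta(hours=1)   # time allowed after the session end
MAX_PHOTO_DISTANCE_M = 25.0         # distance threshold to the nearest photo


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def likely_based_on_mapillary(edit, session, photos):
    """Keep an edit if it was made after the session start, no later than one
    hour after the session end, and within 25 m of at least one photo.

    edit["location"] is an assumed representative (lon, lat) point;
    photos is a list of (lon, lat) photo locations."""
    in_time = session["start"] < edit["timestamp"] <= session["end"] + GRACE_PERIOD
    if not in_time:
        return False
    lon, lat = edit["location"]
    return any(
        haversine_m(lon, lat, p_lon, p_lat) <= MAX_PHOTO_DISTANCE_M
        for p_lon, p_lat in photos
    )
```

A production version would measure the distance from the full feature geometry rather than from a single point, and would restrict the photo search with a spatial index instead of scanning all photos.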

Results

A total of 34,000 Mapillary editing sessions were identified between June 2015 and February 2016, out of which 8,400 contained only a single Mapillary viewing request. The latter means either that (1) the user accidentally turned on the Mapillary imagery layer in the OSM editor, or (2) that there were no images available for that area so that the user turned off the layer immediately. Editing sessions with only one request were therefore excluded from further analysis. A Mapillary viewing session lasted for 7 minutes and 39 seconds on average. This is the duration users spent on mapping OSM features while viewing Mapillary street level photos in the OSM editor. The longest observed session lasted for 5 hours and covered a large area along a highway in

Belgium.

The popularity of Mapillary images used in the OSM editing workflow can be assessed from the number of editing sessions per week. Figure 4-3a shows this information for both analyzed editors. As can be seen, at the beginning of the study period (June-July 2015), the Mapillary imagery layer was only accessible within the iD editor. It became available in the JOSM editor in August 2015 as well. The average number of sessions per week was 283 for iD and 441 for JOSM (after it became available).


With the extraction of OSM edits based on spatial and temporal constraints described earlier, the number of OSM edits per week for both editors can be computed as well (Figure 4-3b). The figure illustrates the higher popularity of the JOSM editor compared to iD when it comes to Mapillary image use for OSM edits. On average, 400 weekly feature edits originated from the iD editor as opposed to 4,100 coming from

JOSM. The clear preference for JOSM over iD was not expected, given that – at least – novice users use iD more frequently than JOSM to edit OSM data (Yang, Fan, and Jing

2016). During the most active week (starting on January 4, 2016) almost 10,000 OSM map edits were identified.

Figure 4-4 plots the number of different OSM users who use the Mapillary layer function for OSM feature edits within a given week. The number clearly increases after the layer functionality became available in JOSM in August 2015. 980 unique users were found to contribute to OSM based on Mapillary layer views during the study period.

To analyze the level of experience of OSM users who use the Mapillary layer service for feature editing, the sign up dates of these users were extracted from the main OSM API. Figure 4-4 shows the weekly number of analyzed OSM users by sign up date. The bar chart suggests that novice users are quite active in utilizing the Mapillary layer feature. On average, 14% of weekly active users signed up to OSM within just six months before their editing activity. The proportion of weekly novice users ranged from

5% to 29%. When setting this limit to one month before editing based on Mapillary photos, this weekly rate is still 8% on average. More detailed analysis shows that almost


30% of all analyzed users created their OSM accounts after the introduction of Mapillary in 2014.
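The weekly share of novice users can be computed along the following lines. This is a minimal sketch, not the script used in the study; approximating six months as 183 days and the function name `novice_share` are assumptions.

```python
from datetime import date, timedelta


def novice_share(active_users, window=timedelta(days=183)):
    """Share of users whose sign up date lies at most `window` before their edit.

    active_users is a list of (signup_date, edit_date) pairs for the users
    active in one week. The 183-day default approximates six months."""
    if not active_users:
        return 0.0
    novices = sum(
        1
        for signup, edit in active_users
        if timedelta(0) <= edit - signup <= window
    )
    return novices / len(active_users)


# Made-up week with four active users, one of whom signed up two months earlier.
example_week = [
    (date(2015, 8, 1), date(2015, 10, 5)),
    (date(2010, 1, 1), date(2015, 10, 6)),
    (date(2012, 5, 5), date(2015, 10, 7)),
    (date(2014, 1, 1), date(2015, 10, 8)),
]
example_share = novice_share(example_week)
```

Passing `window=timedelta(days=30)` would approximate the stricter one-month limit discussed above.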

The histogram of user sign up dates to OSM supports this general finding (Figure

4-5). A clear peak of new users signing up at the beginning of 2015 suggests that photo-mapping does not require one to be overly experienced with OSM. This peak could be the result of promotional activities conducted by Mapillary. Several such activities were organized to introduce Mapillary to wider audiences, with Mapillary team members present at multiple conferences and community events to promote the service. These promotions might have triggered a new crowd of mappers to sign up to OSM and to start mapping from street level imagery shortly after creating their OSM and Mapillary accounts.

Summary

This chapter examined to what extent OSM mappers use Mapillary imagery in their editing workflow. We used the viewing extents of Mapillary image requests submitted by the iD and JOSM editors, which provides a unique opportunity to study mapping behavior. These so-called editing sessions were spatio-temporally matched with a full history dump of the OSM database to extract those OSM edits that could be based on street level photos.

Although weekly counts of OSM feature edits based on Mapillary images are low compared to the number of all OSM feature edits submitted per week, our findings indicate that there is a certain group of OSM mappers who “cross-view” different VGI data sources for mapping purposes and, more specifically, use the crowdsourced

Mapillary imagery service to do so. Studying the sign up date of those mappers who engage in this activity also indicates that, although this process is more complex than just


drawing lines on top of aerial imagery, novice OSM mappers use Mapillary information, too, and provide valuable contributions by connecting these two VGI sources.


Figure 4-1. Editing sessions (blue hatched polygons) aggregated from individual viewing extents (yellow rectangles)

Figure 4-2. Retained OSM edit based on Mapillary (yellow), excluded candidate features (red), and location of Mapillary photos (green dots). Example #1 was excluded because of the lack of Mapillary photos in proximity, while example #2 was excluded based on temporal constraints


Figure 4-3. Number of OSM-Mapillary editing sessions per week, grouped by editor software (a), and number of unique OSM feature edits based on Mapillary per week (b)

Figure 4-4. Weekly aggregated number of OSM users using Mapillary to improve OSM


Figure 4-5. Histogram of sign up dates of OSM users engaging in photo-mapping. The peak on the right implies that newly registered mappers also engage in photo-mapping


CHAPTER 5 OSM DATA IMPORT AS AN OUTREACH TOOL TO TRIGGER COMMUNITY GROWTH? A CASE STUDY IN MIAMI

OpenStreetMap (OSM) is one of the most prominent Volunteered Geographic

Information (VGI) (Goodchild 2007b) projects to date that implements a collaborative workflow and aims to create a freely available map database of the whole world. VGI users in general, and in the case of OSM specifically, use a set of tools, such as field surveys, on-screen digitizing from aerial imagery and software to create verifiable information on the ground (Haklay 2013). The success of OSM is based on a large and active user base that interacts with other contributors, and validates and corrects errors made by them (Goodchild and Li 2012). OSM data is released under the Open

Database License1 (ODbL), which allows users to freely copy, distribute, transmit and adapt the data as long as its source is credited. Derivative works, however, need to be released under the same license. ODbL prohibits the use of copyrighted material (e.g. commercial maps) without explicit permission.

As OSM is a collaborative project, local contributors often organize social events

(so-called mapping parties) all over the world. These mapping parties are useful tools for building local communities since on-line contributors get to meet face to face and discuss their common interest. The aim is often to introduce OSM to new members through hands-on mapping sessions. These sessions can include joint field surveys

(e.g. to record house numbers) and data editing tutorials (e.g. to teach how to trace roads from imagery). Mapping parties are prime examples of social collaboration within

OSM (Haklay and Weber 2008). The effect of mapping parties on user and data growth

1 https://opendatacommons.org/licenses/odbl/


has been analyzed in various studies. For example, it was observed that during a mapping party, participants tend to edit more than usual (Hristova et al. 2013). This increased activity is more obvious for light and medium contributors than for heavy users. The latter could be due to the leading role of heavy users in organizing the mapping parties. Similar behavior can also be observed for another collaborative project, Wikipedia, where most committed users took up organizational roles (Bryant,

Forte, and Bruckman 2005). Another study described the organizational and planning aspects of a mapping party held in connection with a geospatial data conference

(Mooney, Minghini, and Stanley-Jones 2015). The organizers concluded that, although contributed data was of very high quality, on a wider scale the mapping party has not contributed a very large amount of data. One reason was that not all of the data collected during the field survey has been uploaded to OSM due to the lack of time during the mapping party, incomplete training, and lack of confidence in using OSM tools. Analyzing contribution patterns after a mapping party held in London, 50% of new

OSM members were found to stop contributions in the week after the event (Mashhadi,

Quattrone, and Capra 2015). A new study estimated that only 64% of new OSM contributors survive their first day, after which the estimated survival rate decreases

(Bégin, Devillers, and Roche 2017b), suggesting that the 50% withdrawal rate observed by Mashhadi, Quattrone, and Capra (2015) is not specific to mapping parties. Apart from mapping parties, OSM shows other characteristics of a social project. For example, after the Haiti Earthquake in 2010, a new project called the Humanitarian

OpenStreetMap Team (HOT) emerged to generate freely available geographic data in areas affected by natural disasters (Soden and Palen 2014). As a response to that


event, over a 3-week period, 600 remotely located volunteer mappers built a base layer map for Haiti nearly from scratch. This map was then used in the field by response teams to save lives. In 2013, HOT evolved into a registered US non-profit organization2 that aims to create and provide free, up-to-date maps for relief organizations responding to natural and man-made crises. Their mapping efforts primarily use an online tool called the Tasking Manager (TM).

Besides field surveys and on-screen digitizing from remote sensing imagery,

OSM also allows the integration of other datasets available under licenses compatible with

ODbL (e.g. CC03). This usually triggers subsequent user contributions and edits of imported data. Permissible datasets include public domain data that is often published by government agencies. Importing data through automatic means is one of the most controversial topics within the OSM community as this method is different from the core approach of OSM, which is manually adding verifiable data to the map (Mooney and

Minghini 2017). However, the general consensus is that imports, if carefully executed, add value to OSM. The OSM community discusses import related issues in a dedicated channel4. Numerous OSM data import tasks have been executed so far ("OSM Wiki:

Import Catalog" 2017), and some studies have evaluated the effect of data imports on

OSM data quality and user participation. For example, one study described the effects of US Census TIGER/Line import on data completeness (Zielstra, Hochmair, and Neis

2013), revealing that the OSM community does not focus on improving already imported

2 https://www.hotosm.org

3 https://creativecommons.org/share-your-work/public-domain/cc0/

4 https://lists.openstreetmap.org/listinfo/imports


road segments, but rather on other features, such as pedestrian paths, which are connected to imported features. Challenges associated with the matching of tags between imported data sources and the OSM tagging structure are discussed in another study (Mooney and Corcoran 2012a). Touya and Brando-Escobar (2013) discuss level of detail inconsistencies in VGI data and found examples of OSM data imports causing this problem. For example, buildings imported from the French cadastral sources can overlap with land parcels imported from the European CORINE

Land Cover dataset because of the different scale of those data sources. OSM data imports can be beneficial for the data donor as well. For example, the Department of

National Resources of Canada allowed the OSM community to import their national dataset with the hope that the community would further improve it. These improvements

(upon approval) could then be fed back to the national dataset (Beaulieu, Bégin, and

Genest 2010).

This chapter describes experiences gained from a local OSM data import in Miami-Dade County, Florida. More specifically, it evaluates how effective a building import task is in engaging different targeted community groups in OSM participation, which will provide useful insights for the design and implementation of future data imports. The import task of this study integrates a public domain dataset (building footprints) in Miami-Dade County. Targeted communities were asked to join the import project and to manually edit OSM data in the hope that this mapping experience would trigger community growth. These targeted groups were 1) existing members of the OSM community contacted through the OSM site, 2) users reached by Maptime Miami, a local chapter of an initiative built around open knowledge and geospatial technologies,


and 3) students enrolled in two courses of the Geomatics Program at the University of

Florida who were introduced to the import task as part of their course work. The success of VGI projects generally depends on participants, therefore understanding their motivation is key for the success of such projects (Fritz, See, and Brovelli 2017).

The motivation of VGI users can be divided into two distinct categories: extrinsic motivation that is related to outside factors (e.g. receiving compensation), and intrinsic motivation that originates directly from the user (e.g. gaining new knowledge or recreation) (Budhathoki and Haythornthwaite 2013). Our different outreach techniques are expected to reach users with different motivations.

OSM building imports are a useful means to improve the quality of the OSM building layer, which is still low compared to that of road network data. The completeness of buildings mapped in OSM relative to official data from national mapping and cadastral agencies has been examined in several studies. It was found that completeness levels vary widely between different cities, e.g. between 12.1% and 48.1% for selected parts of Germany (Hecht, Kunze, and Hahmann 2013) when considering the number of buildings using the unit-based method, and between 30% and 75% in three cities of the United Kingdom (Fram, Chistopoulou, and Ellul 2015) when implementing a cell-based completeness approach. Completeness evaluation and positional accuracy assessment were also performed for Milan, Italy (Brovelli et al. 2016), which revealed a decreasing trend in completeness from the city center towards the outskirts. Positional accuracy was found to be similar across the city, probably due to the constant accuracy of the underlying imagery from which buildings were traced. It is worth mentioning, however, that calculated completeness values differ strongly with the


applied method. One study measured building completeness for a medium-sized

German city with two common unit-based methods and found that the count ratio method underestimates, whereas the area ratio method overestimates building completeness (Törnros et al. 2015). The authors also suggest pre-processing steps to mitigate these effects. Another study examined completeness, semantic accuracy, positional accuracy, and shape accuracy of OSM building footprints for Munich, Germany, revealing a high completeness and semantic accuracy, whereas in terms of shape some architectural details are missing (Fan et al. 2014). Barron, Neis, and Zipf (2014) proposed an intrinsic approach for OSM quality assessment. During their investigations in Madrid (Spain), San Francisco (USA) and Yaoundé (Cameroon), they found that the cumulative number of attribute information attached to buildings (address or name) does not correspond well to the number of buildings (e.g. when the number increases significantly due to a bulk import).
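The difference between the two unit-based methods mentioned above can be made concrete with a small Python sketch. It assumes the common definitions of the count ratio (number of OSM buildings divided by number of reference buildings) and the area ratio (total OSM building area divided by total reference building area). The numbers are invented and only illustrate one plausible mechanism behind the opposite biases.

```python
def count_ratio(osm_areas, ref_areas):
    """Completeness as the number of OSM buildings over reference buildings."""
    return len(osm_areas) / len(ref_areas)


def area_ratio(osm_areas, ref_areas):
    """Completeness as the total OSM building area over the reference area."""
    return sum(osm_areas) / sum(ref_areas)


# Invented example: four row houses of 100 m² each in the reference data are
# mapped in OSM as one generalized block of 420 m².
reference_areas = [100.0, 100.0, 100.0, 100.0]
osm_areas = [420.0]

example_count = count_ratio(osm_areas, reference_areas)  # 0.25
example_area = area_ratio(osm_areas, reference_areas)    # 1.05
```

In this constructed case all four buildings are covered in OSM, yet the count ratio reports 25% while the area ratio reports 105%, which shows why the choice of method matters when comparing completeness values across studies.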

Materials and Methods

Miami-Dade Large Building Import

On May 16, 2016, Maptime Miami5 proposed an OSM data import of Large

Building Footprints6 from Miami-Dade County’s Open Data repository to kick-start

Miami’s OSM7, which lags behind other major cities in the United States both in terms of contributor numbers and data completeness. To ensure that a data import is not harmful for OSM, such projects need to adhere to strict guidelines, which include local community buy-in, announcements on different OSM channels (Wiki, mailing lists) and

5 https://www.meetup.com/Maptime-Miami/

6 http://gis.mdc.opendata.arcgis.com/datasets/1e87b925717747c7b59979caa7779039_1

7 http://wiki.openstreetmap.org/wiki/Miami-Dade_County_Large_Building_Import


the ability for the community to review and test both the data to be imported and the methods used during the import process. These guidelines were followed and the import was discussed within the US OSM community.

Other building imports, such as the ones in Los Angeles and New York City relied exclusively on the OSM community and therefore required many active contributors to manually review and import buildings. Due to the low number of OSM contributors in South Florida and the lack of existing buildings, a hybrid approach was implemented that consisted of an automated bulk upload of buildings and a manual community review of remaining buildings where needed. For this purpose, a software tool8 was developed and open sourced to pre-process the building dataset, to perform quality checks and to separate the dataset into two parts. Hence, one part of the dataset was uploaded automatically, and the other one was set aside for the community to review. The latter set contained buildings with detected conflicts (overlap with existing

OSM buildings, road or railroad features, geometry errors, etc.) whereas the rest of the dataset (i.e. buildings with no geometry issues and no overlaps) was uploaded automatically from a dedicated import account9 with upload scripts. A description of the workflow and the software tool is available at https://github.com/jlevente/MiamiOSM-buildings. In Figure 5-1a features in green represent buildings in the automatic bucket

(i.e. no conflicts) whereas those in red represent buildings for the community review process. The dataset consists of 95,536 large buildings that are defined as structures in an apparently commercial, industrial or other non-residential area. Additionally,

8 https://github.com/jlevente/MiamiOSM-buildings

9 https://www.openstreetmap.org/user/MiamiBuildingsImport


structures larger than approximately 750 m² (e.g. townhomes, condominiums) are also classified as large buildings. The dataset was derived from high resolution aerial imagery by both automatic photogrammetric methods and manual digitization and is available for download unprojected. After visual inspection, the building layer was found to be of high quality. Buildings also contained building height information in feet, which was converted to meters and also imported with the geometry. Additionally, an address dataset was spatially merged with the buildings to provide accurate street level information along with the buildings. A total of 84,348 buildings were uploaded automatically

(green features in Figure 5-1a), which left nearly 11,000 buildings for the manual review process (red features in Figure 5-1a). All import buildings were tagged with the

“ref:miabldg”10 key for easy identification.
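The separation of the dataset into an automatic and a review bucket, together with the height conversion, can be sketched as follows. This is not the code of the MiamiOSM-buildings tool. The conflict test is reduced to bounding box overlap and a validity flag, and the field names, the `building=yes` and `height` tags and the use of a building identifier as the `ref:miabldg` value are assumptions for illustration.

```python
FEET_TO_METERS = 0.3048  # exact conversion factor


def boxes_overlap(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def split_buildings(buildings, existing_osm_boxes):
    """Return (automatic, review) lists of buildings.

    A building goes to the review bucket if its geometry is flagged as
    invalid or its bounding box overlaps an existing OSM feature."""
    automatic, review = [], []
    for b in buildings:
        conflict = (not b["valid_geometry"]) or any(
            boxes_overlap(b["bbox"], box) for box in existing_osm_boxes
        )
        (review if conflict else automatic).append(b)
    return automatic, review


def to_osm_tags(building):
    """Build a tag dictionary; the tag choice here is an assumption."""
    height_m = building["height_ft"] * FEET_TO_METERS
    return {
        "building": "yes",
        "ref:miabldg": str(building["id"]),  # hypothetical identifier value
        "height": f"{height_m:.1f}",
    }


# Made-up buildings: one overlaps existing data, one is clean, one is invalid.
example_buildings = [
    {"id": 1, "bbox": (0, 0, 10, 10), "valid_geometry": True, "height_ft": 20},
    {"id": 2, "bbox": (20, 20, 30, 30), "valid_geometry": True, "height_ft": 30},
    {"id": 3, "bbox": (40, 40, 50, 50), "valid_geometry": False, "height_ft": 15},
]
example_existing = [(5, 5, 15, 15)]
example_auto, example_review = split_buildings(example_buildings, example_existing)
```

In the example only building 2 is eligible for the automated upload, while buildings 1 and 3 would be left for manual community review.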

The remaining 11,000 buildings were split by US Census Block Groups to provide a manageable number of buildings for manual review. A custom workflow was developed and explained in a detailed tutorial (Figure 5-1b). The workflow uses the

JOSM editor since it can load multiple datasets and provides superior tools for data editing compared to the Web based iD editor (Juhász and Hochmair 2017a). The tutorial used screenshots, explanations and specific instructions detailing how to execute the import steps. The tutorial was tested by multiple members of Maptime

Miami before releasing it to the public. To administer the progress and to provide a central interface for import users, a dedicated Tasking Manager (TM) instance developed by the Humanitarian OSM project was set up at http://tasks.osm.jlevente.com. A screenshot of the interface is shown in Figure 5-1c. On

10 http://wiki.openstreetmap.org/wiki/Key:ref:miabldg


this site, contributors can log in with their OSM username and then select a block group within Miami-Dade County to work on. Once a user selects an area, it will be locked for an hour to avoid concurrent edits. This lock is visible on the website for other users currently browsing the site. The TM instance contains hyperlinks to the tutorial and provides an easy way to load data into JOSM. For example, by pressing a button in TM,

JOSM on the user’s computer loads data from the selected area and zooms to the extent of the import area.

The general steps of the import workflow are as follows:

1. User logs on to TM

2. User selects and locks an area to work on

3. In TM, user loads current OSM data coverage into JOSM

4. In TM, user loads import building dataset into JOSM

5. In JOSM, user merges the import and OSM datasets into one single layer

6. In JOSM, user works on resolving conflicts and refers to the tutorial if needed

7. In JOSM, user runs the validation tool to ensure all data is correct and is ready to upload

8. User uploads data to OSM

9. User marks TM task as “done” or unlocks it if task is not finished
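Steps 3 and 4 rely on handing data over from the browser to JOSM. One common mechanism for this is JOSM's remote control interface, which listens on a local HTTP port. The sketch below only builds such request URLs; whether the TM instance used exactly these calls is an assumption, and the dataset URL in the usage note is hypothetical.

```python
from urllib.parse import urlencode

JOSM_REMOTE = "http://127.0.0.1:8111"  # default JOSM remote control address


def load_and_zoom_url(left, bottom, right, top):
    """URL asking JOSM to download OSM data for a bounding box and zoom to it."""
    params = {"left": left, "right": right, "top": top, "bottom": bottom}
    return f"{JOSM_REMOTE}/load_and_zoom?{urlencode(params)}"


def import_url(dataset_url):
    """URL asking JOSM to load an external dataset from a URL."""
    return f"{JOSM_REMOTE}/import?{urlencode({'url': dataset_url})}"
```

For example, `load_and_zoom_url(-80.2, 25.7, -80.1, 25.8)` would cover a small area in Miami, and `import_url("https://example.org/tasks/12.osm")` would point JOSM to a (hypothetical) per-task building file.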

Outreach Techniques and Target Audiences

To reach a sufficient number of contributors for the project, different user groups were targeted and introduced to the import. This also makes it possible to explore the willingness of different user groups to participate in this import and to better understand the impact different user groups make on OSM.


Students

The import project was introduced in two courses in the Geomatics Program at the University of Florida, which are GIS Programming (Fall 2016, graduate level) and GIS Analysis (Spring 2017, undergraduate and graduate level). Participation in this study was voluntary and students received extra credit for completing this task. Both courses are offered in an online format and therefore, students were located in different parts of Florida. In both courses, a lecture was dedicated to the import where students were given an overview of the import task and received information about the available resources (tutorial, TM, etc.). A hands-on editing session that illustrated the import process in detail was demonstrated live and also recorded to make it available for review later on. To earn full extra credit, students were asked to import at least 50 buildings.

Before the submission deadline, seven students needed assistance and troubleshooting associated with the assignment. Encountered problems included technical issues with JOSM and fixing errors in the submitted OSM edits. The early edits of a few students contained some building outlines traced from aerial imagery, but without the “building=yes” tags. These students did not realize the importance of tags in the first place, and were asked to fix their edits so that the added buildings would be recognized as buildings in OSM. In one instance, some changesets that only contained overlapping buildings needed to be manually reverted. This was due to a student missing steps 7 and 8 described in the “Miami-Dade Large Building Import” section.

Existing OSM community and local community

On August 1, 2016, the 50 most active mappers in Miami-Dade County between

March 2015 and August 2016 were contacted through the OSM messaging system. A


contributor’s activity was measured by the total number of edits (including geometry changes, feature additions, modifications and deletions) observed in all changesets of the user whose centroids were located in Miami. Since messaging involved navigating to these users’ profiles, automatic filtering of bots was not necessary. Two of the original top 50 accounts were removed from the list, as one was found to be a bot and the other was the first author’s OSM profile. These users were replaced with the OSM users originally in 51st and 52nd place. Figure 5-2a shows the spatial distribution of OSM changesets (cyan transparent rectangles) around Miami-Dade County within this period, where the most active areas correspond well to the Miami metropolitan area. Based on the top list, an introductory message was sent to users, letting them know about the import and listing all the resources (chat room, code repository, tutorials, Meetups). It was assumed that the most active local mappers could be reached with this method.
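The ranking of local mappers described above can be sketched as follows. This is an illustration rather than the script used in the study: the changeset field names and the bounding box passed by the caller are assumptions, and the changeset centroid is approximated by the center of the changeset's bounding box.

```python
from collections import Counter


def top_mappers(changesets, bbox, exclude=(), n=50):
    """Rank users by total edits in changesets whose centroid lies inside bbox.

    changesets: iterable of dicts with the (assumed) keys
        user, num_changes, min_lon, min_lat, max_lon, max_lat
    bbox: (min_lon, min_lat, max_lon, max_lat) of the study area
    exclude: accounts to drop, e.g. bots or the author's own profile
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    edits = Counter()
    for cs in changesets:
        lon = (cs["min_lon"] + cs["max_lon"]) / 2
        lat = (cs["min_lat"] + cs["max_lat"]) / 2
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            edits[cs["user"]] += cs["num_changes"]
    for user in exclude:
        edits.pop(user, None)
    return [user for user, _ in edits.most_common(n)]
```

Because excluded accounts are removed before the top `n` are selected, the next-ranked users move up automatically, which mirrors the replacement of the two removed accounts by the users in 51st and 52nd place.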

In addition to messaging OSM users, Maptime Miami organized meetings on the Meetup platform almost every month. These Meetups were announced on different social media platforms (Facebook, Twitter, Meetup) and promoted by Maptime Miami and other Miami based community organizations, such as Code for Miami and Venture Café Miami. The meetups were organized around the import process and reported its current status, and most of them included an interactive session where organizers helped new users get started with OSM and with importing buildings (Figure 5-2b). Between August 1 and December 31, 2016, a total of four meetups were dedicated to the import project.


Since OSM implements a free tagging system, there is no control over how users indicate that their edit is related to the Miami-Dade Large Building Import project (if they indicate it at all). To identify users who directly interacted with this import process, we gathered usernames from three different sources. First, all users were extracted from the TM instance. Since this is the interface where users can download the import dataset, the users who contribute to this project according to the provided tutorial will show up in this list. Second, a user may, instead of going through the import process, find out about the project through editing OSM and choose to improve the buildings that have been imported so far. These users tend to be more experienced and can be identified by analyzing a history dump and extracting all new features that match the “ref:miabldg” tag. Third, users can contribute to the import task without showing up in the TM or in the history dump, namely by indicating the import process on the changeset level without marking individual features (Juhász and Hochmair 2016a, 2017a). For participants, the JOSM editor automatically populated the changeset comment field with the #miabuildings hashtag, which makes it possible to query these edits later.
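Combining the three sources amounts to a set union followed by the removal of accounts that belong to other groups. A minimal sketch, assuming the usernames have already been extracted into plain collections (the function and parameter names are hypothetical):

```python
def community_users(tm_users, history_users, hashtag_users,
                    students=(), import_account=None):
    """Union of the three username sources, minus students and the import account.

    tm_users:       users listed in the Tasking Manager
    history_users:  users who added features carrying the "ref:miabldg" tag
    hashtag_users:  users whose changeset comments contain "#miabuildings"
    """
    candidates = set(tm_users) | set(history_users) | set(hashtag_users)
    return candidates - set(students) - {import_account}
```

Using sets ensures that a user who appears in more than one source is counted only once.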

For the remainder of the chapter, users described in this section (direct messages, gathered from TM or history and changeset dumps) are referred to as community users to reflect their role in the import process. Besides these two targeted groups, there will also be other OSM members that are not directly involved with the outreach activities described before, but who instead edit already imported buildings, e.g. by adding more attributes or refining building outlines.


Results

Participation Numbers

Students

Overall, 16 student submissions were received (Fall 2016, GIS Programming:

1/15 graduate students; Spring 2017, GIS Analysis: 14/26 graduate students, 1/3 undergraduate students) and 15 of them received the extra credit offered for the assignment. The difference in student activity between both courses might be due to the different nature of these classes as a programming class is rather technical and the focus is not on data sources and data analysis. The participation rate in the GIS

Analysis class of 51% is a little higher than participation rates in extra credit activities in other studies with participation rates below 40% (Elicker, McConnell, and Hall 2010;

Padilla-Walker et al. 2005), which might be due to the online nature of the extra credit opportunity, which did not involve commuting to campus. The grade distribution for GIS

Analysis suggests that students from the whole grade spectrum participated in the bonus assignment. More specifically, 7/14 (50%) of A-students, 7/13 (54%) B-students,

1/1 D-students, and 0/1 F-students participated, showing that the motivation across top

(A) students, good (B) students, and poor (D, F) students to participate in the extra credit assignment is similar. This is somewhat different from earlier studies, which showed that significantly more students with average or below-average grades elected not to participate in extra credit tasks (Padilla-Walker et al. 2005). A grade improvement due to the completed bonus assignment can be observed for seven out of the 15 participating students in this class.

The impact students had on OSM through participation in this extra credit assignment can be measured by the number of edits they made. On average, each


student added 104 buildings (median: 87), although the assignment asked for a minimum of only 50 buildings. This resulted in a total of 1554 buildings added to OSM by students. The median and mean number of buildings edited did not vary significantly across student performance levels (A through F letter grades considered before extra credit), which indicates that the work performance and motivation of all participating students is comparable, independent of their overall course performance.

Contributions to the OSM mapping platform are in general predominantly made by male users (Stephens 2013; Schmidt and Klettner 2013). As opposed to this, the extra credit assignment did not reflect the usual gender bias towards male participation.

More specifically, the GIS Analysis course had a total enrollment of 29 students (31.0% female), with 15 students participating in the extra credit assignment. Among these 15 participants, 40.0% were female, which is higher than the percent of female enrollment in the course (31.0%). However, the difference is not statistically meaningful, suggesting that, if a reward by grade is involved, male and female students are similarly motivated to participate in OSM contributions. This is in line with findings from an earlier study about volunteer research participation among 193 undergraduate students, which suggests that gender does not play a role in participation rates

(Padilla-Walker et al. 2005).

Community users

The 50 most active OSM contributors between March 2015 and August 2016 in

Miami-Dade County that were contacted via direct messages submitted between 1 and

594 changesets (mean: 92, median: 27) in Miami-Dade County during this period. The number of map edits per user ranged between 518 and 62555 (mean: 4052, median:

1324). The OSM sign up date of these 50 users was extracted from the main API. The


histogram shows that the majority of these top 50 users are long standing OSM members who registered to the project between December 2006 and April 2015 (Figure 5-3b). Only seven out of the 50 users responded to the initial query. Four mappers provided supportive feedback, but were not able to help out due to busy schedules or unfamiliarity with the area. The three remaining users did contribute to the project, although their user names did not show in the TM. This means that their contributions lean towards quality checks and follow up fixes. In fact, these users opened several

OSM notes, provided changeset discussions and fixed several data issues in the proximity of import buildings. These contributions are also valuable parts of data imports.

13 of the “top 50” mappers are Mapbox11 employees working for the Data team, which operates worldwide on creating new data, improving existing features and fixing errors reported by OSM users. The fact that these users appear in the “top contributors in Miami” list indicates a small and generally inactive OSM user base in Miami-Dade

County.

The TM had 30 users listed, though not all of them contributed to the project through data edits. By analyzing the history dump after August 1, 2016, 34 users were identified as having added original import buildings to OSM. 18 of these users also submitted changesets with the #miabuildings import hashtag and showed up in either the TM or in the list of users extracted from the history dump. After combining users that used the

#miabuildings import hashtag with those that did not, and excluding student accounts and the official import account that was used to automatically upload buildings, 32

11 https://www.mapbox.com/


unique users were left that were considered community users as they interacted with the import process first hand. These 32 community users are responsible for around the same number of buildings (1547) as the student group (1554). However, 9 of them

(identified through TM) did not add any import buildings to OSM, but rather ran some other edits. This shows that the initial interest in an import project (expressed by signing up for the TM with their OSM credentials) does not always result in actual contributions.

The remaining 23 contributors added 67 import buildings to the project on average, which implies a smaller import rate than for students (see Table 2-1). A two-sample t-test showed that there was a significant difference in the log transformed number of imported buildings between community users (M=2.6, SD=1.9) and students (M=4.5, SD=0.6): t(27.24)=4.41, p<0.001. These results suggest that different user engagement techniques have an effect on user activity. In this case, the higher activity of students could have been driven by their desire for a higher grade. In contrast, community users would not experience a short term gain (e.g. monetary or prestige) from the import task. This means that although in the short run students handled more imports per user, in the long run it can be expected that community users provide more data than non-community users, since social mappers were previously found to contribute continuously (Hristova et al. 2013). Although that study analyzed mapping parties, we consider their participants comparable to our category of community mappers, as they work towards a defined goal (importing buildings) and also occasionally meet face to face at social events.
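The fractional degrees of freedom reported above are typical of Welch's unequal-variance form of the two-sample t-test, although the text does not name the variant. The following sketch shows how such a statistic could be computed on log transformed building counts; the use of the natural logarithm is an assumption, and no study data are reproduced here.

```python
import math
from statistics import mean, variance


def log_counts(counts):
    """Natural-log transform of per-user building counts (all counts must be > 0)."""
    return [math.log(c) for c in counts]


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean a
    vb = variance(sample_b) / nb  # squared standard error of mean b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

The sign of `t` depends only on the order in which the two groups are passed; the p-value would then be looked up from a t distribution with `df` degrees of freedom.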

The import tasks became an organic part of OSM so that data were further edited by the community. Such edits include further refinements of building geometries and tag


additions (e.g. name of a hotel). A total of 177 OSM users otherwise not related to the import process interacted with import buildings so far. This is similar to OSM users interacting with the pedestrian network imported with the TIGER dataset (Zielstra,

Hochmair, and Neis 2013) or excessively editing ways after a local import (Mooney and

Corcoran 2013). Such observed follow-up edits demonstrate the additional usefulness of data imports.

New and existing users

Another aspect of data imports is whether they tend to engage new and existing users differently. To explore this, users were divided into existing user and new user categories. 23 new users were identified who came into contact with OSM during the import task. This category consists of all 15 participating students and eight newly registered users recruited through community outreach. All of the remaining users created their accounts at least three months before the actual import task began. As shown earlier in the “Community users” section, the import activity of users is largely affected by their different motivation. This is also apparent from the first two columns of Table 5-1, which shows that students were more active in the import task than those new users who were recruited through community events. It is also supported by a two-sample t-test on the log transformed number of buildings (t(8.45)=6.5; p<0.001). To account for this, we compare the activity of new community users to existing community users to reveal if the difference between the import activity of new and existing users is meaningful. A two-sample t-test conducted on the log transformed number of imported buildings between existing users (M=3.3, SD=2.0) and new users gained through community outreach (M=1.4, SD=1.3) confirms that existing members tend to add significantly more buildings: t(20.01)=-2.73, p=0.01. Furthermore, the effect of


motivation on import activity remains significant at the 95% level when compared between students and existing community members.

In practice, this means that data imports benefit most from existing members and highly motivated users, whereas new users who gain no clear benefit from the import tend to generate less data. Therefore, keeping community growth in mind, a healthy combination of new and existing users is the most beneficial.

Mapping Behavior

Temporal aspects

Figure 5-3 shows the histograms of OSM sign up dates, which were extracted from the main API, for students (Figure 5-3a), for users contacted via direct messages (Figure 5-3b) and for community users (Figure 5-3c). Student sign up dates closely follow specific academic events during the semester, such as the introduction of the extra credit assignment in a lecture (February 15, 2017, shown with a vertical dashed line) or assignment deadlines. The due dates (solid vertical lines) were December 2, 2016 for GIS Programming and March 29, 2017 for GIS Analysis. Most of the contacted users from the “top 50-editing list” have prior mapping experience, which is reflected by the fact that the majority of these users signed up more than a year before the import project. Community users, who interacted with the import dataset first hand, consist of both new and experienced mappers. 40% of this group signed up to the OSM platform after the first discussions in May 2016 and 35% of them after August 2016, when the hands on mapping started; these users represent the new mapper portion. This suggests that increased social media activity and local outreach can be effective methods in gaining new contributors.


To explore how different users interacted with the import task over time, their activities were plotted based on interactions with import buildings (additions, edits) between August 2016 and October 2017. Figure 5-4 shows the import activity for different user groups as well as the overall activity around imports (all groups combined). A time-series visualization makes it possible to assess the trends, seasonal and random components at work (Bégin, Devillers, and Roche 2017a). The overall activity, which is the sum of the group activities, shows distinct peaks, suggesting that the import did not happen at a constant speed but that different events instead triggered increased activity over shorter periods of time. More specifically, dashed vertical lines in Figure 5-4 represent community related events (meetups), while solid vertical lines show due dates of home assignments for students. These events are listed in chronological order in Table 5-2.
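A per-group activity series of this kind can be derived by binning the timestamps of interactions with import buildings. The sketch below uses weekly bins starting on Monday; the bin width and the input format are assumptions, since the temporal resolution of the plotted series is not stated.

```python
from collections import defaultdict
from datetime import timedelta


def weekly_activity(events):
    """Count interactions per user group and week.

    events: iterable of (date, group) pairs, one per interaction
    Returns {group: {week_start_date: count}}, plus an "overall" series
    that is the sum of all groups.
    """
    series = defaultdict(lambda: defaultdict(int))
    for day, group in events:
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        series[group][week_start] += 1
        series["overall"][week_start] += 1
    return series
```

Event dates such as meetups or assignment deadlines can then be overlaid on the resulting series as vertical lines.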

Community users (23 individuals) learned about the import project through various channels, such as meetups, direct messages, OSM Wiki pages and mailing list communications. A subgroup that consists of 15 existing members of the OSM project can be considered experienced mappers. There is an association between the early contributions of this group (Fall 2016) and the community events held at that time.

Event 1C did not trigger any significant activity, as it was a technical presentation along with general information about the proposed import process. Import related activities visible in Figure 5-4 before the first hands on mapping session (2C) can be attributed to organizers testing the import process and to a few early users who followed the online conversations in the chat group. A clear increase in activity caused by hands on mapping sessions 2C and 3C can be observed in the plot. A similar event (4C), however, did


not have such an effect due to low participation before the holidays. It is also obvious from the plot that this user group remained relatively active even when no more community events were organized. The motivation of these users can be classified as intrinsic as they were offered no benefits or gains, yet they participated in the import and contributed to its success. This group mainly consists of locals. The continuing interest of these users in the import can be explained by the pride of place concept (Coleman,

Georgiadou, and Labonte 2009a), which describes the desire of the mapper to see one’s own home town or region (Miami-Dade County in this context) on the map. Their behavior is also similar to previous findings about loyal OSM users regularly checking and updating their “pet locations”, which is the area where they edit most frequently

(Napolitano and Mooney 2012).

Newly recruited community users show a different activity pattern. Surprisingly, no editing activity for these users was recorded until event 4C, even though three of the eight new users in this category signed up before that date. We attribute this to the fact that OSM, and especially a data import, may seem difficult at first, and new users can be overwhelmed with information. Our community events with many participants did not seem to provide a good platform for engaging new contributors. In contrast, event 4C was not well attended, which provided an opportunity to dedicate more time and attention to the newcomers who were present. Three users with no prior OSM experience attended this event, out of which one user (a local) successfully imported several buildings (and added even more at later dates during that month). The remaining two users at this meetup were not interested in the import, but rather in general discussions about mapping. Figure 5-4 also shows that the long-time engagement of new


community members is not sustained. Unlike the previous group, their contributions are ad-hoc and can be traced back to social media posts or other events (e.g. HOT mapping), but then quickly vanish. This is similar to what has been revealed for mapping parties through user interviews, that is, new users are not engaged for longer periods (Napolitano and Mooney 2012).

Students in the GIS Programming (one student) and GIS Analysis (15 students) classes focused their activities around the due dates of their home assignments (1S and

2S on Figure 5-4). Even though students were introduced to this extra credit task months before the due date (October 28, 2016 for 1S and February 15, 2017 for 2S), their activity peaked right before the deadline and then quickly vanished. This suggests that students were highly active before the deadlines but otherwise spent very little time on the task. This is in line with common practice of college students postponing assigned tasks until the day or night before due dates (Burchfield and Sappington 2000;

Fernald 2004). The figure also reveals that none of the students remained active after submitting their assignments, which indicates that our experiment was not successful in attracting students to become permanent OSM contributors. Only two out of 16 students who completed the extra credit assignment have some ties to the project area in Miami-

Dade, either by working or having grown up in Miami, based on class introductions posted by students. This lack of ties to the study area for almost all students may explain the absence of motivation for students to voluntarily continue with OSM mapping activities after completing their assignment. Instead, students appear to be motivated primarily by the prospect of improved course grades, which can be classified as an extrinsic motivator (Budhathoki and Haythornthwaite 2013).


The activities of the remaining user group (“other”) in Figure 5-4 are not associated with either community events or student assignment deadlines. These contributions tend to follow a random pattern and could be a result of people spending their vacation in Miami and editing the map in the meantime, or just a regular OSM editor making some edits. The first distinct activity peak for other users in July 2017 can be attributed to one user (otherwise not related to the data import) adding building level information (“building:levels”) to 342 of the import buildings. The other peak in September 2017 is related to HOT Hurricane Irma relief, which drew a large amount of editing activity to Miami. Interestingly, this humanitarian mapping project increased the activity of the community user group as well (both new and existing members), as some local members of this group are known to contribute to HOT projects and this event provided an opportunity to further map their home area.

Spatial aspects

The Tasking Manager logs user activity by storing when users accessed (i.e. locked) individual tasks. This information makes it possible to explore which areas users prioritized for data imports and editing. Tasks were spatially subdivided into US Census block groups, which contain approximately the same number of residents. Figure 5-5 provides an overview of the number of times a task was locked over time. There is no limit as to how many times a task can be locked by users. Even when a task is marked as “done”, another user can still interact with it, for example to validate edits.
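Given such a lock log, the number of locks per task and the number of distinct users per task can be tallied with a few lines of code. The sketch assumes the log has been reduced to (task, user) pairs; the actual TM export format is not described here.

```python
from collections import Counter, defaultdict


def lock_statistics(lock_log):
    """Summarize a Tasking Manager lock log.

    lock_log: iterable of (task_id, user) pairs, one per lock event
    Returns two histograms:
      lock_hist[k] = number of tasks that were locked k times
      user_hist[k] = number of tasks that were locked by k distinct users
    """
    locks_per_task = Counter()
    users_per_task = defaultdict(set)
    for task_id, user in lock_log:
        locks_per_task[task_id] += 1
        users_per_task[task_id].add(user)
    lock_hist = Counter(locks_per_task.values())
    user_hist = Counter(len(users) for users in users_per_task.values())
    return lock_hist, user_hist
```

Tasks that were never locked do not appear in either histogram and would have to be counted separately from the full task list.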

The spatial distribution of activities shows a few distinct patterns. Since US

Census block groups contain approximately the same number of residents, they tend to be larger around the edge of the project area which has a low population density. These census block groups appear therefore more prominently on the map, which makes them


more likely to be chosen by contributors, especially by those who lack local knowledge about the spatial layout of the study area. These larger census blocks show agricultural

(Homestead), natural (Water conservation area), or industrial (e.g. quarries) characteristics and were often locked by users. Also, centrally located areas, where tasks are smaller in size, tend to be very popular among mappers, probably because of some mappers’ interest in learning more about the city center regions. Accordingly, frequently locked tasks can be found in Downtown Miami (blue circle in Figure 5-5), the financial district (Brickell) south of Downtown Miami, or Key Biscayne, which is a scenic and touristic island. Of all the areas that users locked, 91 were marked as “done”. These areas are highlighted in Figure 5-6 (green polygons) among other areas that showed only some or no activity. Approximately 50% of the areas end up being finished once a user locks them. A large number of tasks marked as done are found in natural areas, which require only a few or no building imports. Downtown areas also showed a similarly high number of finished tasks, probably due to user interest in these areas. It should be noted that a task is not automatically marked as “done”. Therefore, in reality the number of tasks that are already finished could be higher. There was no evidence of users erroneously marking tasks as finished. Also, 41% of the total task areas (1591) contain no buildings to be imported. These tasks require no work, and therefore could easily increase the completion rate if set to “done”. However, only 27 of the 650 no-building tasks have been marked as “done” so far.

Figure 5-7a shows that most tasks are locked only once (143), while only a few are locked more than 5 times. This can be expected because once an area has been correctly imported with all building conflicts removed, there is no need to work on it


anymore. Accordingly, most tasks (174), whether finished or unfinished, were only worked on by one mapper (Figure 5-7b). A few popular tasks show that some areas remain interesting for OSM users, even though they are marked as “done”. The most popular tasks were locked by three different users. These two distributions closely follow a power law function with an exponent value of 2.39 and an adjusted R² of 0.94 in the case of task locks (Figure 5-7a), and an exponent of 2.95 and an adjusted R² of 0.99 in the case of the number of users working on a task (Figure 5-7b). Although these heavy-tailed distributions follow a pattern observed many times in user-generated data, they might not be statistically robust due to our low sample sizes, especially in Figure 5-7b.
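One simple way to obtain a power law exponent and an adjusted R² for such a histogram is an ordinary least-squares fit in log-log space. The fitting procedure used for Figure 5-7 is not stated, so the following is only a sketch of this common approach, with the goodness of fit computed on the log transformed values.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**(-alpha) by least squares on (log x, log y).

    xs, ys: positive values, e.g. number of locks and number of tasks
    Returns (alpha, c, adjusted_r2); needs at least three points.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(lx, ly))
    ss_tot = sum((y - my) ** 2 for y in ly)
    r2 = 1 - ss_res / ss_tot
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # one predictor
    return -slope, math.exp(intercept), adjusted_r2
```

With only a handful of distinct x values, as in Figure 5-7b, such a fit has very few degrees of freedom, which is consistent with the caution expressed above.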

Other OSM activities

During the considered time frame (August 1, 2016 – October 1, 2017), the activity of OSM users was not limited to the import task. Analyzing user contributions outside Miami-Dade County allows one to learn more about user characteristics and typical contribution behavior. Figure 5-8 shows the worldwide spatial distribution of changesets submitted by community users (blue areas) and graduate students (red triangles indicating centroids of changesets) between May 2016 and October 2017. The commitment of long standing OSM members from the OSM community is reflected in

Figure 5-8 through changesets that cover significantly larger areas than those of student changesets. This can be expected since students were introduced to OSM through this import project and had no prior OSM experience. The majority of changesets submitted by community users can be found in the US, suggesting that these users are mostly US residents. Their interest in the entire country is also reflected through mapping activities in Hawaii and Puerto Rico, which are not part of the Contiguous United States. A clear


divide in contribution patterns along the US-Canadian border suggests that the interest of a user group can be influenced by administrative boundaries and cultural aspects.

This can be explained by the pride of place concept (Coleman, Georgiadou, and

Labonte 2009a; Napolitano and Mooney 2012). National borders were also found to shape the spatial extent of mapping activities of individual users in other studies, for example when editing OSM based on Mapillary street level photos (Juhász and

Hochmair 2016a). The extensive mapping of Colombia in South America by community users is a result of one user who divides his or her mapping efforts between Colombia and South Florida. Besides geographic coverage, another distinct difference between students and community users is the higher mapping activity associated with humanitarian projects (HOT OSM12) of the latter group. Since May 2016, 15 community users submitted over 6,200 changesets with the “#hotosm” and “#missingmaps” hashtags.

These changesets were mainly located in Haiti after Hurricane Matthew in 2016, in

Africa (as part of Missing Maps13 projects in Tanzania, Nigeria, Congo, etc.) and in Sri

Lanka. An OSM data import project is similar to a humanitarian project in the sense that users work towards a specific goal while following centralized instructions. This explains why some community users contributed both to HOT and data import activities, in some instances within the project area (see the last paragraph of the “Temporal aspects” section).

Student OSM activities are spatially concentrated in Miami-Dade County, with only one student contributing outside. This student added several buildings in Bangladesh from aerial imagery and also added building names to

12 https://www.hotosm.org/

13 http://www.missingmaps.org/


existing features in that region. According to the class introduction, this student is originally from Bangladesh and most probably has personal knowledge and ties to the mapped area.

Summary

This study analyzed if and how a local building import task can help to engage students and targeted community groups in OSM participation and retention. One of our observations related to the organizational aspects of such data imports is that the amount of information provided to participants can overwhelm prospective new users who are otherwise unfamiliar with collaborative mapping. This can prevent these individuals from participating in the proposed import, or even from getting started with OSM. In our case, the first three community events were not successful in gaining any new contributors due to the lack of attention given to newcomers. In contrast, another community event with a low number of participants provided a platform to create a more personalized experience for newcomers. This eventually resulted in a retained new community member. We also found that checking the early edits of new contributors and providing feedback are effective methods for ensuring quality contributions later on.

Our results show that the engagement technique used to recruit users has a significant effect on import activity, that is, students who were recruited as part of an extra credit assignment imported more buildings on average than users who were recruited at community meetings or through social media activity. The outreach technique is ultimately related to the different motivations of users. In that regard, our results suggest that extrinsic motivation (i.e. students receiving extra credit) triggers more activity than the intrinsic motivation of community users.

However, our experiments proved to be unsuccessful in retaining new users in the long run, regardless of their motivation. Twenty-three new OSM members started mapping due to the import project. The activity of students (15 individuals) corresponds to academic deadlines, but no continuation of long-term activities could be observed. As previous research already showed (Hristova et al. 2013), mapping parties fail to retain newcomers almost completely, with no retention at all in the long term. This study expanded related research to the academic environment and found the same problem with extra credit activities among GIS students. Although participating students did contribute more data than required for the assignment, the import exercise did not keep the students as permanent OSM contributors, since no more data imports from students could be observed after submission deadlines. The remaining eight new users were recruited at community events or through online outreach. The activity pattern of these users also corresponds to specific events, such as community events, social media posts or other related mapping activities in the study area. Similarly to students, none of these new users became long-time contributors. Note, however, that our experiments did not include follow-up surveys or conversations with new users, which could be an option for future import projects.

Though the study suggests that no long-term contributions can be expected from newly recruited mappers, it also shows the dedication of existing community members who continued their editing activity throughout the study time frame. In addition, the effect of an import reaches beyond the directly imported data, since these data also trigger long-term editing activities (e.g. adding attributes, fixing geometry errors) by other, already engaged OSM contributors. The study also showed that already established mappers do not change their contribution behavior through community events. Instead, they are active before and after the event, contributing to OSM on a regular basis. This is also in line with previous findings from OSM mapping parties (Hristova et al. 2013).

The study also provides evidence of other users, otherwise unrelated to the project, interacting with the imported buildings. Two distinct instances of this activity were one individual adding more information to buildings (i.e. building levels) and increased editing activity due to a hurricane relief event organized by HOT.

To keep an active user base in OSM that will continue to ensure regular data updates and quality enhancement in the future, new ways of user recruitment and retention are necessary. While the presented study showed that extra credit assignments increase short-term engagement of students, highlighting fun aspects could be another potential component to retain new mappers in OSM in the longer term, as this was also found to be a major driver for other collaborative projects, such as Wikipedia (Nov 2007). For example, geo-gaming and gamification have been shown to be an attractive and incentivizing way of engaging a different audience in land cover validation (See et al. 2015) and the collection of crowd-sourced Points of Interest (Juhász and Hochmair 2017b). While there is no best recipe on how to integrate fun components into OSM mapping and import tasks, considering ideas from other platforms could provide some guidance, including the provision of reward diversity (Choi et al. 2014), or the interaction between participants, e.g. by responses to video recordings (Spiro 2012).

Future work will aim to integrate such components into mapping events and recruitment efforts and evaluate their efficacy. Based on experiences gained through this experiment, we would suggest that similar projects put extra effort into interacting with prospective users who have no prior OSM experience. Providing a welcoming, personalized experience that addresses the special needs of these users might be an effective way of engaging more new users. Involving students for extra credit seems to be an effective way of increasing data import activities; therefore, it could be applied in future projects lacking a sufficient number of contributors. However, the success of this method is questionable if the aim is specifically to gain long-term OSM contributors, due to the nature of the students' motivations.

Figure 5-1. Import buildings (automatically uploaded – green, for manual review – red) in Miami-Dade County (a), excerpt from the import tutorial (b), and user interface of the Tasking Manager instance (c)

Figure 5-2. Spatial distribution of OSM changesets (transparent cyan rectangles) between March 2016 and August 2016 in Miami-Dade County (red outline) in South Florida (a), and a Maptime Miami Meetup held on September 26, 2016 (b)

Figure 5-3. Histograms of sign up dates for different user groups: students (a), top mappers (b), and community mappers (c). For students, assignment due dates (solid vertical lines) and first introduction to the project (dashed vertical line) are shown

Figure 5-4. Import activity over time for different user groups. Dashed vertical lines show community events, while solid vertical lines represent assignment deadlines for students

Figure 5-5. Number of times each individual task in the Tasking Manager was accessed, revealing that downtown areas and larger tasks are accessed more frequently

Figure 5-6. Finished tasks (green), tasks that have been worked on (red), and tasks that have not been worked on (blue)

Figure 5-7. Fitted power law functions on task lock (a) and user (b) distributions (log-log plots)

Figure 5-8. Spatial distribution of OSM changesets submitted by participating users between May 2016 and October 2017

Table 5-1. Descriptive statistics of imported buildings by user group

                       Students   Community users
                       (new)      (new)     (existing)   Overall
N                      15         8         15           23
Total # of buildings   1554       69        1478         1547
Average                103.6      8.7       98.5         67.2
Median                 87.0       3.5       24.0         16.0
SD                     59.3       11.8      159.9        135.0

Table 5-2. Description of events related to the import project

Event   Event type   Event description                                                                                      Date
1C      Community    Technical discussion of software tools, general information on the import, intro messages sent out    August 1, 2016
2C      Community    Presentation of automatic import results, hands-on mapping session                                     September 26, 2016
3C      Community    Hands on mapping session dedicated to the import                                                       November 17, 2016
1S      Student      GIS Programming course bonus assignment due                                                            December 2, 2016
4C      Community    Hands on mapping session dedicated to the import                                                       December 15, 2016
2S      Student      GIS Analysis course bonus assignment due                                                               March 29, 2017

CHAPTER 6 CONCLUSIONS

The data quality of Volunteered Geographic Information ultimately depends on the people who engage in the creation of VGI. For that reason, understanding user behavior is crucial if we are to provide quality measures. This dissertation extends the literature by providing user-centric analyses of Volunteered Geographic Information. Various objectives were addressed in order to reach this overall goal.

The first objective of this dissertation was to analyze the spatio-temporal user contribution patterns of a newly emerged VGI platform during its early stages. Mapillary, which aims to cover the world with crowdsourced street level images, was used as a case study. The study illustrated through Mapillary contributions how a new community develops over time. It was also the first study introducing Mapillary data to the GIScience community. Results show that Mapillary experienced high data growth rates in its early stages. Several measures were used to describe the contribution patterns, such as days of active mapping, mapped kilometers per week, and radius of gyration. The distributions of these measures follow power law functions, indicating contribution inequality. Overall, high growth rates of Mapillary indicate that new VGI platforms can be effective in gaining new contributors. The study also explored different types of contributors and identified so-called “map lovers” and “casual mappers”. The difference between these two types lies in the nature of their motivations, which results in different types of contributions. Map lovers tend to put extra effort into mapping (e.g. planning sessions in advance and systematically recording a neighborhood), while casual mappers tend to contribute only when it is related to their activity and when it does not require extra effort (e.g. recording sections of a hiking trail). These types can also be used to illustrate the participation inequality mentioned in Chapters 1 and 2. In VGI platforms, map lovers represent the heavy tail of power law distributions, as they are small in number but generate large amounts of accurate data. The 90-9-1 rule refers to them as the 1%. The group of casual mappers consists of more members than map lovers and can be seen as the 9% constructing the middle portions of power law distributions. The rest of the community forms the largest group of the whole user base. These users can be considered lurkers who contribute very little to no data.
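The participation inequality described above can be made concrete with a short sketch. The Python example below is only an illustration on hypothetical, synthetic contribution counts, not data from this dissertation, and the function names `top_share` and `fit_power_law_loglog` are assumptions introduced here. It estimates a power law exponent with an ordinary least-squares fit in log-log space, which is a simple approach and not necessarily the fitting procedure used in the earlier chapters, and it reports the share of all contributions made by the most active 1% of users.

```python
import math


def top_share(contributions, top_fraction):
    """Share of the total contributed by the most active `top_fraction` of users."""
    ranked = sorted(contributions, reverse=True)
    # Always include at least one user in the "top" group.
    k = max(1, math.ceil(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


def fit_power_law_loglog(contributions):
    """Least-squares line through (log rank, log value); returns (slope, intercept)."""
    ranked = sorted((c for c in contributions if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(value) for value in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical user base of 100 contributors whose contribution
# volume is exactly proportional to 1 / rank (synthetic data).
contributions = [1000.0 / rank for rank in range(1, 101)]

slope, intercept = fit_power_law_loglog(contributions)
share_top_1_percent = top_share(contributions, 0.01)

print(f"fitted exponent: {slope:.2f}")
print(f"share of the top 1%: {share_top_1_percent:.1%}")
```

A strongly negative slope together with a large share held by the top 1% corresponds to the heavy-tailed pattern attributed to map lovers, while a slope close to zero would indicate evenly distributed contributions.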

A related research objective was to assess the completeness of Mapillary, and to illustrate the contributions that a new community can achieve in a relatively short amount of time. For this aim, a method was developed that computes completeness values of street level images on different road categories. Furthermore, the method presented in Chapter 2 also compared Mapillary image coverage to that of Google Street View, a commercial provider of street level imagery. Results show that on a global scale, Mapillary users focused on Europe at first. More specifically, when looking at image availability on main roads, 16 of the most complete administrative units (i.e. countries) were in Europe, including the Netherlands, Germany, and Sweden. However, the effectiveness of a small but dedicated user community is illustrated by the fact that the most completely mapped administrative units were Barbados, Hong Kong and Nicaragua, respectively, all surveyed by fewer than 10 contributors. High completeness values in these cases can also be attributed to the relatively short road system in these administrative units. The study also compared Mapillary image coverage to Google Street View for selected areas on three road categories (main roads, residential roads, and off-road bicycle and pedestrian roads). Results show that overall, Street View delivers better coverage than Mapillary, with one exception for Malmö on off-road segments. However, since imagery is not evenly distributed throughout the study sites, local areas exist where a new VGI platform can outperform existing commercial products. These areas could indicate either a heightened interest from users in mapping a specific area, or a gap in Street View coverage.
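As a rough illustration of what a completeness value per road category expresses, the following Python sketch assumes that completeness is defined as the share of reference road length covered by street level imagery. This definition, the function name `completeness_by_category`, and the segment records are assumptions made for illustration only; the tile-based method of Chapter 2 is not reproduced here.

```python
from collections import defaultdict


def completeness_by_category(segments):
    """Return covered length / total length for each road category.

    `segments` is an iterable of (category, length_km, covered_km) tuples.
    """
    total = defaultdict(float)
    covered = defaultdict(float)
    for category, length_km, covered_km in segments:
        total[category] += length_km
        # The covered length can never exceed the length of the segment itself.
        covered[category] += min(covered_km, length_km)
    return {cat: covered[cat] / total[cat] for cat in total if total[cat] > 0}


# Hypothetical road segments (not data from the study).
segments = [
    ("main", 10.0, 8.0),
    ("main", 5.0, 1.0),
    ("residential", 20.0, 5.0),
    ("off-road", 4.0, 0.0),
]

completeness = completeness_by_category(segments)
for category, value in completeness.items():
    print(f"{category}: {value:.0%}")
```

Computing the same ratio for two imagery providers over identical reference segments would allow a side-by-side comparison of the kind reported for Mapillary and Street View.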

Data interplay can happen between platforms, meaning that information from one VGI project can be used to improve another if certain conditions (e.g. data license compatibility, thematic overlap) apply. Since the description and analysis of this phenomenon was missing from the literature, the next objective aimed to provide a first description of how data is being used across VGI platforms. More specifically, the goal was to determine to what extent Mapillary imagery is used by the community as a source for OSM mapping. Two case studies were presented in Chapters 3 and 4, respectively. The first relied on information indicated by the OSM community describing what data source they used to edit the map. In other words, the first study utilized tags to identify OSM map edits that are based on Mapillary. The study found that the geographic focus of Mapillary based OSM edits corresponds to areas with high Mapillary activity. It also showed that the tag distribution of features derived from Mapillary is significantly different from that of the entire OSM dataset. The percentage of cross-linked features compared to the entire OSM dataset is higher for transportation (highway, public transport, traffic sign) and leisure (natural, amenity, tourism). This finding is in line with common activities associated with Mapillary, which are recording photos while commuting and traveling, and during outdoor and leisure activities, such as hiking. The crowdsourced nature of Mapillary allows users to map OSM features in places where they are currently less frequently found, including airport taxiways or indoor objects. Cross-linking the two data sources can also help to improve data quality. An example was given where a changed road network pattern was reflected in Mapillary photographs, which were then used to update OSM road geometries. Furthermore, Mapillary images provide a potential data source for adding OSM feature attribute information (e.g. surface type, name of business) without the need to conduct a field survey. The study also explored how the same individual uses these two VGI platforms together. Results show that a mapper is more likely to edit larger areas in OSM than what is covered with his or her Mapillary photos, which can be attributed to the fact that OSM mapping can be a remote activity, while physical presence at a location is required to take Mapillary photos.

Relying on user-created tags to identify OSM edits based on Mapillary underestimates the real extent of this activity. Therefore, in another case study, the areas in which OSM contributors loaded Mapillary were spatio-temporally cross-checked with editing events. This information resulted in a massive database that occupied 1.7 TB of hard disk space, which required a two-tiered approach to effectively extract OSM edits that are likely based on Mapillary. Results show that users often do not include source information along with their edits. The study also identified that the OSM community is more likely to use the JOSM editor to derive information from street level photographs.
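The idea of a spatio-temporal cross-check can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions, not the two-tiered procedure used in the study: the record layout, the matching by user name, the 30-minute time window, and the function names `bbox_contains` and `likely_mapillary_based` are all hypothetical.

```python
from datetime import datetime, timedelta


def bbox_contains(bbox, lon, lat):
    """True if the point lies inside (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def likely_mapillary_based(edit, image_loads, max_delay=timedelta(minutes=30)):
    """True if the same user loaded imagery covering the edit location
    no more than `max_delay` before the edit was made."""
    for load in image_loads:
        if load["user"] != edit["user"]:
            continue
        delay = edit["time"] - load["time"]
        in_time = timedelta(0) <= delay <= max_delay
        if in_time and bbox_contains(load["bbox"], edit["lon"], edit["lat"]):
            return True
    return False


# Hypothetical records: one imagery load event and three candidate edits.
image_loads = [
    {"user": "mapper_a", "time": datetime(2017, 1, 1, 12, 0),
     "bbox": (-80.3, 25.7, -80.1, 25.9)},
]
edit_near = {"user": "mapper_a", "time": datetime(2017, 1, 1, 12, 10),
             "lon": -80.2, "lat": 25.8}
edit_late = {"user": "mapper_a", "time": datetime(2017, 1, 1, 15, 0),
             "lon": -80.2, "lat": 25.8}
edit_far = {"user": "mapper_a", "time": datetime(2017, 1, 1, 12, 10),
            "lon": 13.0, "lat": 55.6}

for name, edit in [("near", edit_near), ("late", edit_late), ("far", edit_far)]:
    print(name, likely_mapillary_based(edit, image_loads))
```

With a database of the size reported above, a linear scan over all load events would not be practical; a spatial or temporal index would be needed, which is beyond the scope of this sketch.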

The last objective of the dissertation was to study if and how different user groups change their behavior during an OSM data import. This objective indirectly translates to identifying what community outreach techniques are the most effective in triggering community growth, and is therefore relevant to emerging VGI platforms aiming to grow their user base. To address this, a large scale OpenStreetMap building import task was designed and implemented in Miami-Dade County, Florida, and different user groups were asked to join the import project. These groups were undergraduate and graduate students, existing OSM mappers previously active in the area, and newly recruited mappers through community events. Results show that the type of engagement technique used to recruit users has a significant effect on the import activity. Students who were recruited as part of an extra credit assignment imported more buildings on average than users who were recruited at community meetings or through social media activity. The level of contribution activity is ultimately related to different motivations of users. In that regard, results suggest that extrinsic motivation (i.e., students receiving extra credit) triggers more activity than the intrinsic motivation of community users, at least in the short run. However, due to the relatively small sample size of users, the results of statistical tests presented in this chapter should be interpreted carefully. This last experiment proved to be unsuccessful in retaining new users in the long run, regardless of their motivation.

This dissertation analyzed data from VGI platforms with the explicit purpose of generating geodata. However, several methods and observations can be applied to a wider range of data sources, such as involuntarily generated geographic information. The popularity of geo-gaming and geo-social media also results in a rapid growth of available sources and applications. Therefore, it is of interest to describe their characteristics, which could reveal what groups of people they engage, and what kind of information may be extracted from them. Similarly to the introduction of Mapillary as a novel data source in Chapter 2, other VGI sources that are loosely related to this dissertation work have also been described and analyzed. These include Pokémon Go, an augmented reality smartphone game that gained extreme popularity in 2016 (Juhász and Hochmair 2017b), and Snapchat, an instant messaging and social media application popular among young people (Juhász and Hochmair 2018a). Since the number of available VGI applications and data sources is on the rise, many more novel platforms could potentially be analyzed in future work.

Traditionally, VGI research focused on single platforms. This dissertation extends this concept by analyzing the interplay of two VGI platforms, Mapillary and OpenStreetMap, in Chapters 3 and 4. In fact, this contributor behavior may be widespread among other data sources as well. For example, it was already observed that Yelp utilized the locations of Pokémon Go points with a crowdsourced approach, and included this information in their search system. As a result, Yelp users were able to combine two distinct activities, eating out at a restaurant and playing Pokémon Go (Juhász and Hochmair 2017b).

A possible direction for future work could be analyzing the behavior of individual contributors to many VGI platforms, including mapping and (geo-)social media activities. Advancing our understanding of this phenomenon could bring potential benefits, including, for example, better human mobility models utilizing information that is currently unknown from a single platform, or better travel recommendation systems. As a first step in this direction, several methods and techniques were reviewed to quantify the spatial similarity of activities of individuals in multiple platforms (Juhász and Hochmair 2018b).

LIST OF REFERENCES

Antoniou, V., Morley, J., & Haklay, M. 2010. "Web 2.0 geotagged photos: Assessing the spatial dimension of the phenomenon." Geomatica 64 (1):99-110.

Backstrom, L., Sun, E., & Marlow, C. 2010. "Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity". 19th International Conference on World Wide Web, Raleigh, NC, USA, Apr 26-30. doi: 10.1145/1772690.1772698.

Barron, C., Neis, P., & Zipf, A. 2014. "A Comprehensive Framework for Intrinsic OpenStreetMap Quality Analysis." Transactions in GIS 18 (6):877-895. doi: 10.1111/tgis.12073.

Beaulieu, A., Bégin, D., & Genest, D. 2010. "Community mapping and government mapping: Potential collaboration?". Proceedings of the Symposium of ISPRS Commission I, Calgary, AB, Canada.

Bégin, D., Devillers, R., & Roche, S. 2013. "Assessing volunteered geographic information (VGI) quality based on contributors’ mapping behaviours". Paper presented at the Proceedings of the 8th international symposium on spatial data quality ISSDQ, Hong Kong. doi: 10.5194/isprsarchives-XL-2-W1-149-2013.

Bégin, D., Devillers, R., & Roche, S. 2017a. "Contributors’ enrollment in collaborative online communities: the case of OpenStreetMap." Geo-spatial Information Science 20 (3):282-295. doi: 10.1080/10095020.2017.1370177.

Bégin, D., Devillers, R., & Roche, S. 2017b. "Contributors’ Withdrawal from Online Collaborative Communities: The Case of OpenStreetMap." ISPRS International Journal of Geo-Information 6 (11):340. doi: 10.3390/ijgi6110340.

Brovelli, M., Minghini, M., Molinari, M., & Zamboni, G. 2016. "Positional Accuracy Assessment of the Openstreetmap Buildings Layer Through Automatic Homologous Pairs Detection: the Method and a Case Study." The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 41:615. doi: 10.5194/isprs-archives-XLI-B2-615-2016.

Bryant, S. L., Forte, A., & Bruckman, A. 2005. "Becoming Wikipedian: Transformation of Participation in a Collaborative Online Encyclopedia". Proceedings of GROUP: International Conference on Supporting Group Work, Sanibel Island, FL. doi: 10.1145/1099203.1099205.

Budhathoki, N. R., & Haythornthwaite, C. 2013. "Motivation for Open Collaboration: Crowd and Community Models and the Case of OpenStreetMap." American Behavioral Scientist 57 (5):548-575. doi: 10.1177/0002764212469364.

Burchfield, C. M., & Sappington, J. 2000. "Compliance with Required Reading Assignments." Teaching of Psychology 27 (1):59-60.

Cheng, Z., Caverlee, J., Lee, K., & Sui, D. Z. 2011. "Exploring Millions of Footprints in Location Sharing Services". Fifth International AAAI Conference on Weblogs and Social Media, Barcelona, Spain, Jul 17-21.

Choi, J., Choi, H., So, W., Lee, J., & You, J. 2014. "A Study about Designing Reward for Gamified Crowdsourcing System." In Design, User Experience, and Usability. User Experience Design for Diverse Interaction Platforms and Environments. DUXU 2014 (Lecture Notes in Computer Science, Vol. 8518), edited by A Marcus, 678-687. Cham: Springer. doi: 10.1007/978-3-319-07626-3_64.

Cipeluch, B., Jacob, R., Winstanley, A., & Mooney, P. 2010. "Comparison of the accuracy of OpenStreetMap for Ireland with Google Maps and Bing Maps". Ninth International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Leicester, UK.

Clarke, P., Ailshire, J., Melendez, R., Bader, M., & Morenoff, J. 2010. "Using Google Earth to conduct a neighborhood audit: Reliability of a virtual audit instrument." Health & Place 16 (6):1224-1229. doi: 10.1016/j.healthplace.2010.08.007.

Coleman, D. J., Georgiadou, Y., & Labonte, J. 2009a. "Volunteered Geographic Information: The Nature and Motivation of Produsers." International Journal of Spatial Data Infrastructures Research (IJSDIR) 4 (1):332-358. doi: 10.2902/1725-0463.2009.04.art16.

Coleman, D. J., Georgiadou, Y., & Labonte, J. 2009b. "Volunteered geographic information: The nature and motivation of produsers." International Journal of Spatial Data Infrastructures Research 4 (1):332-358.

Davidovic, N., Mooney, P., Stoimenov, L., & Minghini, M. 2016. "Tagging in Volunteered Geographic Information: An Analysis of Tagging Practices for Cities and Urban Regions in OpenStreetMap." ISPRS International Journal of Geo-Information 5 (12):232. doi: 10.3390/ijgi5120232.

Davis Jr, C. A., Pappa, G. L., de Oliveira, D. R. R., & de L Arcanjo, F. 2011. "Inferring the Location of Twitter Messages Based on User Relationships." Transactions in GIS 15 (6):735-751. doi: 10.1111/j.1467-9671.2011.01297.x.

Degrossi, L. C., Porto de Albuquerque, J., Santos Rocha, R. d., & Zipf, A. 2018. "A taxonomy of quality assessment methods for volunteered and crowdsourced geographic information." Transactions in GIS 22 (2):542-560. doi: 10.1111/tgis.12329.

Elicker, J. D., McConnell, N. L., & Hall, R. J. 2010. "Research Participation for Course Credit in Introduction to Psychology: Why Don't People Participate?" Teaching of Psychology 37 (3):183-185. doi: 10.1080/00986283.2010.488521.

Elwood, S., Goodchild, M. F., & Sui, D. Z. 2012. "Researching Volunteered Geographic Information: Spatial Data, Geographic Research, and New Social Practice." Annals of the association of American geographers 102 (3):571-590. doi: 10.1080/00045608.2011.595657.

Fan, H., Zipf, A., Fu, Q., & Neis, P. 2014. "Quality assessment for building footprints data on OpenStreetMap." International Journal of Geographical Information Science 28 (4):700-719.

Fernald, P. S. 2004. "The Monte Carlo quiz: Encouraging punctual completion and deep processing of assigned readings." College Teaching 52 (3):95-99.

Flanagin, A. J., & Metzger, M. J. 2008. "The credibility of volunteered geographic information." GeoJournal 72 (3-4):137-148. doi: 10.1007/s10708-008-9188-y.

Fram, C., Chistopoulou, K., & Ellul, C. 2015. "Assessing the quality of OpenStreetMap building data and searching for a proxy variable to estimate OSM building data completeness". Proceedings of the 23rd GIS Research UK (GISRUK) Conference.

Fritz, S., See, L., & Brovelli, M. 2017. "Motivating and Sustaining Participation in VGI." In Mapping and the Citizen Sensor, edited by Giles Foody, Linda See, Steffen Fritz, Peter Mooney, A-M Olteanu-Raimond, Cidália Costa Fonte and Vyron Antoniou, 93-117. London: Ubiquity Press. doi: 10.5334/bbf.e.

Girres, J. F., & Touya, G. 2010. "Quality assessment of the French OpenStreetMap dataset." Transactions in GIS 14 (4):435-459. doi: 10.1111/j.1467-9671.2010.01203.x.

Gonzalez, A., Bergasa, L. M., & Yebes, J. J. 2014. "Text detection and recognition on traffic panels from street-level imagery using visual appearance." IEEE Transactions on Intelligent Transportation Systems 15 (1):228-238. doi: 10.1109/TITS.2013.2277662.

González, M. C., Hidalgo, C. A., & Barabási, A.-L. 2008. "Understanding individual human mobility patterns." Nature 453 (7196):779-782. doi: 10.1038/nature06958.

Goodchild, M. F. 2007a. "Citizens as sensors: the world of volunteered geography." GeoJournal 69 (4):211-221. doi: 10.1007/s10708-007-9111-y.

Goodchild, M. F. 2007b. "Citizens as Voluntary Sensors: Spatial Data Infrastructure in the World of Web 2.0 (Editorial)." International Journal of Spatial Data Infrastructures Research (IJSDIR) 2:24-32.

Goodchild, M. F., & Li, L. 2012. "Assuring the quality of volunteered geographic information." Spatial Statistics 1:110-120. doi: 10.1016/j.spasta.2012.03.002.

Graham, M., & Zook, M. 2013. "Augmented Realities and Uneven Geographies: Exploring the Geolinguistic Contours of the Web." Environment and Planning A 45 (1):77-99. doi: 10.1068/a44674.

Griew, P., Hillsdon, M., Foster, C., Coombes, E., Jones, A., & Wilkinson, P. 2013. "Developing and testing a street audit tool using Google Street View to measure environmental supportiveness for physical activity." International Journal of Behavioral Nutrition and Physical Activity 10:103. doi: 10.1186/1479-5868-10-103.

Guy, R., & Truong, K. 2012. "CrossingGuard: exploring information content in navigation aids for visually impaired pedestrians". SIGCHI Conference on Human Factors in Computing Systems, Austin, TX, USA, May 5-10, 2012. doi: 10.1145/2207676.2207733.

Haklay, M. 2010. "How good is Volunteered Geographical Information? A comparative study of OpenStreetMap and Ordnance Survey datasets." Environment and Planning B: Planning and Design 37 (4):682-703.

Haklay, M. 2013. "Citizen Science and Volunteered Geographic Information: Overview and Typology of Participation." In Crowdsourcing geographic knowledge, edited by Daniel Sui, Sarah Elwood and Michael Goodchild, 105-122. Berlin: Springer. doi: 10.1007/978-94-007-4587-2_7.

Haklay, M., Basiouka, S., Antoniou, V., & Ather, A. 2010. "How many volunteers does it take to map an area well? The validity of Linus’ law to volunteered geographic information." The Cartographic Journal 47 (4):315-322. doi: 10.1179/000870410X12911304958827.

Haklay, M., & Weber, P. 2008. "OpenStreetMap: User-Generated Street Maps." IEEE Pervasive Computing 7 (4):12-18. doi: 10.1109/MPRV.2008.80.

Hara, K., Sun, J., Moore, R., Jacobs, D., & Froehlich, J. 2014. "Tohme: Detecting Curb Ramps in Google Street View Using Crowdsourcing, Computer Vision, and Machine Learning". The 27th annual ACM symposium on User interface software and technology, Honolulu, HI, USA, Oct 5-8, 2014. doi: 10.1145/2642918.2647403.

Hawelka, B., Sitko, I., Beinat, E., Sobolevsky, S., Kazakopoulos, P., & Ratti, C. 2014. "Geo-located Twitter as proxy for global mobility patterns." Cartography and Geographic Information Science 41 (3):260-271. doi: 10.1080/15230406.2014.890072.

Hecht, B. J., & Gergle, D. 2010. "On the "Localness" of User-Generated Content". Paper presented at the 2010 ACM Conference on Computer Supported Cooperative Work, Savannah, GA, USA, Feb 6-10. doi: 10.1145/1718918.1718962.

Hecht, B. J., & Stephens, M. 2014. "A Tale of Cities: Urban Biases in Volunteered Geographic Information". Paper presented at the Eighth International AAAI Conference on Weblogs and Social Media, Ann Arbor, Michigan, 1 - 4 June.

Hecht, R., Kunze, C., & Hahmann, S. 2013. "Measuring completeness of building footprints in OpenStreetMap over space and time." ISPRS International Journal of Geo-Information 2 (4):1066-1091. doi: 10.3390/ijgi2041066.

Heipke, C. 2010. "Crowdsourcing geospatial data." ISPRS Journal of Photogrammetry and Remote Sensing 65 (6):550-557. doi: 10.1016/j.isprsjprs.2010.06.005.

Hochmair, H. H., & Zielstra, D. 2015. "Analysing User Contribution Patterns of Drone Pictures to the dronestagram Photo Sharing Portal." Journal of Spatial Science 60 (1):79-98. doi: 10.1080/14498596.2015.969340.

Hochmair, H. H., Zielstra, D., & Neis, P. 2015. "Assessing the Completeness of Bicycle Trail and Lane Features in OpenStreetMap for the United States." Transactions in GIS 19 (1):63-81. doi: 10.1111/tgis.12081.

Hollenstein, L., & Purves, R. 2010. "Exploring place through user-generated content: Using Flickr tags to describe city cores." Journal of Spatial Information Science 1:21-48. doi: 10.5311/JOSIS.2010.1.3.

Hristova, D., Quattrone, G., Mashhadi, A. J., & Capra, L. 2013. "The Life of the Party: Impact of Social Mapping in OpenStreetMap". Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media, Cambridge, MA, USA.

Javanmardi, S., Ganjisaffar, Y., Lopes, C., & Baldi, P. 2009. "User Contribution and Trust in Wikipedia". 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing, Washington, DC, USA. doi: 10.4108/ICST.COLLABORATECOM2009.8376.

Juhász, L., & Hochmair, H. H. 2015. "Exploratory Completeness Analysis of Mapillary for Selected Cities in Germany and Austria." GI_Forum 2015:535-545. doi: 10.1553/giscience2015s535.

Juhász, L., & Hochmair, H. H. 2016a. "Cross-Linkage between Mapillary Street Level Photos and OSM Edits." In Geospatial Data in a Changing World: Selected papers of the 19th AGILE Conference on Geographic Information Science (Lecture Notes in Geoinformation and Cartography), edited by Tapani Sarjakoski, Maribel Yasmina Santos and L Tina Sarjakoski, 141-156. Berlin: Springer. doi: 10.1007/978-3-319-33783-8_9.

Juhász, L., & Hochmair, H. H. 2016b. "User Contribution Patterns and Completeness Evaluation of Mapillary, a Crowdsourced Street Level Photo Service." Transactions in GIS 20 (6):925-947. doi: 10.1111/tgis.12190.

Juhász, L., & Hochmair, H. H. 2017a. "How do volunteer mappers use crowdsourced Mapillary street level images to enrich OpenStreetMap?". The 20th AGILE conference on geo-information science, Wageningen, The Netherlands.

Juhász, L., & Hochmair, H. H. 2017b. "Where to catch ‘em all? – a geographic analysis of Pokémon Go locations." Geo-spatial Information Science 20 (3):241-251. doi: 10.1080/10095020.2017.1368200.

Juhász, L., & Hochmair, H. H. 2018a. "Analyzing the spatial and temporal dynamics of Snapchat". VGI-Alive pre-conference workshop at AGILE 2018, Lund, Sweden, 12 June.

Juhász, L., & Hochmair, H. H. 2018b. "Cross-checking user activities in multiple geo-social media networks". 21st AGILE Conference on Geo-information Science, Lund, Sweden, 12-15 June.

Juhász, L., & Hochmair, H. H. 2018c. "OSM Data Import as an Outreach Tool to Trigger Community Growth? A Case Study in Miami." ISPRS International Journal of Geo-Information 7 (3):113. doi: 10.3390/ijgi7030113.

Li, L., Goodchild, M. F., & Xu, B. 2013. "Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr." Cartography and Geographic Information Science 40 (2):61-77.

Liu, J., & Ram, S. 2011. "Who does what: Collaboration patterns in the wikipedia and their impact on article quality." ACM Transactions on Management Information Systems 2 (2):11. doi: 10.1145/1985347.1985352.

Mashhadi, A., Quattrone, G., & Capra, L. 2015. "The Impact of Society on Volunteered Geographic Information: The Case of OpenStreetMap." In OpenStreetMap in GIScience (Lecture Notes in Geoinformation and Cartography), edited by Jamal Jokar Arsanjani, A Zipf, P. Mooney and M. Helbich, 125-141. Berlin: Springer. doi: 10.1007/978-3-319-14280-7_7.

Masó, J., Pomakis, K., & Julià, N. 2010. "OpenGIS Web Map Tile Service Implementation Standard v1.0.0." Open Geospatial Consortium. 07-057r7.

Mooney, P., & Corcoran, P. 2012a. "The Annotation Process in OpenStreetMap." Transactions in GIS 16 (4):561-579. doi: 10.1111/j.1467-9671.2012.01306.x.

Mooney, P., & Corcoran, P. 2012b. "How social is OpenStreetMap". Paper presented at the Proceedings of the 15th association of geographic information laboratories for europe international conference on geographic information science, Avignon, France.

Mooney, P., & Corcoran, P. 2013. "Understanding the Roles of Communities in Volunteered Geographic Information Projects." In Progress in Location-Based Services (Lecture Notes in Geoinformation and Cartography), edited by J Krisp, 357-371. Berlin, Heidelberg: Springer. doi: 10.1007/978-3-642-34203-5_20.

Mooney, P., & Corcoran, P. 2014. "Analysis of Interaction and Co-editing Patterns amongst OpenStreetMap Contributors." Transactions in GIS 18 (5):633-659. doi: 10.1111/tgis.12051.

Mooney, P., & Minghini, M. 2017. "A Review of OpenStreetMap Data." In Mapping and the Citizen Sensor, edited by Giles Foody, Linda See, Steffen Fritz, Peter Mooney, A-M Olteanu-Raimond, Cidália Costa Fonte and Vyron Antoniou, 37-59. London: Ubiquity Press. doi: 10.5334/bbf.c.

Mooney, P., Minghini, M., & Stanley-Jones, F. 2015. "Observations on an OpenStreetMap mapping party organised as a social event during an open source GIS conference." International Journal of Spatial Data Infrastructures Research (IJSDIR) 10:138-150. doi: 10.2902/1725-0463.2015.10.art7.

Mooney, P., Rehrl, K., & Hochmair, H. H. 2013. "Action and interaction in volunteered geographic information: a workshop review." Journal of location based services 7 (4):291-311. doi: 10.1080/17489725.2013.859310.

Napolitano, M., & Mooney, P. 2012. "MVP OSM: A tool to identify areas of high quality contributor activity in OpenStreetMap." The Bulletin of the Society of Cartographers 45 (1):10-18.

Neis, P., & Zielstra, D. 2014. "Recent Developments and Future Trends in Volunteered Geographic Information Research: The Case of OpenStreetMap." Future Internet 6 (1):76-106. doi: 10.3390/fi6010076.

Neis, P., Zielstra, D., & Zipf, A. 2013. "Comparison of Volunteered Geographic Information Data Contributions and Community Development for Selected World Regions." Future Internet 5 (2):282-300. doi: 10.3390/fi5020282.

Neis, P., & Zipf, A. 2012. "Analyzing the Contributor Activity of a Volunteered Geographic Information Project — The Case of OpenStreetMap." ISPRS International Journal of Geo-Information 1 (2):146-165. doi: 10.3390/ijgi1020146.

Nielsen, J. 2006. "The 90-9-1 Rule for Participation Inequality in Social Media and Online Communities." http://www.nngroup.com/articles/participation-inequality/.

Nov, O. 2007. "What motivates Wikipedians?" Communications of the ACM 50 (11):60-64. doi: 10.1145/1297797.1297798.

"OSM Wiki: Import Catalog." 2017. http://wiki.openstreetmap.org/wiki/Import/Catalogue.

Padilla-Walker, L. M., Thompson, R. A., Zamboanga, B. L., & Schmersal, L. A. 2005. "Extra credit as incentive for voluntary research participation." Teaching of Psychology 32 (3):150-153.

Peters, I., & Stock, W. G. 2010. "“Power tags” in information retrieval." Library Hi Tech 28 (1):81-93. doi: 10.1108/07378831011026706.

Rehrl, K., Gröchenig, S., Hochmair, H. H., Leitinger, S., Steinmann, R., & Wagner, A. 2013. "A conceptual model for analyzing contribution patterns in the context of VGI." In Progress in Location-Based Services, edited by J Krisp, 373-388. Berlin, Heidelberg: Springer. doi: 10.1007/978-3-642-34203-5_21.

Restivo, M., & van de Rijt, A. 2014. "No praise without effort: experimental evidence on how rewards affect Wikipedia's contributor community." Information, Communication & Society 17 (4):451-462. doi: 10.1080/1369118X.2014.888459.

Rice, M. T., Paez, F. I., Mulhollen, A. P., Shore, B. M., & Caldwell, D. R. 2012. "Crowdsourced Geospatial Data: A report on the emerging phenomena of crowdsourced and user-generated geospatial data." Defense Technical Information Center. Report no.: ADA576607.

Rundle, A. G., Bader, M. D. M., Richards, C. A., Neckerman, K. M., & Teitler, J. O. 2011. "Using Google Street View to Audit Neighborhood Environments." American Journal of Preventive Medicine 40 (1):94-100. doi: 10.1016/j.amepre.2010.09.034.

Schmidt, M., & Klettner, S. 2013. "Gender and experience-related motivators for contributing to openstreetmap". Action and Interaction in Volunteered Geographic Information (ACTIVITY) Workshop at AGILE 2013, Leuven, Belgium.

See, L., Fritz, S., Perger, C., Schill, C., McCallum, I., Schepaschenko, D., Duerauer, M., Sturn, T., Karner, M., & Kraxner, F. 2015. "Harnessing the power of volunteers, the internet and Google Earth to collect and validate global spatial information using Geo-Wiki." Technological Forecasting and Social Change 98:324-335. doi: 10.1016/j.techfore.2015.03.002.

See, L., Mooney, P., Foody, G., Bastin, L., Comber, A., Estima, J., Fritz, S., Kerle, N., Jiang, B., & Laakso, M. 2016. "Crowdsourcing, citizen science or volunteered geographic information? The current state of crowdsourced geographic information." ISPRS International Journal of Geo-Information 5 (5):55. doi: 10.3390/ijgi5050055.

Soden, R., & Palen, L. 2014. "From crowdsourced mapping to community mapping: The post-earthquake work of OpenStreetMap Haiti". COOP 2014-Proceedings of the 11th International Conference on the Design of Cooperative Systems, 27-30 May 2014, Nice (France).

Spiro, I. 2012. "Motion chain: a webcam game for crowdsourcing gesture collection". CHI'12 Extended Abstracts on Human Factors in Computing Systems, Austin, TX, USA.

Stephens, M. 2013. "Gender and the GeoWeb: divisions in the production of user-generated cartographic information." GeoJournal 78 (6):981-996. doi: 10.1007/s10708-013-9492-z.

Stephens, M., & Poorthuis, A. 2015. "Follow thy neighbor: Connecting the social and the spatial networks on Twitter." Computers, Environment and Urban Systems 53 (September 2015):87-95. doi: 10.1016/j.compenvurbsys.2014.07.002.

Takhteyev, Y., Gruzd, A., & Wellman, B. 2012. "Geography of Twitter networks." Social networks 34 (1):73-81. doi: 10.1016/j.socnet.2011.05.006.

Törnros, T., Dorn, H., Hahmann, S., & Zipf, A. 2015. "Uncertainties of completeness measures in OpenStreetMap – A case study for buildings in a medium-sized German city". ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. II-3/W5, ISPRS Geospatial Week 2015, La Grande Motte, France. doi: 10.5194/isprsannals-II-3-W5-353-2015.

Touya, G., & Brando-Escobar, C. 2013. "Detecting level-of-detail inconsistencies in volunteered geographic information data sets." Cartographica: The International Journal for Geographic Information and Geovisualization 48 (2):134-143. doi: 10.3138/carto.48.2.1836.

Vandecasteele, A., & Devillers, R. 2015. "Improving Volunteered Geographic Information Quality Using a Tag Recommender System: The Case of OpenStreetMap." In OpenStreetMap in GIScience (Lecture Notes in Geoinformation and Cartography), edited by J. J. Arsanjani, A. Zipf, P. Mooney and M. Helbich, 59-80. Berlin: Springer. doi: 10.1007/978-3-319-14280-7_4.

Vanwolleghem, G., Dyck, D. V., Ducheyne, F., Bourdeaudhuij, I. D., & Cardon, G. 2014. "Assessing the environmental characteristics of cycling routes to school: a study on the reliability and validity of a Google Street View-based audit." International Journal of Health Geographics 13 (19). doi: 10.1186/1476-072X-13-19.

Yang, A., Fan, H., & Jing, N. 2016. "Amateur or professional: Assessing the expertise of major contributors in OpenStreetMap based on contributing behaviors." ISPRS International Journal of Geo-Information 5 (2):21. doi: 10.3390/ijgi5020021.

Yin, L., Cheng, Q., Wang, Z., & Shao, Z. 2015. "‘Big data’ for pedestrian volume: Exploring the use of Google Street View images for pedestrian counts." Applied Geography 63:337-345. doi: 10.1016/j.apgeog.2015.07.010.

Zamir, A. R., & Shah, M. 2010. "Accurate Image Localization Based on Google Maps Street View." In Computer Vision – ECCV 2010, edited by Kostas Daniilidis, Petros Maragos and Nikos Paragios, 255-268. Berlin, Heidelberg: Springer. doi: 10.1007/978-3-642-15561-1_19.

Zielstra, D., & Hochmair, H. H. 2011. "A Comparative Study of Pedestrian Accessibility to Transit Stations Using Free and Proprietary Network Data." Transportation Research Record: Journal of the Transportation Research Board 2217 (1):145-152. doi: 10.3141/2217-18.

Zielstra, D., Hochmair, H. H., & Neis, P. 2013. "Assessing the effect of data imports on the completeness of OpenStreetMap–a United States case study." Transactions in GIS 17 (3):315-334. doi: 10.1111/tgis.12037.

Zielstra, D., Hochmair, H. H., Neis, P., & Tonini, F. 2014. "Areal Delineation of Home Regions from Contribution and Editing Patterns in OpenStreetMap." ISPRS International Journal of Geo-Information 3 (4):1211-1233. doi: 10.3390/ijgi3041211.

Zielstra, D., & Zipf, A. 2010. "Quantitative Studies on the Data Quality of OpenStreetMap in Germany". Sixth International Conference on Geographic Information Science (GIScience), Zurich, Switzerland, Sept 14-17.

BIOGRAPHICAL SKETCH

Levente Juhász was born in Debrecen, Hungary. He received his Bachelor of Science and Master of Science degrees from the University of Szeged, Hungary in geography with a specialization in geoinformatics. He was appointed as a visiting scientist at the Joint Research Centre of the European Commission in Ispra, Italy, and at the Carinthia University of Applied Sciences in Villach, Austria. Between 2014 and 2018 he was a graduate research assistant at the University of Florida. Outside academia, he also served as a GIS developer and a data scientist. He obtained his PhD degree in Forest Resources and Conservation with a concentration in Geomatics from the University of Florida in 2018.
