Can OpenStreetMap be Trusted for Modeling Travel?

Eric Delmelle Derek Marsh Coline Dony

Department of Geography and Earth Sciences
University of North Carolina at Charlotte
SEDAAG, Pensacola, Florida 2015

Google Maps: 575 miles, 8h21 minutes
Rand McNally: 575.1 miles, 9h3 minutes
MapQuest Open: 575.83 miles, 9h5 minutes
Yahoo Maps: 575.03 miles, 9h41 minutes
?

Online geographic data providers

Web services such as Google Maps, Bing Maps, and MapQuest

• Provide unprecedented access to spatial data and analytical tools
  • Geocoding addresses
  • Identifying points of interest
  • Determining travel directions

• Simple network analysis without the need for a GIS network dataset
  • No data preparation necessary
  • Available to GIS and non-GIS users alike

Online geographic data providers

• For sizeable use, generally require a paid license
• Directions service requests are limited otherwise
  Google Maps: 2,500/day
  Bing Maps: 10,000/90 days
  MapQuest: 5,000/day

• An alternative is using openly sourced, public domain volunteered geographic information (VGI)
  MapQuest Open: Unlimited (15,000/month)

Volunteered Geographic Information

“the widespread engagement of large numbers of private citizens, often with little in the way of formal qualifications, in the creation of geographic information” (Goodchild 2007)

One of the most successful examples of VGI, OpenStreetMap (OSM), offers a free, editable map of the world with no restrictions governing use for spatial analysis

VGI data quality

Despite VGI’s potential, the question remains:

What is the quality of this data?

“Because participants potentially lack any formal training in geographic data collection, central coordination is weak to non-existent, and adherence to a particular data structure is not required, no assumptions can be made about the overall quality of uploaded data” (Goodchild & Li 2012)

Literature – VGI data quality – Comparative assessments

• Girres & Touya (2010) – In comparison to the French National Mapping Agency, point positional displacement was on average 6.65 meters

• Haklay (2010) – In comparison to the Ordnance Survey of Great Britain, greater than 81% overlap among major roads and an average of 6 meters point displacement of the OSM dataset within study sites across London

• Ciepłuch et al. (2010) – In comparison to Google Maps and Bing Maps, accuracy is inconsistent among all three providers

VGI data quality – Indicator assessments

“if one individual contributes an error, others can be expected to edit and correct the error, and the success of this mechanism rises in proportion to the number who look at the contribution” (Linus’s Law) (Goodchild & Li 2012)

• Haklay et al. (2010) – Positional accuracy improved with an increase in the number of contributors up to a threshold (n>13) at which improvement stabilized
• Keßler & Groot (2010) – Without a reference dataset, the volume of user contribution to an area or object in OSM is positively correlated with trustworthiness of the dataset

Research objectives and questions

I. Evaluating the Uncertainty of Travel Impedance Estimates
• What is the degree of uncertainty in travel impedance estimates among online road network data providers?
• Do routes calculated using VGI data present significantly different travel impedance estimates in comparison to commercial online spatial datasets?

II. VGI User Contribution – Applying Linus’s Law at the Network Object Level
• Is the number of contributors correlated with the level of agreement?

Methodology

[Workflow diagram: tertiary roads → network snapping → origin and destination lat/long points → O-D pairs → network provider API (Google Maps, ArcGIS Online, OpenStreetMap) → JavaScript Object Notation (JSON) → travel time and distance estimates]

[Workflow diagram, continued: OpenStreetMap → network metadata API (MapQuest Open)]

Case study area

• North Carolina offers several clear urban locations, a diverse road network, and a range of topographical environments to assess road network uncertainty.

Methodology

• Remove limited-access roads from the network dataset; origins and destinations are selected from tertiary roads
• The modified dataset is ‘segmented’ at nodes; begin nodes serve as candidate origin and destination points
• Select n*2 randomly distributed candidate points, used to form n origin-destination (OD) pairs (specific implementation is study-area dependent; discussed further in results)
• Store OD pairs in a text file as latitude, longitude, and unique identifier

[Workflow diagram: tertiary roads → network snapping → origin and destination lat/long points → O-D pairs]

Results – OD selection

Road network, State of North Carolina

Exclude interstate highways

Identify begin and end nodes of all resulting road segments

Exclude begin nodes in the proximity of highways (incorrect snapping)

(*) 300 pairs of vertices were selected at random for each county (stratified random sampling of vertices)
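The sampling steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the toy coordinates, and the choice to pool the per-county candidates and pair them at random across the whole state are all hypothetical, since the slides do not say how endpoints were matched.

```python
import csv
import io
import random


def sample_candidates(nodes_by_county, per_county, rng):
    """Stratified sampling: draw the same number of begin nodes per county."""
    candidates = []
    for county in sorted(nodes_by_county):
        candidates.extend(rng.sample(nodes_by_county[county], per_county))
    return candidates


def form_od_pairs(candidates, rng):
    """Shuffle 2*n candidate points and pair them off into n OD pairs.

    Assumption: pairs may cross county lines; the slides do not specify this.
    """
    pool = list(candidates)
    rng.shuffle(pool)
    half = len(pool) // 2
    return [(f"OD{k:05d}", pool[k], pool[half + k]) for k in range(half)]


def write_od_pairs(pairs, handle):
    """Write one OD pair per row: identifier plus lat/long of both endpoints."""
    writer = csv.writer(handle)
    writer.writerow(["id", "o_lat", "o_lon", "d_lat", "d_lon"])
    for pair_id, (o_lat, o_lon), (d_lat, d_lon) in pairs:
        writer.writerow([pair_id, o_lat, o_lon, d_lat, d_lon])


# Toy input: made-up begin-node coordinates for two counties.
rng = random.Random(42)
toy_nodes = {
    "Ashe": [(round(36.40 + 0.01 * i, 2), -81.50) for i in range(10)],
    "Mecklenburg": [(round(35.20 + 0.01 * i, 2), -80.80) for i in range(10)],
}
candidates = sample_candidates(toy_nodes, per_county=6, rng=rng)
od_pairs = form_od_pairs(candidates, rng)
buffer = io.StringIO()  # stands in for the text file mentioned in the slides
write_od_pairs(od_pairs, buffer)
```

Fixing the random seed keeps the sample reproducible, which matters when the same OD pairs must be sent to several providers.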

Example of North Carolina (total = 100,000 OD pairs): ≈14,300 pairs are selected in each of seven distance intervals: 0-50 kilometers (km), 50-100 km, 100-150 km, 150-200 km, 200-250 km, 250-500 km and 500-1000 km. It was necessary to increase the range of the category intervals for the longer distances to accomplish an equally stratified sample.

Results – OD selection

Ex) North Carolina

All pairs of OD points

Spider map of OD pairs originating or terminating in Ashe County

Methodology

Online data providers (k):
Reference datasets:
• Google Maps (TeleAtlas)
• ArcGIS Online
VGI dataset:
• OpenStreetMap

In Python:
For each OD pair, a URL string is formed that includes the network provider web address, OD coordinates, and routing specifications. A new URL is created for each provider, k.
Results are returned in JavaScript Object Notation (JSON), an easily read data format that uses key-value pairs.

Technical issues:
• Google Directions API limited to 2,500 requests per day
• ArcGIS Online requires a license
• OpenStreetMap directions algorithm provided by MapQuest Open
• Assuming no significant difference due to heuristic or routing algorithm
• Travel estimates do not account for traffic or other real-time data
• Precision limited to 1/10th mile

[Workflow diagram: network provider API (Google Maps, ArcGIS Online, OpenStreetMap) → JavaScript Object Notation (JSON) → travel time and distance estimates]

Methodology
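The URL-forming and JSON-reading steps described for the Python workflow might look something like the sketch below. Everything provider-specific here is an assumption: the base addresses, the query parameter names (`from`, `to`, `routeType`), and the JSON layout are hypothetical placeholders, not the actual interface of Google Maps, ArcGIS Online, or MapQuest Open. No network request is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical base addresses; real provider endpoints and API keys differ.
PROVIDERS = {
    "provider_a": "https://provider-a.example.com/directions",
    "provider_b": "https://provider-b.example.com/directions",
}


def build_url(base, origin, destination, route_type="fastest"):
    """Form one request URL from the provider address, OD coordinates,
    and routing specifications (parameter names are illustrative only)."""
    params = {
        "from": f"{origin[0]},{origin[1]}",
        "to": f"{destination[0]},{destination[1]}",
        "routeType": route_type,
    }
    return f"{base}?{urlencode(params)}"


def parse_estimate(payload):
    """Read distance and time from a JSON reply.

    Assumes a made-up layout: {"route": {"distance": ..., "time": ...}}.
    """
    route = json.loads(payload)["route"]
    return route["distance"], route["time"]


origin = (35.2271, -80.8431)
destination = (36.4345, -81.4987)

# One URL per provider k for the same OD pair.
urls = {k: build_url(base, origin, destination) for k, base in PROVIDERS.items()}

# A canned reply stands in for the provider's response.
sample_reply = '{"route": {"distance": 575.8, "time": 32700}}'
distance, seconds = parse_estimate(sample_reply)
```

Keeping URL construction separate from parsing makes it easier to swap in each provider's real parameter names and response structure.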

Travel Impedance Estimates

• d_ij: travel distance
• t_ij: travel time

[Workflow diagram: network provider API (Google Maps, ArcGIS Online, OpenStreetMap) → JavaScript Object Notation (JSON) → travel time and distance estimates]

Results

Low uncertainty in estimated travel distance
• ArcGIS Online overestimates

Results

Correlation Coefficients – NC

Outlier(s)
Google Maps includes ferries in the routing calculation

Results

What about contributors?

Methodology

• Fewer contributors are required to validate shorter road segments, but a higher proportion of contributors is needed to verify the accuracy of a longer route
• A sample of road segments is used from the total route; thus, the user average is proportional to the length of known road segments

[Workflow diagram: OpenStreetMap → network metadata API (MapQuest Open)]
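One possible reading of the length-proportional user average is a length-weighted mean of contributor counts over the sampled segments of a route. The sketch below follows that interpretation, which is an assumption; the function name and the `(length, contributors)` input format are hypothetical and do not come from the slides.

```python
def weighted_contributor_average(segments):
    """Length-weighted mean number of contributors for one route.

    `segments` is a list of (length, contributor_count) tuples for the
    sampled road segments whose metadata is known.
    """
    total_length = sum(length for length, _ in segments)
    if total_length <= 0:
        raise ValueError("route needs at least one segment with positive length")
    weighted_sum = sum(length * count for length, count in segments)
    return weighted_sum / total_length


# Toy route: a 1-mile segment edited by 2 users and a 3-mile segment edited by 4.
route_average = weighted_contributor_average([(1.0, 2), (3.0, 4)])
```

Under this weighting, long segments dominate the route average, so a short, heavily edited segment does not mask a long stretch with few contributors.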

Results Linus’s Law

North Carolina OD pairs
• Level of uncertainty decreases as number of contributors increases
• Initial increase in uncertainty corresponds to greatest sample of contributor averages (overall average = 3.27)
• Large number of outliers

Results at different distances

[Figure panels: 0-25 mi, 25-75 mi, 75-250 mi, >250 mi]

Discussion and conclusion

• Correlation coefficients and percent difference both resulted in relatively high agreement.
1. Uncertainty was extremely low at long travel distances
2. Shorter, county-wide distances showed greater uncertainty among all providers
3. The VGI dataset OSM was as reliable as the two commercial providers in estimating travel distance
4. OpenStreetMap may be a viable dataset for routing and navigation purposes within the selected study areas

Discussion and conclusion

• VGI User Contribution – Applying Linus’s Law at the Network Object Level

1. Disagreement decreases with increasing number of contributors
2. Relationship not uniform across different route lengths.

Future research opportunities…

• Approach could be expanded to new areas of the OSM dataset (e.g. other regions and countries)
  • Urban travel
  • Rural travel

• Analyze overlap among individual routes to explain where and why travel impedance uncertainty occurred

• Is the trend of Linus’s Law valid in other states and other countries?

Thank you

Eric Delmelle Derek Marsh Coline Dony

Department of Geography and Earth Sciences
University of North Carolina at Charlotte

Results
Correlation Coefficients – Mecklenburg County

Greater uncertainty across all providers
• Correlation still high
• Same pattern of under/overestimation
• Greater uncertainty at 20-35 miles

Results
Percent Difference – Mecklenburg County

Trends in correlation plots are corroborated by percent difference
• ArcGIS Online produces greater uncertainty around 15 miles
• OSM has greater uncertainty at 30 miles
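For readers who want to reproduce the two agreement measures used in these results, a minimal sketch follows. It assumes a Pearson correlation coefficient and a signed percent difference relative to a chosen reference provider; the slides do not state which formulas were used, and the toy distances are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of estimates."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def percent_difference(estimate, reference):
    """Signed difference of an estimate from a reference value, in percent."""
    return 100.0 * (estimate - reference) / reference


# Invented travel distances (miles) for the same OD pairs from two providers.
reference_miles = [12.0, 30.5, 75.2, 140.0, 310.4]
vgi_miles = [12.3, 30.1, 76.0, 139.2, 311.0]

r = pearson_r(reference_miles, vgi_miles)
differences = [percent_difference(v, ref) for v, ref in zip(vgi_miles, reference_miles)]
```

Correlation alone can hide a consistent bias, which is why a per-pair percent difference is a useful complement: it shows whether a provider systematically over- or underestimates.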