Can OpenStreetMap be Trusted for Modeling Travel?
Eric Delmelle Derek Marsh Coline Dony
Department of Geography and Earth Sciences, University of North Carolina at Charlotte
SEDAAG, Pensacola, Florida, 2015
Google Maps: 575 miles, 8 h 21 min
Rand McNally: 575.1 miles, 9 h 3 min
Open MapQuest: 575.83 miles, 9 h 5 min
Yahoo Maps: 575.03 miles, 9 h 41 min
Online geographic data providers
Web services such as: Google Maps, Bing Maps, MapQuest
• Provide unprecedented access to spatial data and analytical tools
  • geocoding addresses
  • identifying points of interest
  • determining travel directions
• Simple network analysis without the need for a GIS network dataset
• No data preparation necessary
• Available to GIS and non-GIS users alike
Online geographic data providers
• For sizeable use, generally require a paid license
• Directions service requests are limited otherwise
  Google Maps: 2,500/day
  Bing Maps: 10,000/90 days
  MapQuest: 5,000/day
• An alternative is using openly sourced, public domain volunteered geographic information (VGI)
  MapQuest Open: Unlimited (15,000/month)
Volunteered Geographic Information
“the widespread engagement of large numbers of private citizens, often with little in the way of formal qualifications, in the creation of geographic information” (Goodchild 2007)
One of the most successful examples of VGI, OpenStreetMap (OSM), offers a free, editable map of the world with no restrictions governing use for spatial analysis
VGI data quality
Despite VGI’s potential, the question remains:
What is the quality of this data?
“Because participants potentially lack any formal training in geographic data collection, central coordination is weak to non-existent, and adherence to a particular data structure is not required, no assumptions can be made about the overall quality of uploaded data” (Goodchild & Li 2012)
Literature – VGI data quality - Comparative assessments
• Girres & Touya (2010) – In comparison to the French National Mapping Agency, point positional displacement was on average 6.65 meters
• Haklay (2010) – In comparison to the Ordnance Survey of Great Britain, greater than 81% overlap among major roads and an average of 6 meters point displacement of the OSM dataset within study sites across London
• Ciepłuch et al. (2010) – In comparison to Google Maps and Bing Maps, accuracy is inconsistent among all three providers
VGI data quality - Indicator assessments
“if one individual contributes an error, others can be expected to edit and correct the error, and the success of this mechanism rises in proportion to the number who look at the contribution” (Linus’s Law) (Goodchild & Li 2012)
• Haklay et al. (2010) – Positional accuracy improved with an increase in the number of contributors up to a threshold (n>13) at which improvement stabilized
• Keßler & Groot (2010) – Without a reference dataset, the volume of user contribution to an area or object in OSM is positively correlated to trustworthiness of the dataset
Research objectives and questions
I. Evaluating the Uncertainty of Travel Impedance Estimates
• What is the degree of uncertainty in travel impedance estimates among online road network data providers?
• Do routes calculated using VGI data present significantly different travel impedance estimates in comparison to commercial online spatial datasets?
II. VGI User Contribution – Applying Linus’s Law at the Network Object Level
• Correlation between number of contributors and level of agreement?
Methodology
[Workflow diagram: tertiary roads; network snapping; origins and destinations (lat/long points); O-D pairs; network provider API (Google Maps, ArcGIS Online, OpenStreetMap); JavaScript Object Notation (JSON); travel time and distance estimates; network metadata API (OpenStreetMap, MapQuest Open)]
Case study area
• North Carolina offers several clear urban locations, a diverse road network, and a range of topographical environments to assess road network uncertainty.
Methodology
• Remove limited access roads from network dataset. Origins and destinations selected from tertiary roads
• Modified dataset ‘segmented’ at nodes; begin nodes serve as candidate origin and destination points
• Select n*2 randomly distributed candidate points, used to form n origin-destination (OD) pairs (specific implementation is study area dependent; discussed further in results)
• Store OD pairs in text file as latitude, longitude and unique identifier
[Diagram: tertiary roads; network snapping; origins and destinations (lat/long points); O-D pairs]
Results – OD selection
Road network, State of North Carolina
• Exclude interstate highways
• Identify begin and end nodes of all resulting road segments
• Exclude begin nodes in the proximity of highways (incorrect snapping)
(*) 300 pairs of vertices were selected at random for each county (stratified random sampling of vertices)
Example of North Carolina (total = 100,000 OD pairs): ≈14,300 pairs are selected in each of seven distance intervals: 0-50 kilometers (km), 50-100 km, 100-150 km, 150-200 km, 200-250 km, 250-500 km and 500-1000 km. The category intervals had to be widened for the longer distances to obtain an equally stratified sample.
Results – OD selection
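The distance-stratified OD selection can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slides do not say how distance was measured when assigning pairs to intervals, so great-circle (haversine) distance is assumed here, and the candidate points, function names, and `max_tries` cap are hypothetical.

```python
import math
import random

# Distance intervals in km, as listed in the slides for the statewide sample.
INTERVALS_KM = [(0, 50), (50, 100), (100, 150), (150, 200),
                (200, 250), (250, 500), (500, 1000)]


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Assumption: the slides do not state which distance defines the intervals.
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def stratified_od_pairs(points, per_interval, rng, max_tries=100_000):
    """Draw random OD pairs until every interval holds `per_interval` pairs
    or `max_tries` draws have been made (hypothetical stopping rule)."""
    bins = {interval: [] for interval in INTERVALS_KM}
    for _ in range(max_tries):
        if all(len(pairs) >= per_interval for pairs in bins.values()):
            break
        origin, destination = rng.sample(points, 2)
        dist = haversine_km(origin, destination)
        for lo, hi in INTERVALS_KM:
            # Keep the pair only if its interval still has room.
            if lo <= dist < hi and len(bins[(lo, hi)]) < per_interval:
                bins[(lo, hi)].append((origin, destination, dist))
                break
    return bins


# Synthetic candidate points in a rough bounding box around North Carolina.
rng = random.Random(42)
candidates = [(rng.uniform(33.8, 36.6), rng.uniform(-84.3, -75.5))
              for _ in range(500)]
sample = stratified_od_pairs(candidates, per_interval=5, rng=rng)
for (lo, hi), pairs in sample.items():
    print(f"{lo}-{hi} km: {len(pairs)} pairs")
```

Rejection sampling of this kind fills the short intervals quickly and the long ones slowly, which is one plausible reason for widening the intervals at longer distances.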
Ex) North Carolina
All pairs of OD points
Spider map of OD pairs originating or terminating in Ashe County
Methodology
Online data providers (k):
Reference Datasets:
• Google Maps (Tele Atlas)
• ArcGIS Online (NAVTEQ)
VGI Dataset:
• OpenStreetMap
[Diagram: network provider API (Google Maps, ArcGIS Online, OpenStreetMap); JavaScript Object Notation (JSON); travel time and distance estimates]
Technical Issues:
• Google Directions API limited to 2,500 requests per day
• ArcGIS Online requires license
• OpenStreetMap directions algorithm provided by MapQuest Open
• Assuming no significant difference due to heuristic or routing algorithm
• Travel estimations do not account for traffic or other real-time data
• Precision limited to 1/10th mile
In Python:
For each OD pair, a URL string is formed that includes the network provider web address, OD coordinates, and routing specifications. A new URL is created for each provider, k. Results are returned in JavaScript Object Notation (JSON), an easily read data format that uses key-value pairs.
Methodology
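The request-building step (one URL per OD pair and provider, JSON back) might look roughly like the sketch below. The endpoints, query parameter names, and JSON keys are placeholders invented for illustration; they are not the real Google Maps, ArcGIS Online, or MapQuest Open interfaces, and the response is a mock string rather than a live request.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoints: NOT the real provider URLs (assumption for illustration).
PROVIDERS = {
    "provider_a": "https://example.com/a/directions",
    "provider_b": "https://example.com/b/route",
}


def build_url(base, origin, destination):
    """Form one request URL from a provider address and OD coordinates.

    The parameter names ("from", "to", "unit") are hypothetical.
    """
    params = {
        "from": f"{origin[0]},{origin[1]}",
        "to": f"{destination[0]},{destination[1]}",
        "unit": "mi",
    }
    return f"{base}?{urlencode(params)}"


def parse_estimate(body):
    """Extract (distance, time) from a JSON body with assumed key names."""
    data = json.loads(body)
    return data["route"]["distance"], data["route"]["time"]


# One OD pair stored as latitude, longitude and unique identifier.
od_pair = {"id": 1, "origin": (35.2, -80.8), "destination": (36.4, -81.5)}

# A new URL is created for each provider k.
urls = {k: build_url(base, od_pair["origin"], od_pair["destination"])
        for k, base in PROVIDERS.items()}

# Stand-in for a provider response; a real run would fetch each URL instead.
mock_body = '{"route": {"distance": 123.4, "time": 8100}}'
distance, seconds = parse_estimate(mock_body)
print(urls["provider_a"])
print(distance, seconds)
```

Keeping URL construction and JSON parsing in separate functions lets each provider supply its own parameter names and response layout without touching the loop over OD pairs.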
Travel Impedance Estimates
• d_ij: travel distance
• t_ij: travel time
[Diagram: network provider API (Google Maps, ArcGIS Online, OpenStreetMap); JavaScript Object Notation (JSON); travel time and distance estimates]
Results
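The two agreement measures used in these results, correlation coefficients and percent difference, can be sketched as follows. The slides do not give the formulas, so Pearson's r and a symmetric percent difference (absolute difference over the mean of the two estimates) are assumptions; the function names are hypothetical. The two distances are the Google Maps and Open MapQuest figures from the opening slide.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of estimates."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percent_difference(a, b):
    """Symmetric percent difference between two estimates.

    Assumption: the slides do not define the measure; this form divides the
    absolute difference by the mean of the two values.
    """
    mean = (a + b) / 2
    return 0.0 if mean == 0 else abs(a - b) / mean * 100


# Distances (miles) quoted on the opening slide for the same trip.
google_maps, open_mapquest = 575.0, 575.83
print(round(percent_difference(google_maps, open_mapquest), 3))

# Made-up distances for three OD pairs from two providers, for illustration only.
provider_1 = [12.1, 48.0, 230.5]
provider_2 = [12.4, 47.2, 231.0]
print(round(pearson_r(provider_1, provider_2), 4))
```

A correlation close to 1 only shows that the providers rank routes alike; the percent difference is what exposes systematic over- or underestimation on individual OD pairs.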
Low uncertainty in estimated travel distance
• ArcGIS Online overestimates
Results
Correlation Coefficients – NC
Outlier(s): Google Maps includes ferries in the routing calculation
Results
What about contributors?
Methodology
[Diagram: network metadata API (OpenStreetMap, MapQuest Open)]
• Fewer contributors are required to validate shorter road segments, but a higher proportion of contributors is needed to verify the accuracy of a longer route
• A sample of road segments is used from the total route; thus, the user average is proportional to the length of known road segments
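One way to read "the user average is proportional to the length of known road segments" is as a length-weighted mean of contributor counts over the sampled segments of a route. The sketch below assumes that interpretation; the function name and the `(length, contributors)` tuple layout are hypothetical, and the slides do not confirm the exact weighting.

```python
def weighted_contributor_average(segments):
    """Length-weighted mean number of contributors for one route.

    `segments` is a list of (length, n_contributors) tuples for the sampled
    road segments; longer segments count more toward the average.
    """
    total_length = sum(length for length, _ in segments)
    if total_length <= 0:
        raise ValueError("segments must have a positive total length")
    return sum(length * n for length, n in segments) / total_length


# Two sampled segments: 1 mile edited by 2 users, 3 miles edited by 6 users.
route_sample = [(1.0, 2), (3.0, 6)]
print(weighted_contributor_average(route_sample))  # 5.0
```

An unweighted mean of the same two segments would give 4.0, so the weighting matters most on routes that mix short local streets with long rural stretches.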
Results – Linus’s Law
North Carolina OD pairs
• Level of uncertainty decreases as number of contributors increases
• Initial increase in uncertainty corresponds to greatest sample of contributor averages (overall average = 3.27)
• Large number of outliers
Results at different distances
[Figure panels: 0-25 mi, 25-75 mi, 75-250 mi, >250 mi]
Discussion and conclusion
• Correlation coefficients and percent difference both resulted in relatively high agreement.
1. Uncertainty was extremely low at long travel distances
2. Shorter, county-wide distances showed greater uncertainty among all providers
3. The VGI dataset OSM was as reliable as the two commercial providers in estimating travel distance
4. OpenStreetMap may be a viable dataset for routing and navigation purposes within the selected study areas
Discussion and conclusion
• VGI User Contribution – Applying Linus’s Law at the Network Object Level
1. Disagreement decreases with increasing number of contributors
2. Relationship not uniform across different route lengths.
Future research opportunities…
• Approach could be expanded to new areas of the OSM dataset (e.g. other regions and countries)
  • Urban travel
  • Rural travel
• Analyze overlap among individual routes to explain where and why travel impedance uncertainty occurred
• Is the trend of Linus’s Law valid in other states and other countries?
Thank you
Results
Correlation Coefficients – Mecklenburg County
Greater uncertainty across all providers
• Correlation still high
• Same pattern of under/overestimation
• Greater uncertainty at 20-35 miles
Results
Percent Difference – Mecklenburg County
Trends in correlation plots are corroborated by percent difference
• ArcGIS Online produces greater uncertainty around 15 miles
• OSM has greater uncertainty at 30 miles