Methods of Geoinformation Science, Institute of Geodesy and Geoinformation Science, Faculty VI Planning Building Environment

MASTER’S THESIS

Deriving incline for street networks from voluntarily collected GPS traces

Submitted by: Steffen John

Matriculation number: 343372

Email: [email protected]

Supervisors: Prof. Dr.-Ing. Marc-O. Löwner (TU Berlin)

Dr.-Ing. Stefan Hahmann (Universität Heidelberg)

Submission date: 24.07.2015

in cooperation with:

GIScience Group, Institute of Geography, Faculty of Chemistry and Earth Sciences

Declaration of Authorship

I, Steffen John, declare that this thesis titled ‘Deriving incline for street networks from voluntarily collected GPS traces’ and the work presented in it are my own. I confirm that:

- This work was done wholly or mainly while in candidature for a research degree at this University.
- Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
- Where I have consulted the published work of others, this is always clearly attributed.
- Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
- I have acknowledged all main sources of help.
- Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:

Date:


Abstract

Knowledge of incline is useful for many navigation use cases, for example for electrically powered vehicles, cyclists or mobility-restricted people (e.g. wheelchair users). Digital elevation models (DEMs), such as DEMs obtained from laser scanning or the SRTM DEM, are either too expensive, not globally available or not accurate enough. Therefore, voluntarily collected GPS traces contributed by users of the OpenStreetMap project have been used to derive the incline of a street network. Owing to the high relative accuracy of the GPS traces, the incline could be computed with an accuracy that is reasonable for many use cases. The comparison with the SRTM DEM has shown that the inclines calculated from GPS perform slightly better, with a standard deviation of σGPS = 1.6 % (σSRTM = 3.1 %), considering streets with at least 5 GPS traces. In contrast to SRTM, which has full coverage, the incline could only be derived for 18 % of the street network (> 5 traces).

Kurzfassung (German abstract, translated into English)

Incline information adds value to many routing applications, for example routing for electrically powered vehicles, cyclists or people with mobility restrictions (e.g. wheelchair users). Digital terrain models (DTMs), such as DTMs created by laser scanning or the SRTM-1 DTM, are either too expensive, not available area-wide or insufficient in resolution and accuracy. Therefore, user-generated GPS trajectories are to be used to calculate the incline of streets. Owing to the high relative accuracy found for the trajectories, it was possible to calculate the incline with an accuracy that is sufficient for many applications. The comparison with the SRTM DTM showed that the inclines derived from GPS data are better, with a standard deviation of σGPS = 1.6 % (σSRTM = 3.1 %). Only streets with at least 5 GPS trajectories were used to determine the standard deviation. In contrast to SRTM, the inclines could not be determined for all streets but only for 18 % of them (with more than 5 trajectories).


Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Outline

2 Background
  2.1 Global Navigation Satellite Systems
    2.1.1 GPS Setup and Determination of Location
    2.1.2 Error Sources
    2.1.3 GLONASS, Galileo and Beidou
  2.2 Volunteered Geographic Information
    2.2.1 Terminology and Nature of VGI
    2.2.2 Classification and Examples
  2.3 OpenStreetMap
    2.3.1 Introduction to Project
    2.3.2 Data Model
    2.3.3 Incline Information in OpenStreetMap
  2.4 Data Mining

3 Related Work
  3.1 3D Routing
    3.1.1 Wheelchair routing
    3.1.2 Energy-efficient routing
  3.2 Extraction of Street Attributes from user-generated Movement Trajectories
  3.3 Derivation of 3D information, using high-accurate GPS measurements
  3.4 Matching
    3.4.1 Categorization of Map Matching Algorithms
    3.4.2 Functionality of Selected Algorithms
  3.5 Smoothing of Time Series Measurements

4 Methodology
  4.1 Definition of Pilot Region
  4.2 Tools
  4.3 Data
    4.3.1 Crowdsourced GPS traces
      4.3.1.1 Platforms and Devices
      4.3.1.2 The GPX Format
      4.3.1.3 OpenStreetMap GPS traces
      4.3.1.4 Typical Errors
    4.3.2 Street Network
    4.3.3 Land Use Information
    4.3.4 Digital Elevation Models
  4.4 Workflow and Implementation
    4.4.1 Data Import
      4.4.1.1 GPS traces
      4.4.1.2 OSM Street Network and Land Use Information
    4.4.2 Preprocessing
      4.4.2.1 GPS data
      4.4.2.2 Street Network
    4.4.3 Map Matching
    4.4.4 Calculation of Incline
  4.5 Validation

5 Discussion of Results
  5.1 Analysis of Crowdsourced GPS traces
    5.1.1 Vertical Absolute and Relative Accuracy
      5.1.1.1 Absolute Accuracy
      5.1.1.2 Relative Accuracy
    5.1.2 Coverage and density
  5.2 Analysis of Calculated Incline
    5.2.1 Exclusion of data from the evaluation
    5.2.2 Accuracy of GPS incline
      5.2.2.1 Overall error
      5.2.2.2 By Land Use Classes
      5.2.2.3 By Terrain Classes (mountainous / flat)
      5.2.2.4 Effect of Number of GPS Traces on Overall Accuracy
    5.2.3 Comparison GPS incline and SRTM incline
      5.2.3.1 By Land Use Classes
      5.2.3.2 By Terrain Classes
  5.3 Limitations of Approach

6 Conclusion and Outlook
  6.1 Conclusion
  6.2 Outlook

7 Bibliography

List of Figures

Figure 1: A steep slope of a street or path may be inaccessible for wheelchair users. (© - user: ‘Transguyjay’)
Figure 2: Depending on the street, 0 to many GPS track points fall into one square of 1’’ × 1’’ equivalent to horizontal resolution of SRTM-1. Due to the projection, the grid is not squared. (Map: OSM)
Figure 3: Determination of a 2D position with three satellites (©Anja Köhn, Michael Wößner)
Figure 4: How the satellite constellation influences precision. In (a) the transmitters are orthogonal, which keeps the error region small. If the transmitters are closer together, the error region gets larger (b). (Langley 1999)
Figure 5: The Multipath and Shadowing effect (Conley et al. 2006, p. 280)
Figure 6: Density map of OpenStreetMap nodes (© OpenStreetMap user ‘Tyr’)
Figure 7: OSM data model for map feature (left) and file system for GPX files (right) (adapted from Ramm & Topf 2010, p. 56)
Figure 8: Map Matching. The GPS trace (blue) is snapped to the street network (red) (Map: OSM)
Figure 9: Example of 'Median of 3' smoothing with the raw data (row 1) and the results using the single median smoothing and the repeated median smoothing. (Tukey 1977, p. 212)
Figure 10: Pilot region Heidelberg / . (Map: OSM)
Figure 11: Example GPX file.
Figure 12: Screenshot of grid map, showing the number of GPS points per grid cell.
Figure 13: Elevation profile of a GPS trace, recorded on a flat street.
Figure 14: GPS traces with lost GPS signals in tunnels. (Map: OSM)
Figure 15: Difference of DSM and DTM.
Figure 16: Process of deriving incline information out of user-contributed GPS traces.
Figure 17: Filtering and import of GPS traces.
Figure 18: The schema of the relation 'gpx_data_line' for storing the GPS traces.
Figure 19: Flowchart of preprocessing the GPS traces
Figure 20: Columns of the relation, which stores the preprocessed GPS traces.
Figure 21: Schema of relation 'streets'.
Figure 22: Enhancement of street network with land use information in cases where the land use polygon does not cover the street segment.
Figure 23: Flowchart of map matching process.
Figure 24: The map matching process: Select candidate traces with buffer (light green) of street (dark green) (a), create profile lines (blue) (b), select traces (red) which intersect at least 70 % of the profile lines.
Figure 25: Example of two parallel streets which do not have the same incline.
Figure 26: The tables 'gpx_data_line', 'streets_gpx' and 'streets' and their relation to each other.
Figure 27: Properties file of map matching tool
Figure 28: Workflow for calculating the incline of street segments.
Figure 29: Clipping of assigned GPS traces.
Figure 30: Screenshot of visualized GPS track points, colorized according to their elevation. (green=low, red=high)
Figure 31: Vertical accuracy of crowdsourced GPS traces, distinguished by land use class.
Figure 32: Histogram with the differences of GPS and DTM elevation
Figure 33: Relative accuracy of crowdsourced GPS track points, overall and distinguished by land uses.
Figure 34: Map showing the coverage of the streets with GPS traces. (Map: OSM)
Figure 35: The coverage with GPS traces for different street types.
Figure 36: Average distance of two adjacent GPS track points differentiated by street type.
Figure 37: Visualization of the GPS incline. Streets with no coverage are not displayed. (Map: OSM)
Figure 38: Erroneously calculated DTM incline, due to irregularities of the LiDAR DTM.
Figure 39: Visualization of the error of GPS incline in the pilot region. (Map: OSM)
Figure 40: Histogram of the overall incline error in percent and the bell-curve (red).
Figure 41: The percentage of streets with an incline error smaller than 2 % and their share with respect to the entire street network.
Figure 42: Situations where the calculated incline differs from the steepest incline.

List of Tables

Table 1: Categories of VGI project according to Jokar Arsanjani (2014)
Table 2: Usage of the key 'incline' and its values
Table 3: Overview of visibility options for the upload of GPS traces.
Table 4: Values of highway and their share of length in percent.
Table 5: OSM landuse-tags and their characteristics.
Table 6: The effect of the relative accuracy on the calculated incline.
Table 7: The length of street segments for different incline error classes.
Table 8: The achieved accuracy of GPS incline differentiated by land use classes.
Table 9: The achieved accuracy of GPS incline differentiated by terrain classes.
Table 10: Comparison of SRTM and GPS incline in terms of amount of street network with an incline error smaller than 2 %.
Table 11: Comparison of the standard deviations of the incline error, overall and differentiated by land use classes.
Table 12: Comparison of the standard deviations of the incline, overall and differentiated by terrain classes.

1 Introduction


1.1 Motivation

Common routing and navigation systems such as Google Maps¹ or Here² do not consider elevation or incline information when calculating directions. This is because they were initially designed to calculate directions for cars or other fuel-powered vehicles, which generally do not benefit from incline information. However, there are many other use cases in which knowledge of the incline of streets is valuable. For cyclists, pedestrians and especially mobility-restricted people, the incline of a planned route is highly relevant (cf. Figure 1). Some cyclists might prefer a slightly longer but less steep route, while others may prefer inclined streets for training purposes. Incline information is even more relevant for mobility-restricted people such as wheelchair users, people with walking aids or parents with pushchairs. For this group, steep streets or paths may be inaccessible, since they can only negotiate inclines up to a certain percentage uphill or downhill. The magnitude of incline that can be negotiated depends strongly on the disability and, for wheelchair users, on the type of wheelchair (manual / electric). Moreover, electric wheelchairs, and electrically powered vehicles in general, have a higher energy demand when going uphill and a limited battery capacity. In addition, charging stations are still rare. Therefore, routing services can use incline information to compute the most efficient route in terms of power consumption (cf. Franke et al. 2012).

Figure 1: A steep slope of a street or path may be inaccessible for wheelchair users. (© Flickr-user: ‘Transguyjay’³)

The GIScience Research Group of the University of Heidelberg is, together with other partners, currently working on a project to extend and improve the OSM routing service OpenRouteService.org to include accessibility-related data; the motivation for this thesis arose from this project. The EU project is called CAP4Access, an acronym for ‘Collective Awareness Platforms for Improving Accessibility in European Cities and Regions’. The cooperation between the Institute of Technology Berlin (Technische Universität Berlin) and the University of Heidelberg for this thesis was established through this project. The aim of the project is to develop methods and tools for collectively gathering and sharing information about the accessibility of public spaces. The project focuses on different topics, including for example “Collective tagging”, “Participatory sensing” and “Routing and navigation”. There are four pilot regions for this project: Vienna (Austria), London (UK), Elche (Spain) and Heidelberg (Germany) (cf. empirica Gesellschaft für Kommunikations- und Technologieforschung mbH 2015).

¹ http://google.de/maps, checked on 15/07/2015
² http://here.com, checked on 15/07/2015
³ Source of image: https://www.flickr.com/photos/jayw/2604877785, checked on 15/07/2015

For the calculation of the incline of a street, different types of digital elevation models (DEMs) may be used. The most accurate option is a high-resolution DEM acquired by airborne light detection and ranging (LiDAR). Because this method is very expensive, open-licensed DEMs may be an alternative. Depending on the open data strategy of the authorities, high-resolution DEMs are available for some regions (cf. OpenStreetMap Wiki 2015d). There are also two open-licensed DEMs that are almost globally available: SRTM and ASTER GDEM. The SRTM DEM was acquired during the Shuttle Radar Topography Mission (SRTM) and is available online with a horizontal resolution of 1 arc second (30 m) and an absolute elevation error of 6.2 m (cf. Farr et al. 2007). The ASTER Global DEM was compiled from data collected by the ‘Advanced Spaceborne Thermal Emission and Reflection Radiometer’ (ASTER), mounted on the Terra spacecraft. It has a horizontal resolution of 1 arc second (30 m) and a vertical accuracy of approximately 9 m (cf. Meyer 2011).

Open-licensed DEMs with high accuracy are not globally available, whereas those DEMs which are nearly globally available suffer from a poor horizontal resolution and vertical accuracy. Especially for hilly or mountainous regions and high-resolution scenarios this data might not be sufficient to derive the incline of streets with an acceptable accuracy. Therefore, I propose a method to derive incline information from GPS traces, contributed by users of the OpenStreetMap project.

Due to the rapid development of mobile phones with integrated GPS receivers, GPS traces can easily be recorded by anybody. According to Liu et al. (2014), a positional accuracy of 5 to 10 meters and a vertical accuracy of up to 25 meters can be expected from GPS traces collected with handheld GPS devices or smartphones. Admittedly, the vertical accuracy is very poor; for calculating the incline, however, only the elevation differences between adjacent points are relevant. If a GPS trace was recorded within a short time span in an open area, it may be assumed that all points of the trace were recorded under similar atmospheric influences and with a similar satellite constellation. Therefore, it may be expected that the track points of one GPS trace have a similar absolute error and consequently a fairly good relative accuracy. In addition, the coverage of a street by multiple GPS traces, as well as a relatively high density of GPS track points, may compensate for poor accuracy. Figure 2 shows a grid of 1 by 1 arc second, which is equivalent to the horizontal resolution of the SRTM DEM, together with the GPS track points extracted from the OpenStreetMap GPS data. It can be seen that for most streets many GPS track points fall into one square, but there are also some streets with no or only a few GPS track points.
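The argument that a shared absolute error does not harm the incline can be illustrated with a small numerical sketch. The elevations, the point spacing and the bias below are hypothetical values chosen for illustration, and `incline_percent` is an illustrative helper, not part of the tools developed in this thesis. The sketch assumes an idealized trace in which every track point carries exactly the same elevation error.

```python
def incline_percent(elev_start_m, elev_end_m, horizontal_dist_m):
    """Incline in percent: elevation difference over horizontal distance, times 100."""
    return (elev_end_m - elev_start_m) / horizontal_dist_m * 100.0


# Hypothetical true elevations (m) of consecutive track points, 50 m apart.
true_elev = [110.0, 111.5, 113.0, 113.5]
spacing_m = 50.0

# Assumed constant absolute elevation error shared by all points of the trace.
bias_m = 18.0
measured = [e + bias_m for e in true_elev]

true_inclines = [incline_percent(a, b, spacing_m)
                 for a, b in zip(true_elev, true_elev[1:])]
gps_inclines = [incline_percent(a, b, spacing_m)
                for a, b in zip(measured, measured[1:])]

# The constant bias cancels in every elevation difference,
# so both lists contain the same inclines.
print(true_inclines)
print(gps_inclines)
```

In real traces the error is only approximately constant, so the inclines would agree only approximately; the sketch shows the limiting case.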

Figure 2: Depending on the street, 0 to many GPS track points fall into one square of 1’’ × 1’’ equivalent to horizontal resolution of SRTM-1. Due to the projection, the grid is not squared. (Map: OSM)
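Why a grid of 1 by 1 arc second does not form squares on the ground can be illustrated with a short calculation. The following is a minimal sketch assuming a spherical earth with a mean radius of 6,371 km and a latitude of 49.4° N for the Heidelberg area; both values and the function name `arcsecond_cell_size_m` are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean earth radius, spherical approximation (assumption)


def arcsecond_cell_size_m(latitude_deg):
    """Return the (east-west, north-south) extent in metres of a 1'' x 1'' cell."""
    one_arcsec_rad = math.radians(1.0 / 3600.0)
    north_south = EARTH_RADIUS_M * one_arcsec_rad
    # Meridians converge towards the poles, so the east-west extent
    # shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


# 49.4 deg N is an assumed latitude for the Heidelberg area.
ew, ns = arcsecond_cell_size_m(49.4)
print(f"east-west: {ew:.1f} m, north-south: {ns:.1f} m")
```

Under these assumptions a cell is roughly 31 m from north to south but only about 20 m from east to west, so the commonly quoted 30 m resolution holds in both directions only near the equator.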

1.2 Objectives

The objective of this thesis is to develop methods and tools to calculate the incline of a street network, including paths for pedestrians and cyclists. The incline shall be calculated from GPS traces collected by contributors to the OpenStreetMap project, since this may represent a low-cost alternative to expensive high-accuracy DEMs. The GPS data is a collection of traces recorded by thousands of users with different devices and modes of transportation. Neither the device nor the mode of transportation is given in the data, which makes it difficult to judge the accuracy and density of the GPS track points. Therefore, the raw GPS data shall be assessed with regard to accuracy, coverage and density. Furthermore, the incline calculated from GPS traces shall be validated against a high-accuracy DEM obtained from LiDAR measurements to determine how accurately the incline was calculated. As described in the motivation, globally available DEMs also represent an alternative to high-accuracy DEMs for deriving incline. Thus, the incline calculated from GPS traces shall also be compared to the incline derived from the SRTM-1 DEM. It is expected that, owing to a higher density of elevation information, a higher accuracy will be achieved with user-generated GPS traces. The tools developed for this thesis shall be published and provided to the OpenStreetMap community, since tools for processing GPS data are still rare.


To summarize, the aims of this thesis can be stated briefly as follows:

- Creation and implementation of a workflow to calculate the incline of streets, using user-contributed GPS traces.
- Assessment of the quality of voluntarily collected GPS traces in terms of
  o vertical accuracy (absolute and relative)
  o coverage of GPS traces
- Assessment of the achieved quality of the incline information, compared to LiDAR and SRTM-1 DEM.
- Publication of the developed software and provision to the OpenStreetMap community.

1.3 Outline

The thesis is structured as follows. Chapter 2 discusses background information that is important for this topic: Global Navigation Satellite Systems, Volunteered Geographic Information, the OpenStreetMap project and data mining. Chapter 3 presents research on related topics. The methodology of this research is described in chapter 4, which covers the data and tools used as well as all steps of deriving incline from user-generated GPS traces. In chapter 5, the outcome of the methods described in chapter 4 is evaluated and discussed using statistical methods, with a high-accuracy DTM on the one hand and the low-cost alternative SRTM-1 DEM on the other hand serving as references. Chapter 5 also includes the quality assessment of the GPS data. Chapter 6 summarizes and concludes this thesis and gives ideas on how to proceed with this topic in the future.


2 Background

In this chapter, background information related to this research is given. First, the Global Positioning System and other Global Navigation Satellite Systems are introduced and their functionality is explained, since GPS traces are one of the major data sources of this research. The traces are collected voluntarily; therefore, an overview of volunteered geographic information (VGI) is given. After that, one of the most popular VGI projects, OpenStreetMap, is introduced. At the end of this chapter, the terms data mining and spatial data mining are discussed.

2.1 Global Navigation Satellite Systems

Nowadays, Global Navigation Satellite Systems (GNSS) are an essential part of the field of navigation and positioning. With GNSS it is possible to determine any location on the earth’s surface with fairly good accuracy. Since this research is about mining information from movement trajectories recorded with the help of such systems, an overview of the existing systems and how they work is given. Several countries either operate a GNSS or are currently building one; however, this section covers the setup and functionality of the Global Positioning System (GPS) only, since it is the first and most stable GNSS. GPS is the GNSS of the United States of America. Furthermore, the method for determining a location is explained, followed by an overview of error sources and their impact on the accuracy. An overview of other GNSS is given in section 2.1.3.

2.1.1 GPS Setup and Determination of Location

The Global Positioning System (GPS) was developed and is operated by the U.S. Department of Defense. It was initially developed for military purposes, and in the beginning the accuracy was degraded for civilian use. This is known as Selective Availability (SA). In 2000, the degradation of accuracy was switched off, so that civilian users now obtain a higher accuracy. This enabled the realization of many standard applications, such as the private use of so-called Location-Based Services (cf. Hofmann-Wellenhof et al. 2008, pp. 309–311).

For the GPS setup, three segments play an important role: the space segment, the control segment and the user segment. The space segment consists of 24 active and several spare satellites. The active satellites are distributed over six orbits at an altitude of 20,200 km. The control segment consists of several control stations distributed around the earth. Its tasks include tracking the satellites to determine their orbits and synchronizing the atomic clocks mounted on the satellites. The user segment refers to the receiver, which receives the signals emitted by the satellites and calculates the current location. Further information on the GPS segments can be found in Hofmann-Wellenhof et al. (2008, pp. 322–327).

As already mentioned, the position of the satellites at a certain time is known from the orbital parameters, which are observed by the control segment. The satellites constantly emit signals, which can be received by the receiver. The signal consists of two carrier waves, L1 with a frequency of 1575.42 MHz and L2 with 1227.60 MHz. Codes are modulated onto both waves; they represent a message containing information about the satellite, such as orbit parameters and the time of signal emission. While both the C/A-code (coarse/acquisition) and the P-code (precision) are modulated onto L1, L2 carries only the P-code. Combining the P-code from the two carrier waves allows a higher positional accuracy through the elimination of ionospheric influences. In addition, the P-code is encrypted to ensure that it is only available to authorized users, such as the military (cf. Hofmann-Wellenhof et al. 2008, pp. 315–322).

Hofmann-Wellenhof et al. (2008, pp. 161–191) mention three mathematical models for positioning: single point positioning, differential positioning and relative positioning. Single point positioning and differential positioning are described below. Relative positioning is not relevant for this research; for further information, see Hofmann-Wellenhof et al. (2008, pp. 173–191).

Single point positioning is applied when determining the position with smartphones or other handheld devices. With this method, the pseudoranges between the satellites and the receiver are determined. This can be done using the code modulated on the carrier waves, using the phase of the carrier wave, or based on Doppler data. Here, only the first approach is explained, since common smartphones and other handheld devices make use of the code. To determine the 2D position (X,Y) of a location, the pseudoranges to at least three satellites are necessary. Two of them are used to calculate the distance between satellites and receiver by multiplying the signal’s time of travel by the speed of light. To determine the time of travel, the receiver’s clock must be synchronized with the satellite clocks. Since this is not the case prior to measuring, the pseudorange to the third satellite must be known in order to correct the clock bias. This is depicted in Figure 3. The solid lines show the pseudoranges to the satellites before the clock correction. They yield three intersection points, labelled ‘B’, which are possible locations of the receiver. After the clock has been corrected and the pseudoranges have been corrected accordingly (dashed lines), only one intersection point remains (A).


Figure 3: Determination of a 2D position with three satellites (©Anja Köhn, Michael Wößner⁴)

When determining a 3D position (X,Y,Z), four instead of three satellites are necessary, since there is one more unknown. Considering all measurements, a non-linear equation system as shown in Hofmann-Wellenhof et al. (2008, p. 162) can be solved to determine the unknown coordinates of the receiver’s location. According to Cosentino et al. (2006, p. 379), a horizontal accuracy of around 10 m can be achieved in 95 % of cases when applying single point positioning with one frequency. The accuracy is limited because the measurements are influenced by several factors, which are discussed in section 2.1.2. To eliminate some of these influences and thereby improve the accuracy of the measurement, differential positioning may be performed. This requires two receivers, a reference receiver and a remote receiver. The coordinates of the reference station are known and considered to be the true values. Consequently, the error of the observed pseudoranges can be determined at the reference station. The observed error can then be transmitted to the remote receiver, where it is used to correct the pseudoranges (cf. Hofmann-Wellenhof et al. 2008, p. 169).
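The principle of single point positioning described above, in which each pseudorange equals the geometric distance plus an unknown receiver clock bias, can be sketched for the 2D case with three satellites. The sketch solves the non-linear equations with a Gauss-Newton iteration, which is one common approach and not necessarily the formulation given by Hofmann-Wellenhof et al. (2008). All coordinates, the clock bias and the function names are hypothetical, and measurement errors are ignored.

```python
import math


def solve_linear(a, b):
    """Solve a small square system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def locate_2d(satellites, pseudoranges, guess=(0.0, 0.0, 0.0), iterations=10):
    """Estimate receiver position (x, y) and clock bias b (in metres) from three pseudoranges."""
    x, y, b = guess
    for _ in range(iterations):
        jacobian, residuals = [], []
        for (sx, sy), rho in zip(satellites, pseudoranges):
            dist = math.hypot(sx - x, sy - y)
            # Model: pseudorange = geometric distance + clock bias.
            residuals.append(rho - (dist + b))
            jacobian.append([-(sx - x) / dist, -(sy - y) / dist, 1.0])
        dx, dy, db = solve_linear(jacobian, residuals)
        x, y, b = x + dx, y + dy, b + db
    return x, y, b


# Hypothetical 2D setup, all coordinates in metres.
satellites = [(-15_000_000.0, 20_000_000.0),
              (1_000_000.0, 25_000_000.0),
              (17_000_000.0, 19_000_000.0)]
true_receiver = (1200.0, -800.0)
clock_bias_m = 85_000.0  # assumed receiver clock offset multiplied by the speed of light

pseudoranges = [math.hypot(sx - true_receiver[0], sy - true_receiver[1]) + clock_bias_m
                for sx, sy in satellites]

x, y, b = locate_2d(satellites, pseudoranges)
print(round(x, 3), round(y, 3), round(b, 3))
```

The clock bias is treated as a third unknown next to the two coordinates, which is why three satellites are needed in 2D; the 3D case adds one coordinate and therefore a fourth satellite.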

2.1.2 Error Sources

As already mentioned, the measurements are influenced by errors arising from different sources. Hofmann-Wellenhof et al. (2008) categorize the errors according to their sources, namely satellite, signal propagation and receiver.

Satellite: Errors originating from the satellite are the satellite clock bias and orbital errors. The highly accurate atomic clocks of the satellites are controlled and frequently updated by the control segment on earth in order to synchronize the satellites with each other. The clocks are updated once a day; the clock error is therefore small immediately after the update and increases until the next update. In addition to the clock biases, errors are also contained in the ephemeris data transmitted to the receiver. The ephemeris data contains information about the satellite’s orbit and is used to calculate the satellite’s position. The orbital parameters are estimated and may differ from the actual orbit of the satellite (cf. Conley et al. 2006, pp. 304 f.).

⁴ Source of image: http://www.kowoma.de/gps/Positionsbestimmung.htm, checked on 15/07/2015

In addition to the aforementioned error sources originating from the satellite, precision also depends on the satellite constellation in the sky. The effect is shown in Figure 4 for a simple ranging system with two transmitters. When the rays of the two transmitters (satellites) intersect at an angle of 90°, the region in which the receiver may lie is relatively small (Figure 4a). If the transmitters are closer together, as shown in Figure 4b, the region becomes larger and with it the uncertainty of the location determination. This is the reason why the vertical accuracy is generally worse than the horizontal one. For the horizontal coordinates, the satellites may be in a good constellation, meaning that there is a satellite in every direction, which keeps the error region small. For the vertical coordinate, all satellites are above the receiver and therefore in one direction only (cf. Langley 1999).

Figure 4: How the satellite constellation influences precision. In (a) the transmitters are orthogonal, which keeps the error region small. If the transmitters are closer together, the error region gets larger (b). (Langley 1999)
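The growth of the error region in Figure 4 can be approximated with a simple geometric sketch. It assumes that, close to the receiver, the two range measurements can be treated as straight bands of equal width, so that their overlap is a parallelogram with an area of w² / sin(θ) for band width w and crossing angle θ. This simplification, the band width of 10 m and the function name are assumptions for illustration and are not taken from Langley (1999).

```python
import math


def error_region_area(band_width_m, crossing_angle_deg):
    """Area (m^2) of the parallelogram in which two range bands of equal width overlap."""
    return band_width_m ** 2 / math.sin(math.radians(crossing_angle_deg))


# Assumed range uncertainty band of 10 m for both transmitters.
for angle in (90, 60, 30, 10):
    print(angle, round(error_region_area(10.0, angle), 1))
```

The area is smallest for an orthogonal crossing and grows without bound as the crossing angle approaches zero, which corresponds to transmitters lying in nearly the same direction.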

Signal Propagation: Signals are delayed as they propagate through the atmosphere. The ionosphere, the layer from approximately 50 km to 1000 km above the earth, is a dispersive medium, which means that the delay depends on the frequency. It is therefore possible to correct the ionospheric influences when applying dual-frequency single point positioning. The two frequencies L1 and L2 (cf. section 2.1.1) experience different delays or, in other words, have different propagation speeds, and a correction can be determined from this difference (cf. Conley et al. 2006, p. 161).
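How a correction can be derived from two frequencies may be sketched with the standard first-order ionosphere-free combination of two pseudoranges. The sketch assumes that the ionospheric delay scales with 1/f², which holds to first order, and uses the L1 and L2 frequencies given in section 2.1.1. The range and delay values are simulated, the function name is illustrative, and other error sources are ignored.

```python
F_L1_HZ = 1575.42e6  # L1 carrier frequency
F_L2_HZ = 1227.60e6  # L2 carrier frequency


def ionosphere_free_range(rho_l1_m, rho_l2_m):
    """First-order ionosphere-free combination of two pseudoranges (metres)."""
    f1_sq, f2_sq = F_L1_HZ ** 2, F_L2_HZ ** 2
    return (f1_sq * rho_l1_m - f2_sq * rho_l2_m) / (f1_sq - f2_sq)


# Simulated example values (assumptions, not measurements).
true_range_m = 21_500_000.0
delay_l1_m = 5.0
# To first order the ionospheric delay scales with 1 / f^2,
# so the lower frequency L2 is delayed more than L1.
delay_l2_m = delay_l1_m * (F_L1_HZ / F_L2_HZ) ** 2

rho_l1 = true_range_m + delay_l1_m
rho_l2 = true_range_m + delay_l2_m

corrected = ionosphere_free_range(rho_l1, rho_l2)
print(round(delay_l2_m, 2), round(corrected - true_range_m, 4))
```

Single-frequency receivers such as those in common smartphones cannot form this combination, which is one reason why the ionospheric delay remains in their measurements.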


Receiver: Errors caused on the receiver side are, among others, the multipath effect and shadowing. Both are depicted in Figure 5. The multipath effect occurs when, in addition to the direct signals, signals reflected from the surfaces of nearby structures are captured by the receiver. This leads to errors in the calculation of the pseudorange, since the reflected signal has traveled a longer path and consequently taken more time. Shadowing occurs when the line of sight from the receiver to the satellite is obstructed by trees or roofs. As a result, the signal reaches the receiver either with low energy or not at all and cannot be used for positioning. Multipath and shadowing can also occur in combination, as shown in Figure 5: the signal reflected from the building is received with higher energy than the signal shadowed by the canopy. Multipath and shadowing effects are random and highly dependent on the time and the receiver’s location. The error caused by these effects can be large and can sometimes be the main contributor to the total error (cf. Conley et al. 2006, pp. 279–280). User-generated GPS traces are recorded without consideration of such effects, including at locations where multipath and shadowing account for a large share of the error. This is mainly the case in urban areas with high buildings and in forested areas.

Figure 5: The Multipath and Shadowing effect (Conley et al. 2006, p. 280)

2.1.3 GLONASS, Galileo and Beidou

In addition to GPS, other GNSS worth considering include GLONASS, Galileo and Beidou. GLONASS is operated by Russia and is, like GPS, fully operational, with 21 active and 3 spare satellites in three orbital planes (cf. Hofmann-Wellenhof et al. 2008, pp. 348–349). After GLONASS first became fully operational in 1996, several satellites failed over the years and the system could not be operated with full coverage. Several new satellites were launched, but this was not enough to maintain the full constellation (Feairheller & Clark 2006). Nowadays, GLONASS is again fully operational. Galileo and Beidou are still under construction and do not have worldwide coverage yet. While Galileo, the European answer to GPS and GLONASS, has only four satellites launched, the Chinese Beidou consists of 14 satellites and already operates in the Asia-Pacific region. It is planned that Beidou will reach its full constellation of 35 satellites in different orbits by 2020. It will then have worldwide coverage, similar to GPS and GLONASS (Santerre et al. 2014).

Galileo is a project of the European Union and the European Space Agency to build a GNSS similar to GPS and GLONASS, but under civilian control. The development of the system was initiated in 1994. The first idea was to cooperate with the United States to develop a “next-generation” GPS; however, the United States did not wish to cooperate with foreign countries. Therefore, it was decided to build a new and independent GNSS that is interoperable with GPS and GLONASS. Consequently, receivers can use the three systems in combination. In the future, this will make it possible to achieve higher accuracies, since more satellites are involved in the positioning process (cf. Hofmann-Wellenhof et al. 2008, pp. 365–367). Galileo is currently still under construction and is planned to be fully operational by 2020. Two satellites each were launched in 2011 and 2012. With a total of four satellites, first tests of the system were then possible. When the system is fully operational, a total of 30 satellites will orbit the earth at an altitude of 23,222 km. Of these 30 satellites, 27 will be actively used and three will be available as replacements (European Space Agency 2015).

2.2 Volunteered Geographic Information

For this research, voluntarily collected GPS and street level data is used. For this special type of data the term ‘Volunteered Geographic Information’ has emerged (Goodchild 2007). It describes a special case of user-generated content (UGC). According to Bauer (2010), the term UGC has been used since the mid-nineties for content on the internet which is produced by the user. Due to the fast development of internet technologies, it has become possible and affordable for many users to have fast internet access. This development has made it possible for the user not only to search the internet, but also to create new content. The term UGC is kept general intentionally, since it may cover any kind of media such as videos, pictures or text. When this data refers to a spatial location, it is known as ‘Volunteered Geographic Information’. The terminology and characteristics of VGI are discussed below, and examples of how VGI can be classified into groups are provided.

2.2.1 Terminology and Nature of VGI

The term ‘volunteered geographic information’ (VGI) was introduced by Goodchild in 2007. It describes a phenomenon which was new in the field of geography at the time: geographic information is collected voluntarily by mostly untrained people without any financial compensation. Goodchild also calls this phenomenon “citizens as sensors”.


It has also been referred to as ‘ geospatial information’ by Heipke (2010) and Ramm et al. (2011). Sui (2008) describes the recent development as the ‘wikification of GIS’ and points out that the actors and methods of collecting geographic information have changed. Previously, only experts like surveyors or cartographers acquired and processed geodata, which was expensive. Nowadays, a large amount of data is freely available and the people who acquire and process the data are not necessarily experts anymore.

Resch (2013) distinguishes between different concepts of acquiring the data. In his paper, he discusses the difference between the terms ‘people as sensors’, ‘collective sensing’ and ‘citizen science’. While these concepts are closely related, there are differences worth noting. ‘People as sensors’ describes the concept of people who collect information through subjective observations. This might, for example, be the smoothness of a street surface or the water quality of lakes. ‘Collective sensing’ “[…] analyses anonymized data coming from collective networks, such as Flickr, Twitter, or the mobile phone network” (Resch 2013). The third term, ‘citizen science’, means that people contribute data collected by sensors integrated in their smartphones or other devices. In comparison to ‘people as sensors’, this data is not subjective and only comes from sensor measurements.

VGI is often collected with the help of low-cost GPS receivers, integrated in most smartphones or other handheld GPS devices. With those devices the coordinates of a location can easily be determined. The acquired coordinates or GPS traces may then be used, for example, to digitize the outline of a street traveled or to mark points of interest at the measured location. Images may also be georeferenced by adding coordinates. Another way of gathering information is digitizing features from satellite imagery. (cf. Goodchild 2007)

Sester et al. (2014) provide an overview of characteristics of VGI. Volunteered Geographic Information can be highly heterogeneous in terms of quality and coverage. Depending on the number of volunteers, some regions may be more complete than others. Figure 6 shows a density map of the nodes available in OpenStreetMap, the most famous VGI project. The brighter the color, the more nodes lie within that pixel. It can be seen that developed regions with a high population density, like Europe and North America, have more nodes than others. This may be attributed to the fact that more people live in these places who are potential contributors, while there may also be more features to digitize. If there are more volunteers in a region, the data is also more likely to be up-to-date. In contrast to authoritative data, which is usually updated in fixed cycles, VGI is updated whenever a volunteer detects a change in the real world. Another characteristic of VGI is the heterogeneity with respect to semantic information. In particular, when collecting topographic data in OpenStreetMap, there is no standardized catalogue of features and their semantic information. The semantic information is added as key-value pairs, which are commonly discussed in the community; in practice, however, a user does not need to follow these agreements.

Figure 6: Density map of OpenStreetMap nodes (© OpenStreetMap wiki-user ‘Tyr’5)

2.2.2 Classification and Examples

There are plenty of projects which deal with geographic information in some way. Jokar Arsanjani (2014) categorized the projects according to the purpose or type of data (topographic, images, video, text) which is being shared. The categories are listed in Table 1.

World mapping projects | Weather mapping | Business mapping
Social media mapping | Crisis and disaster mapping | Transportation mapping
Environmental and ecological monitoring | Outdoor activity mapping | Crime mapping and tracking

Table 1: Categories of VGI project according to Jokar Arsanjani (2014)

Hahmann (2014) extends this list with ‘encyclopedic projects’ such as Wikipedia6. For this thesis, the most popular world mapping project, OpenStreetMap7, is of relevance as a data source for the street network and GPS traces. OpenStreetMap is about creating a map by volunteers under an open license. Here the contributors generate map features by digitizing recorded GPS traces or satellite imagery. Next to digitized topographic data, raw GPS traces are also collected within this project. In section 2.3 this project will be explained in more detail. Other ‘world mapping projects’ are, among others, Wikimapia, which, similar to OpenStreetMap, aims to mark geographical objects, and Google Map Maker8, which is operated by Google and was initiated to improve the quality of Google Maps9.

5 Source of image: http://wiki.openstreetmap.org/wiki/File:OSM-node-density-map-2013.png, checked on 15/07/2015
6 http://wikipedia.org
7 http://openstreetmap.org

Furthermore, ‘outdoor activity mapping’ projects, such as Wikiloc10 or GPSIES11, shall be mentioned. These projects aim to collect GPS traces of outdoor activities undertaken by the contributors. The purpose is to provide outdoor routes including additional information, such as points of interest, distance or elevation profile. In addition, traces may be rated and can therefore be used to search for good outdoor routes, depending on the intention of the user. Those projects are a potential data source of GPS traces for the derivation of incline values.

2.3 OpenStreetMap

This chapter briefly introduces the VGI-project OpenStreetMap, since the data of this project is used for this research. Firstly, the project will be introduced in general and a short history will be given. Secondly, how the geographic and semantic information is handled within this project will be explained, followed by an assessment of how incline can be represented and how often it is actually mapped.

2.3.1 Introduction to Project

The OpenStreetMap (OSM) project was founded by Steve Coast at University College London in 2004 and aims to create a freely and globally available map. Information like map features including their semantic information is added and modified by the community. The way of contributing data is typical for a VGI project. In the first years of the project, data was exclusively contributed by capturing the travelled path with a GPS device, followed by the digitization of the recorded route on a computer. For editing, several editors are available, such as JOSM or iD. These editors make it very convenient to load the GPS raw data and create geometries. Furthermore, the editors handle the subsequent upload of the created features to OSM. In addition to the vector geometry of the map feature, the GPS raw data can also be uploaded (Ramm & Topf 2010, pp. 3 f.). In 2007, the company Yahoo! allowed OSM contributors to use their aerial imagery for the digitization of map features. With this imagery, it became very easy to create features such as buildings, which are hard to measure using a handheld GPS device. It also enables contributors to create features remotely, without being on-site or having any local knowledge about the region (Haklay & Weber 2008). Three years later, in 2010, Bing also provided its aerial imagery for the purpose of contributing to OpenStreetMap (OpenStreetMap Wiki 2015a). Other sources of data include donations from public agencies and the integration of other open data. An example is the integration of the entire street network of the Netherlands, after the donation by the company AND (Automotive Navigation Data) in 2007.

8 https://google.com/mapmaker
9 http://google.com/maps
10 http://www.wikiloc.com/
11 http://www.gpsies.com/

The data stored in the OSM database is licensed under the Open Database License (ODbL). It allows everybody to share the data, produce their own work from it and redistribute it, as long as the new database is also published under ODbL and OSM and its contributors are attributed (Open Knowledge Foundation 2015). The OSM database has not always been under ODbL. From the beginning of the project until September 2012, the data was licensed under the terms of the Creative Commons Attribution-ShareAlike (CC-BY-SA) license. CC-BY-SA was made for creative works, such as music and pictures. Therefore, it could hardly be used for collections of data or databases such as OpenStreetMap, since its terms are hard to interpret for databases. Furthermore, it was not possible to mix data under CC-BY-SA with data under other licenses. This was made possible with the change to ODbL (OpenStreetMap Foundation Wiki 2015b). The process of changing the license was initiated by the OpenStreetMap Foundation. It is a non-profit organization, founded in Great Britain, to support the OSM project with organizational tasks. Among other things, the foundation hosts the OpenStreetMap servers, helps with collecting donations for servers and supports the community with organizing events, like so-called mapping parties or conferences. The OpenStreetMap Foundation also organizes working groups which act as support in specific fields or topics. An example is the Operations Working Group, which is responsible for issues related to servers and the OSM API (cf. OpenStreetMap Foundation Wiki 2015a, 2015c).

Since the beginning of the project in 2004, the OSM project has become more and more popular. Haklay & Weber (2008) say that “[…] OpenStreetMap (OSM) is probably the most extensive and effective project currently under development”. There are now over 1.9 million registered users, who have contributed over 2.75 billion nodes and over 250 million line objects. Worldwide, the users have uploaded GPS traces providing approximately 4.5 billion track points (OpenStreetMap Wiki 2015c).

As is typical for a VGI project, the quality and completeness of such data may vary. Neis et al. (2012) evaluated the OSM street network of Germany from 2007 to 2011 using a dataset of a commercial provider. If only streets which can be used for navigation are taken into account (name or route number of the street is known), the street network of OSM is 9 % smaller than the data from the commercial provider. But if the entire street network is considered, the OSM dataset is 27 % larger, or even 31 % larger if paths for pedestrians are considered. This means there are streets or paths in OSM which do not exist in the commercial dataset. The reason is that OSM contains small hiking trails, paths or lanes which are not relevant for the commercial map provider. Since this evaluation was made four years before the time of writing, it can be expected that the completeness of the street network has improved even further since then. This shows the potential of the OpenStreetMap dataset and proves its suitability for many applications. According to Neis & Zielstra (2014b), it has been shown in the past that OSM data can be used for various applications such as crisis management, mapping for different purposes (hiking, public transport) or routing. There are several routing services for different purposes, such as komoot12 for cycling and hiking or Skobbler13 for car navigation.

2.3.2 Data Model

As mentioned in section 2.3.1, the OpenStreetMap project stores both the digitized map features and the GPS traces as raw data. While the map features follow a data model, the GPS traces are stored in a file system as GPX files14 (cf. Ramm & Topf 2010, pp. 317–318). The data model of the map features follows a simple approach. Figure 7 shows the GPX file system next to the object types available in OpenStreetMap and their relations to each other. A node is a representation of a point on the earth’s surface, described by longitude and latitude. Ways are the representation of linear features. Instead of being defined by a sequence of coordinates, a way object references at least two and up to 2000 ordered nodes. Since the referenced nodes are ordered, the way is directed from the first to the last node. If a line is closed (the starting point is equal to the end point), the way can, but need not, be considered a polygon. The third object type is a relation. Several nodes, ways and/or other relations can be referenced by a relation object. This is done when different objects are related to each other, for example when a number of ways define the route of a bus within a city. Semantic information is added to all objects by defining a certain number of tags. A tag is a key-value pair, separated by ‘=’. A tree, for example, would have the tag ‘natural=tree’. Tags may have any combination of key and value; however, for consistency reasons, the community agreed on a list of tags15 which should be used for mapping. This concept of adding semantic information to a geometry object has the advantage that new tags can always be introduced if required for special use-cases (cf. Ramm & Topf 2010, pp. 55-59).
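The three object types and the tag concept described above can be illustrated with a minimal Python sketch. The class and field names (Node, Way, Relation, node_refs, members, parse_tag) are hypothetical and chosen for illustration only; they do not reproduce the actual OSM database schema or any OSM library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

MAX_NODES_PER_WAY = 2000  # upper limit for node references as stated in the text


@dataclass
class Node:
    """A point on the earth's surface, described by longitude and latitude."""
    id: int
    lon: float
    lat: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """A linear feature that references an ordered list of node ids."""
    id: int
    node_refs: List[int]
    tags: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 2 <= len(self.node_refs) <= MAX_NODES_PER_WAY:
            raise ValueError("a way references between 2 and 2000 nodes")

    def is_closed(self) -> bool:
        # A closed way starts and ends at the same node; whether it is
        # treated as a polygon depends on its tags, not on the geometry.
        return self.node_refs[0] == self.node_refs[-1]


@dataclass
class Relation:
    """Groups nodes, ways and/or other relations, e.g. a bus route."""
    id: int
    members: List[Tuple[str, int]]  # (object type, object id), hypothetical layout
    tags: Dict[str, str] = field(default_factory=dict)


def parse_tag(text: str) -> Tuple[str, str]:
    """Split a 'key=value' string such as 'natural=tree' into key and value."""
    key, _, value = text.partition("=")
    return key, value
```

For example, parse_tag("natural=tree") yields the pair ("natural", "tree"), and a way referencing the nodes 1, 2, 3, 1 is reported as closed.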

12 http://komoot.de, checked on 15/07/2015
13 http://skobbler.de, checked on 15/07/2015
14 GPX is the data format, based on XML, for exchanging GPS traces.
15 http://wiki.openstreetmap.org/wiki/Map_Features, checked on 15/07/2015


Figure 7: OSM data model for map features (left) and file system for GPX files (right) (adapted from Ramm & Topf 2010, p. 56)

2.3.3 Incline Information in OpenStreetMap

The information about the incline can easily be added to ways as semantic information. According to the list of proposed OSM map features mentioned in section 2.3.2, the key ‘incline’ should be used when adding this information to a street or a path. The corresponding value of the tag is the actual incline value, given in percent or degrees. Since there are two possible units, the unit must be indicated with ° or %16. Positive or negative values indicate whether the way is inclined upwards or downwards, relative to the direction of the way. When the inclined part of a street does not cover the entire street segment, the street should be split at the start and end of the inclined part. Furthermore, it is recommended that the steepest incline along the path be added as the value. If the exact incline value is unknown, but it is visible that the street is inclined, the value of the key ‘incline’ can also be ‘up’ or ‘down’ (cf. OpenStreetMap Wiki 2015e).
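To make the handling of the two units concrete, the following minimal sketch converts the value of an ‘incline’ tag into percent. The function name incline_to_percent is hypothetical, and the conversion from degrees assumes the usual definition of incline in percent as 100 times the tangent of the slope angle (rise over horizontal run). Values without a numeric part, such as ‘up’ or ‘down’, carry no magnitude and are returned as None.

```python
import math
from typing import Optional


def incline_to_percent(value: str) -> Optional[float]:
    """Return the incline in percent, or None if no numeric value is given.

    Assumes the unit is indicated by a trailing '%' or '°', as in
    'incline=6%' or 'incline=8°'. The sign is kept, so negative values
    describe a way that descends in its direction.
    """
    text = value.strip()
    try:
        if text.endswith("%"):
            return float(text[:-1])
        if text.endswith("°"):
            # Assumed definition: percent = 100 * tan(angle in degrees)
            return math.tan(math.radians(float(text[:-1]))) * 100.0
    except ValueError:
        # Numeric part could not be parsed
        return None
    # 'up', 'down' or descriptive words: direction only, no magnitude
    return None
```

With this definition, an angle of 8° corresponds to roughly 14 %, which shows that mixing up the two units would distort the incline considerably.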

value of key ‘incline’ | share of OSM ‘highway’ features with incline information
‘up’ | 44.3 %
‘down’ | 30.8 %
others | 24.9 %

Table 2: Usage of the key 'incline' and its values17

16 Examples are: ‘incline=6%’, ‘incline=8°’, ‘incline=up’, ‘incline=down’
17 Source: https://taginfo.openstreetmap.org, checked on 15/07/2015


Out of over 83 million (83,299,544) OSM features tagged with ‘highway’, only 0.2 % (169,121) have information about the incline. This also includes all paths, such as footpaths or bicycle lanes. From Table 2 it can be seen that, out of the 0.2 %, the main part (~ 75 %) has the value ‘up’ or ‘down’, giving only the information that the path is inclined, but not to what extent. For the other 25 %, the incline is mostly defined more specifically in percent or degrees; however, a few are also described with words such as ‘moderate’ or ‘extremely steep’. To summarize, there is hardly any information about the incline of paths in OSM, and where it exists, the information is not very specific. This has several possible reasons. On the one hand, it may be difficult to attract the contributors’ attention to a tag which will not be displayed on the map. On the other hand, the incline cannot be digitized from GPS traces or aerial imagery like other features. It needs to be measured or estimated with the use of special tools such as a measuring tape, an inclinometer or a smartphone with a built-in gyroscope. Measuring the incline is therefore time-consuming and the contributors have to be on-site, since the incline cannot be mapped using simple methods.

2.4 Data Mining

The aim of this thesis is to extract information from a vast amount of data. Such a process is generally referred to as ‘data mining’. Next to data mining, the term knowledge discovery from data (KDD) is used in academic literature. While both terms are sometimes considered synonymous, data mining can also be seen as one step in the process of KDD, as in Fayyad et al. (1996). They describe the steps of the KDD process as data selection, preprocessing, transformation, data mining, and interpretation and evaluation of the results. Data mining in this process chain refers to “[…] applying data analysis and discovery algorithms that produce a particular enumeration of patterns […] over the data”. KDD is an iterative process in which any two steps can also involve iterations. According to Han & Kamber (2006), many fields consider the terms data mining and KDD as synonyms, probably because the term data mining is much shorter. Hence, they define data mining as follows:

“Data mining is the process of discovering interesting patterns and knowledge from large amounts of data. The data sources can include databases, data warehouses, the web, other information repositories, or data that are streamed into the system dynamically.”

Consequently, for this thesis both terms are used synonymously as the entire process of gaining knowledge, including all steps as mentioned in Fayyad et al. (1996).


Data mining can be applied to any kind of data, such as information about books in a library, data about customers (personal data or transactions), search engine queries or user-generated content of different online communities such as , Instagram or OpenStreetMap (cf. section 2.2). The field of data mining has grown out of the need to handle the data, after devices and methods were developed to capture and store data of this amount. It comprises different methods and techniques, such as detection of patterns and acquiring knowledge about the association and correlation of a data collection. A collection of data is therefore always needed, since such information cannot be obtained from a single record (cf. Han & Kamber 2006, pp. 5-7).

For spatial or geographic data, special techniques and methods were developed and the field of spatial data mining has emerged. Shekhar et al. (2004) argue that spatial data is not compatible with regular data mining techniques, due to its complexity and intrinsic spatial relationships. Spatial data mining uses techniques and methods from the field of spatial analysis as well as the field of general data mining, as mentioned in the paragraph above. Mennis & Guo (2009) review commonly used methods and techniques. Spatial classification can be divided into supervised and unsupervised classification. Different objects are grouped into classes based on their properties. Contrary to unsupervised classification, which is also known as clustering, supervised classification needs a training dataset to detect the members of a group. An unsupervised method is spatial clustering, where points are classified according to their spatial location. Spatial classification methods generally consider neighboring objects, which is not the case in general classification methods. Another method commonly used in spatial data mining is point pattern analysis. It is also a clustering method and tries to extract areas in which an unusual number of events occur. An example is the detection of streets where accidents occur more often than on other streets. This method is also known as hot spot analysis. Further information on clustering methods can be found in Mennis & Guo (2009).
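The idea of extracting areas with an unusual number of events can be sketched with a simple grid count (quadrat counting). This is a deliberately simplified stand-in and not one of the specific methods reviewed by Mennis & Guo (2009); the function name hot_spot_cells and the parameters cell_size and min_count are assumptions made for illustration. Points are assumed to be given in a projected coordinate system with metric units.

```python
from collections import Counter
from math import floor
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def hot_spot_cells(points: Iterable[Point], cell_size: float,
                   min_count: int) -> Dict[Cell, int]:
    """Count events per square grid cell and keep cells with many events.

    Each point (x, y) is assigned to the cell with the integer index
    (floor(x / cell_size), floor(y / cell_size)). Cells containing at
    least min_count events are returned together with their counts.
    """
    counts = Counter(
        (floor(x / cell_size), floor(y / cell_size)) for x, y in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Hypothetical accident locations in metres
accidents = [(1.0, 1.0), (2.0, 3.0), (4.0, 4.0), (12.0, 1.0), (25.0, 25.0)]
print(hot_spot_cells(accidents, cell_size=10.0, min_count=3))
```

A fixed threshold is the simplest possible choice here; a statistical test against an expected event density would be closer to a proper hot spot analysis.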


3 Related Work

This chapter describes which applications need incline information and which research has been done with regard to mining information from user-generated GPS traces. Furthermore, research related to the extraction of 3D information from GPS data is reviewed. For mining street information from GPS data, it is essential to know on which street the traces were recorded. This process is known as map matching and, after a short review of different types of algorithms, two of them are explained in more detail. At the end of this chapter, different methods for smoothing time series measurements are reviewed. This will be an essential step in preprocessing the GPS data.

3.1 3D Routing

There are several routing applications which rely on elevation information, such as routing for sport activities, wheelchair routing or energy-efficient routing for electrically powered vehicles (e.g. E-cars, Pedelecs18 or electric wheelchairs). In the following sections, different projects related to this topic are presented. For projects relying on VGI, a common problem is the lack of information regarding the elevation or incline.

3.1.1 Wheelchair routing

Compared to navigation systems for cars, routing for mobility-restricted people, such as wheelchair users, elderly people with push chairs or temporarily impaired people, is more complex. People belonging to one of these user groups may all have slightly different requirements for a route. These depend highly on the individual disability and the type of assistive equipment (pushchair, manual wheelchair, electric wheelchair). Ding et al. (2007) studied the requirements for a wheelchair navigation system through an empirical study with physically impaired people and their assistants. Among other attributes, like the condition of the sidewalk or information about stairs and ramps, the street incline is of high relevance for wheelchair routing. Furthermore, Menkens et al. (2011) investigated the needs of wheelchair users regarding a navigation system which meets their requirements. In terms of incline, they found that the maximum incline which can be passed with a manual wheelchair is in general between 3% and 8%, and for electric wheelchairs up to 10%.
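How such incline limits could enter a routing graph is sketched below as a simple edge filter. The sketch is illustrative only and does not reproduce any of the cited systems. The 10 % limit for electric wheelchairs follows the value reported above, whereas the 6 % limit for manual wheelchairs is an assumed value inside the reported range of 3 % to 8 %. The edge layout, the names MAX_INCLINE and passable_edges, and the decision to apply the limit to uphill and downhill edges alike are assumptions as well.

```python
from typing import Iterable, List, Optional, Tuple

# Maximum passable incline in percent per wheelchair type.
# 'electric' follows the text; 'manual' is an assumed value within 3-8 %.
MAX_INCLINE = {"manual": 6.0, "electric": 10.0}

# Hypothetical edge layout: (from_node, to_node, incline in percent or None)
Edge = Tuple[str, str, Optional[float]]


def passable_edges(edges: Iterable[Edge], wheelchair_type: str,
                   allow_unknown: bool = False) -> List[Edge]:
    """Keep only edges whose absolute incline does not exceed the limit.

    Edges without incline information are dropped unless allow_unknown
    is set, which reflects the sparse incline data in OSM.
    """
    limit = MAX_INCLINE[wheelchair_type]
    result = []
    for edge in edges:
        incline = edge[2]
        if incline is None:
            if allow_unknown:
                result.append(edge)
        elif abs(incline) <= limit:
            result.append(edge)
    return result
```

Since incline values are missing for most street segments, the handling of unknown edges largely determines whether any route can be found at all, which underlines the need for a denser source of incline data.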

There are many investigations dealing with the development of routing algorithms meeting the needs of mobility-restricted people (e.g. Müller et al. 2010; Neis & Zielstra 2014a). The main problem is the lack of data regarding sidewalk information, surface of the sidewalk, curbs and also inclines. Although such data has already been acquired by governmental authorities or commercial map providers, it is very costly. Therefore, most of the approaches rely on volunteered geographic information, for example OpenStreetMap. In OpenStreetMap, that information is theoretically freely available, but unfortunately hardly exists in the dataset. As stated in section 2.3.3, incline values are only available for 0.2 % of the street segments. Approximately 75 % of them contain only the values ‘up’ and ‘down’. This only indicates whether a street or path is inclined, but does not specify the value. Due to the different requirements of the people, this is not sufficient and a more accurate knowledge of incline is required.

18 Acronym for ‘Pedal Electric Cycle’. Pedelecs are bicycles with an assisting electric engine. The engine supports the rider while pedaling up to a speed of 25 km/h (according to the German Road Traffic Licensing Act (StVZO)).

Besides incline information, there is also a lack of other accessibility-related information in VGI. Therefore, many routing services were developed which allow users to collect this information (Kurihara et al. 2004; Menkens et al. 2011; Völkel & Weber 2008; Harriehausen-Mühlbauer 2014). The idea is that the users gather information about barriers or obstacles on sidewalks while being navigated. This incrementally improves the quality of the route calculation and also ensures that temporary barriers are captured. The incline of streets, however, needs to be measured and can consequently not be determined with those systems. This shows the demand for an alternative way to determine incline values of a street network.

3.1.2 Energy-efficient routing

Electrically powered vehicles, such as E-cars, Pedelecs or e-wheelchairs, are getting more and more popular, although according to Bachofer (2011) people are still skeptical. This can be explained by high costs, long battery charging times or a shorter distance range. In addition, a reason may also be the poor prediction of the distance range. Depending on the properties of the street, the power consumption may vary. The surface material as well as the incline of the street decrease the battery service life. Depending on the speed, the energy demand increases by 50% to 100% on an incline of 4% (Bachofer 2011). Although the travel distance is longer, it might be of benefit if the user takes a route around a hill or avoids streets with a bad surface. Consequently, the knowledge of the incline is an important factor to estimate the distance range per battery charge. With a routing service that considers the energy consumption of a street segment, the energy demand of a route can be determined. This makes it possible to choose the most energy-efficient route or at least gives a prediction of the battery’s distance range. This research field is known as EcoRouting or Green Navigation (Bachofer 2011). EcoRouting is less relevant for fuel-powered vehicles, since they have a bigger distance range and can use a denser network of gas stations; nevertheless, it can reduce fuel consumption and thus carbon dioxide emissions.

Franke et al. (2012) developed an algorithm for energy-efficient routing of electrically powered vehicles. The resulting navigation system is called eNav. Firstly, it calculates the power consumption for each edge of the routing network, using the length of the edge and incline information. Secondly, edges which cannot be passed by wheelchairs (e.g. because of steps) are rejected, and accessibility information about edges and Points of Interest (POIs) is requested from other platforms like rollstuhlrouting.de19 or Wheelmap20. In the third step, surface information is included in the routing algorithm. OpenStreetMap has been taken as the data source for the street network and for information about the street surface. For the calculation of the incline, the authors used airborne laser scanning data, which is of high accuracy but also very expensive.

Sachenbacher et al. (2011) and Kono et al. (2008) also investigated the topic of energy-efficient routing. The motivation of Sachenbacher et al. (2011) lies in the field of electric mobility; they developed an algorithm for energy-efficient routing using OpenStreetMap street data and the SRTM DEM with a horizontal resolution of 90 m. Contrary to the aforementioned investigations, Kono et al. (2008) tried to minimize the fuel consumption of conventional cars by developing an eco-friendly routing algorithm. To do so, they consider traffic information, geographic information and even vehicle parameters. As elevation data they use a DEM with a horizontal resolution, provided by the Geospatial Information Authority of Japan (GSI).

A freely available alternative to GPS traces for the derivation of incline values is SRTM. Bachofer (2011) analyzed the influence of the accuracy of the DEM on energy-related routing. He integrated different DEMs into a routing system and found that the accuracy of the DEM influences the modelled energy demand only to a minor degree. Consequently, he concluded that SRTM data is sufficient for this use-case; however, he also found that for some routes the modelled energy demand was off by more than 30%.

3.2 Extraction of Street Attributes from user-generated Movement Trajectories

Mining street information from user-generated GPS traces has already been investigated by several researchers. However, the focus was on deriving 2D information only, and to the best of my knowledge no literature exists in which the elevation of user-generated GPS traces was used to derive 3D information. As shown in section 3.3, high-accuracy GPS measurement techniques have been used to derive elevation-related information.

Van Winden (2014) proposed algorithms to automatically derive different road attributes, like the direction of the road (one or two way), speed limit or number of lanes. As GPS input data he used GPS traces acquired from 800 people during a certain time span. Therefore, the transportation mode was known. The input data of the street network to be updated was taken from OpenStreetMap. As in typical data mining processes (cf. section 2.4), the data needed to be preprocessed. The GPS traces had to be semantically linked to the street on which the trace was recorded (map matching). For this step, the algorithm by Marchal et al. (2005) was used via an application programming interface (API).

19 http://rollstuhlrouting.de, checked on 15/07/2015
20 http://wheelmap.org, checked on 15/07/2015

Map matching was also an essential step in the research of Zhang et al. (2010). They used GPS traces collected by the contributors of the OpenStreetMap project and aimed to derive street attributes like the number of lanes and turning restrictions. Furthermore, they used the traces to automatically correct the street centerline of the street network where it is geometrically incorrect. For this purpose, a map matching algorithm was implemented which is described in detail in section 3.4.2. Additionally, they analyzed the coverage of the GPS traces, which gives a first idea of what can be expected from this research. In their test area they discovered that highways have 30 to 80 GPS traces whereas city roads have fewer than 20. Secondary roads in a neighborhood have only a few GPS traces or none at all. With a high redundancy, better results can be achieved.

3.3 Derivation of 3D information, using high-accuracy GPS measurements

In this section, research is presented which involves the derivation of 3D information from GPS data collected using high-accuracy GPS measurements. Due to the relatively high accuracy, redundancy is not as crucial as in the work presented in section 3.2. To achieve a higher positioning accuracy, different methods have been used.

Boucher (2013) used SBAS-GPS receivers to estimate the height of a street network. SBAS21 is an augmentation system based on geostationary satellites that supports GPS. It sends correction data and thereby improves the accuracy of GPS from 10 m to 2 m. The system which covers Europe is called ‘European Geostationary Navigation Overlay Service’ (EGNOS)22. The collected GPS traces were fused with OSM street network data and the SRTM-3 DEM. The proposed method relies on GPS measurements acquired under good conditions. The roof of a car was equipped with two SBAS-GPS antennas and the car only drove on roads in an open environment. Therefore, error sources which are common in crowdsourced GPS trajectories, like obstruction through buildings or multipath effects, are mostly eliminated in this data. To resolve the remaining error, the SRTM-3 DEM is used to correct the discrete height of the GPS measurements. Matching the GPS traces and the road network was done using a statistical method which makes use of the Mahalanobis distance. The 3D road network was derived by fusing the three data sources sequentially using Kalman filter techniques. According to an experimental validation, the road elevation estimation could be improved by using GPS trajectories in addition to the SRTM-3 DEM.

21 http://en.wikipedia.org/wiki/GNSS_augmentation, checked on 15/07/2015
22 http://www.essp-sas.eu/introducing_egnos, checked on 15/07/2015

The following work achieved even higher accuracies by using differential GPS with temporary base stations (cf. section 2.1.1). Han & Rizos (1999) were motivated by the Solar Challenge, a special race for solar-powered cars across the Australian continent. The objective was to determine the height profile of the road in order to optimize the race strategy. A car equipped with a differential GPS device drove from Darwin to Adelaide, accompanied by two cars acting as reference stations. The road was divided into sections, and for each section the reference stations were parked at the beginning and the end. To derive the incline information, a spatial Kalman-filtering technique was used.

3.4 Map Matching

In the field of navigation it is important to know on which street the carrier of a GPS device is traveling. Furthermore, the position on that street segment is of importance. To solve this problem, the recorded trajectory data of the moving object and the segments of the street network data need to be semantically linked (cf. Figure 8). The algorithms for this task are referred to in the literature as Map Matching (e.g. Quddus et al. 2007; Marchal et al. 2005). One may think that this is a straightforward task, but due to inaccuracies of both input data sources it is more complicated. Especially in regions where the GPS signal is generally of low quality (e.g. urban areas, forest) and a dense street network exists, the quality of map matching may vary. Furthermore, errors and inaccuracies in the street network data may cause a wrong match of the trajectory data and the street network.

Figure 8: Map Matching. The GPS trace (blue) is snapped to the street network (red) (Map: OSM)


3.4.1 Categorization of Map Matching Algorithms

Quddus et al. (2007) reviewed different map matching algorithms and categorized them into four groups of approaches. The first group comprises geometric approaches. They rely exclusively on the geometry of the trajectory and street network data and do not consider the topology. This means that the connectivity of street segments is not used in the matching process. Geometric algorithms take the geometry of either the single point positions or the trajectory as a curve and search for the closest node within the street network or the closest curve. Consequently, the approaches are called point-to-point, point-to-curve or curve-to-curve matching. Geometric approaches are generally fast to process and easy to implement. Secondly, there is the group of topological approaches. In addition to the geometry of trajectory and street network data, they make use of the relationship between the segments of the street network. Two street segments may for example be connected or disjoint. For the presented algorithms, the topology of the street network was analyzed in advance. The third group is the group of probabilistic map matching approaches. For those approaches, the error of the GPS measurement is taken into account in the form of an error ellipse. The street segments which intersect the error ellipse are considered as matching candidates. If there is more than one candidate, properties like speed or direction of the trajectory are used to detect the correct street segment. Advanced map matching algorithms represent the fourth group. It covers algorithms which use more advanced techniques, such as Kalman filtering or other mathematical models. All algorithms assume that the GPS trace was recorded while travelling along a street, rather than through areas where no street can be found.
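To make the geometric category more tangible, the following minimal sketch shows point-to-curve matching in Python. It is only an illustration under simplifying assumptions and not an algorithm from the reviewed literature: coordinates are assumed to be planar (projected, in metres), and the function names and the two example streets are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both end points coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def match_point_to_curve(point, streets):
    """Return (street id, distance) of the polyline closest to the point."""
    best_id, best_dist = None, math.inf
    for street_id, polyline in streets.items():
        for a, b in zip(polyline, polyline[1:]):
            d = point_segment_distance(point, a, b)
            if d < best_dist:
                best_id, best_dist = street_id, d
    return best_id, best_dist

# Two hypothetical parallel streets, 20 m apart.
streets = {
    "A": [(0.0, 0.0), (100.0, 0.0)],
    "B": [(0.0, 20.0), (100.0, 20.0)],
}
match = match_point_to_curve((50.0, 4.0), streets)
```

Because each point is matched independently and connectivity is ignored, a noisy trace between two parallel streets can jump back and forth between them. This is the weakness that topological approaches address.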

In the following section, three map matching algorithms are described briefly. The first two (Marchal et al. (2005) and Karussel (2014)) are already implemented and ready to use. Both algorithms are designed for post-processing applications only and may potentially be used in this research. The third algorithm, proposed by Zhang et al. (2010), is not available as a ready-to-use implementation, but it is easy to implement.

3.4.2 Functionality of Selected Algorithms

The algorithm proposed by Marchal et al. (2005) is a topological algorithm, as it uses information about the connectivity of the street segments. As already mentioned, this map matching algorithm is implemented in the online service Trackmatching23. The service provides an API, which can be requested with a set of GPS points as input. As a response, the user gets a set of IDs referencing the traveled street segments from OpenStreetMap. A disadvantage of this algorithm is that it is limited to the street network, so GPS traces cannot be matched to sidewalks or bicycle lanes. This makes it unsuitable for applications where the transportation mode can also be cycling or walking. In short, the algorithm works as follows: The incoming GPS points are processed sequentially and matched according to their distance to a street segment which is connected to the previously matched segment. The first step is the initialization process. Starting with the first GPS point, the three closest street segments are searched by calculating the Euclidean distance. Using these segments, new candidate paths are created. Each candidate path has a score (weight), which is the sum of the distances between the GPS points and the segment. The path with the smallest score, i.e. the smallest cumulative distance, is considered as the traveled route. If the end of a street segment is reached, the algorithm searches for street segments which touch the end node. All touching segments are now considered as matching candidates. For all candidates the cumulative distance to the GPS points is calculated. Again, the segment with the lowest score is considered as part of the traveled path.

23 https://mapmatching.3scale.net/, checked on 15/07/2015

Another approach is the algorithm by Karussel (2014). According to the categorization by Quddus et al. (2007), it may be assigned to the group of advanced algorithms, as it uses a routing engine to estimate a path rather than using the street network as input. The algorithm is implemented in Java and is published24 under the Apache License 2.0 and can therefore be used freely. The algorithm uses the routing engine Graphhopper25, a route planner which is based on OSM. Firstly, for each GPS point the three closest street segments are searched and weighted. The weight of an edge is the shortest distance to the GPS point. Once each GPS point has three weighted street segments, the routing engine is requested to find the best path along all the selected street segments. The best path is the one where the sum of all weights is the smallest. Since the complete trace has to be available before the path is computed, this algorithm is unsuitable for real-time applications. The advantage of this approach is that a realistic path is found even if the GPS trace is interrupted (e.g. in tunnels or in dense forests). However, to calculate a path, Graphhopper needs to know whether the track was recorded while walking, cycling or driving a car. This is a disadvantage for applications where the transportation mode is not known. Another disadvantage is that the results are only routes computed by the route planner. This probably leads to mismatches when a street was taken although it is not allowed (e.g. a pedestrian walking in the opposite direction on a one-way street).

The third algorithm was proposed by Zhang et al. (2010). It is mainly a geometric algorithm, but it also uses a clustering method and may therefore also be categorized as an advanced method. Like the aforementioned two algorithms, it uses the OSM street network as input data. Within the algorithm, three conditions are checked using the street segment and the GPS traces: distance, direction and angle. Note that this method uses the GPS traces as curves, rather than processing the GPS points sequentially. First of all, profile lines perpendicular to the street are created at a specific distance from each other. The length of the lines is 30 m, which has been found to be a reasonable value considering the error of the GPS traces and the width of the street. All traces which intersect the perpendicular lines are initially seen as matching candidates. In case of a one-way road, a candidate is removed from the list if the GPS trace runs in the opposite direction of the one-way road. By convention, the direction of one-way roads in OSM is specified through the digitization direction. The third condition is the angle between the trace and the road. If this angle is greater than 20 degrees, the GPS trace is also removed from the candidate list. The three conditions select the corresponding traces, but they can still yield mismatches (false positives) if two two-way roads are parallel and too close to each other. This is often the case where there are street-accompanying bicycle lanes and sidewalks. In this case, the GPS traces are matched using a clustering method.

24 https://github.com/graphhopper/map-matching, checked on 15/07/2015
25 https://graphhopper.com/, checked on 15/07/2015
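The three conditions of Zhang et al. (2010) can be illustrated with a small Python sketch. It is a simplified reading of the description above and not their implementation: the profile line is assumed to be centred on the street centerline, headings are assumed to be given in degrees, 'opposite direction' is interpreted as a heading difference of more than 90 degrees, and all function and parameter names are hypothetical.

```python
def heading_difference(a_deg, b_deg):
    """Smallest absolute difference between two headings, in [0, 180]."""
    diff = abs(a_deg - b_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff

def is_candidate(offset_m, street_heading, trace_heading, oneway,
                 profile_length_m=30.0, max_angle_deg=20.0):
    """Check distance, direction and angle for one trace at one profile line."""
    # Condition 1 (distance): the trace must cross the profile line,
    # assumed here to extend 15 m to each side of the centerline.
    if abs(offset_m) > profile_length_m / 2.0:
        return False
    diff = heading_difference(street_heading, trace_heading)
    if oneway:
        # Condition 2 (direction): reject traces running against the
        # digitization direction of a one-way street (assumed: > 90 degrees).
        if diff > 90.0:
            return False
    else:
        # Two-way street: both travel directions are valid, so the
        # angle between the two lines is folded into [0, 90].
        diff = min(diff, 180.0 - diff)
    # Condition 3 (angle): reject traces crossing at more than 20 degrees.
    return diff <= max_angle_deg
```

For example, a trace heading of 275 degrees on a street digitized with a heading of 90 degrees passes on a two-way street (folded angle of 5 degrees) but is rejected on a one-way street.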

3.5 Smoothing of Time Series Measurements

The input data for this research are GPS traces collected by users with low-cost GPS devices. As stated in section 2.1, a certain amount of noise is expected in the data. According to Haining (2003), smoothing algorithms can be used to remove the noise and improve the accuracy of the derived information. In his book, he reviews different smoothing methods and techniques. Non-linear smoothing methods such as median smoothing and linear smoothers like mean smoothers are mentioned. Linear smoothers should be chosen if no abrupt changes are expected in the data, because averaging values, in contrast to taking the median, removes peaks and small-scale features. The elevation profile of a street on which the GPS traces have been recorded usually follows a continuous line. Consequently, the elevation profile of the GPS measurements should theoretically not contain any discontinuities. If it does, they can be considered as outliers and should be flattened.

For both linear and non-linear smoothers, a window of a certain size is centered on each data point of the series in turn. The data point on which the window is centered is assigned a smoothed value which considers all neighboring data points within the specified window. From these data points, a weighted average or the median can be used as the new value. When determining the weighted average, the weights can be selected with regard to the distance between the data points, but they are normalized to sum to one (Haining 2003, p. 231). Points further away consequently influence the smoothed value less than closer points. The size of the window, and thus the number of data points which influence the result, should be chosen depending on the desired degree of smoothing. A bigger window size uses more information from points which are further away. This leads to a higher precision, although it can also lead to biases. A smaller window size decreases the risk of introducing a bias, but the precision is lower, because a smaller sample and consequently less information is used (cf. Haining 2003, p. 229).
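A weighted moving average of this kind can be sketched in a few lines of Python. The triangular weights, the index-based distance and the shortened window at both ends of the series are assumptions made for this illustration; Haining (2003) does not prescribe a specific weighting scheme here.

```python
def weighted_moving_average(values, half_window=2):
    """Smooth a series with a centred window and triangular weights."""
    smoothed = []
    n = len(values)
    for i in range(n):
        # The window is cut off at the ends of the series.
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        # Triangular weights: the centre point counts most and the
        # weight decreases with the index distance to the centre.
        weights = [half_window + 1 - abs(j - i) for j in range(lo, hi)]
        total = sum(weights)  # dividing by the sum normalizes the weights to one
        weighted_sum = sum(w * values[j] for w, j in zip(weights, range(lo, hi)))
        smoothed.append(weighted_sum / total)
    return smoothed

# A single peak of 9 is spread over its neighbours instead of being removed.
peak = weighted_moving_average([0, 0, 9, 0, 0], half_window=1)
```

With `half_window=1` the weights are 1, 2, 1, so the peak is reduced to 4.5 and its neighbours are raised to 2.25. This illustrates why a mean smoother flattens outliers only gradually, whereas a median smoother removes them.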


In the following, Tukey's (1977) ‘Median of 3’ algorithm is explained in more detail. Although it is a non-linear smoothing algorithm and thus not likely to be used in this research, the functionality is very similar to a weighted moving average smoother.

Tukey (1977, pp. 210-213) presents an algorithm for smoothing equidistant sequences of numbers, namely ‘Medians of 3’. To start the smoothing, the second number as well as the previous and the following number are selected. The smoothing has to start with the second number, since the first does not have a left-hand side neighbor. For the same reason, no value is obtained for the first and the last position of the sequence. Of the three selected numbers, the middle one is assigned a new (smoothed) value considering the two neighbors. After ordering the selected numbers, the median can be determined easily. The determined median is then assigned to the position of the second value. This process is repeated for all values of the sequence. It is important that the neighbor on the left-hand side is selected from the raw data, rather than from the smoothed data. The smoothing can then be applied again to the smoothed sequence to achieve a higher degree of smoothness. Figure 9 shows a raw sequence of numbers (row 1) as well as the sequence after a single (row 2) and a repeated (row 3) smoothing.

Raw data:         13   7   9   3   4  11  12  1304  10  15  12  13  17  20  24
Single smooth:     -   9   7   4   4  11  12    12  15  12  13  13  17  20   -
Repeated smooth:   -   -   7   4   4  11  12    12  12  13  13  13  17   -   -

Figure 9: Example of 'Median of 3' smoothing with the raw data (row 1) and the results of the single median smoothing (row 2) and the repeated median smoothing (row 3). (Tukey 1977, p. 212)
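The numbers in Figure 9 can be reproduced with the following short Python sketch. The function name is hypothetical, and positions without a complete three-value window are represented by `None`.

```python
from statistics import median

def median_of_3(seq):
    """One pass of 'median of 3'; positions without three values stay None."""
    out = [None] * len(seq)
    for i in range(1, len(seq) - 1):
        # The window is always taken from the input sequence, never from
        # the values that were already smoothed in this pass.
        window = seq[i - 1:i + 2]
        if None not in window:
            out[i] = median(window)
    return out

raw = [13, 7, 9, 3, 4, 11, 12, 1304, 10, 15, 12, 13, 17, 20, 24]
first = median_of_3(raw)     # row 2 of Figure 9
second = median_of_3(first)  # row 3 of Figure 9
```

Note how the outlier 1304 disappears already after the first pass, while each pass loses one value at either end of the sequence.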

This approach is problematic, as the sequence gets shorter with every pass because of the missing values at the ends. To solve this problem Tukey (1977) proposed two approaches. The first and simplest one is to copy the missing end value from the raw data to the smoothed values. The second and more complicated one is the so-called ‘end-value smoothing’. Here, the missing first value is the median of the following three values, which are derived only from the first three positions of the sequence. The corresponding numbers from the example in Figure 9 are shown in parentheses.

Value 1: the actual raw value (13)
Value 2: the second value of the first smooth (9)
Value 3: the second value of the first smooth (9) plus two times the difference between the second and the third value of the first smooth (9, 7): 9 + 2 * (9 - 7) = 13

Consequently, the first value of the smoothed sequence will be 13. An illustration for better understanding is shown in Tukey (1977, p. 222).
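The end-value rule can be written as a small Python function. The function and argument names are hypothetical; applying the same rule mirrored at the right-hand end of the sequence is an assumption of this sketch, since the description above only covers the first value.

```python
from statistics import median

def end_value(raw_end, smooth_2, smooth_3):
    """End-value smoothing from the raw end value and the two nearest smoothed values."""
    # Value 3: linear extrapolation from the two nearest smoothed values.
    extrapolated = smooth_2 + 2 * (smooth_2 - smooth_3)
    return median([raw_end, smooth_2, extrapolated])

# Left end of the example in Figure 9: median of 13, 9 and 13.
left = end_value(13, 9, 7)
# Right end (mirrored, assumption): raw value 24, smoothed neighbours 20 and 17.
right = end_value(24, 20, 17)
```

For the right end, the extrapolated value is 20 + 2 * (20 - 17) = 26, so the median of 24, 20 and 26 keeps the raw value 24.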


4 Methodology

This chapter describes all steps which are necessary for the derivation of incline values of the street network. First of all, the pilot region in which the approach is tested is defined. Secondly, the tools and data used are described in detail. This is followed by a detailed description of the workflow and implementation, including data import, preprocessing, map matching and the calculation of the incline. The last section of this chapter describes how the derived information can be validated.

4.1 Definition of Pilot Region

The pilot region for this research is the region around Heidelberg in the south-west of Germany. In Figure 10 the extent of the region is indicated by the red bounding box. The region was chosen since it is also one of the pilot regions for the project CAP4Access. Projected, the area is almost a square with a side length of approximately 22 km. This results in an area of approximately 497 km². The area is bounded by the localities Leutershausen in the north, Neckarsteinach in the east and Nussloch in the south, and in the west it reaches almost to Mannheim. The region is characterized by mountainous and forested areas in the east as well as flat urban areas and farmland in the west. This is particularly suited for this research, since it makes it possible to differentiate the results between land use classes and other characteristics.

Figure 10: Pilot region Heidelberg / Germany. (Map: OSM)

4.2 Tools

In order to work on this topic and implement the steps of the workflow, a range of tools was used. Java has been used as the programming language to implement the main part of the software. The extensive number of spatial algorithms implemented within the Java libraries GeoTools26 and Java Topology Suite (JTS)27 made it convenient and efficient to work with geometries. Both libraries are licensed under the open-source license LGPL and can easily be integrated in the project using Maven dependency management28. Maven is a tool which helps to build Java projects and manages the integrated libraries. As an integrated development environment (IDE), Eclipse has been used. An IDE supports programmers with a source code editor, syntax highlighting, recommendations and a useful debugging tool.

To store the input data as well as the results, a PostgreSQL (PGSQL) database has been used. In order to work with spatial data the extension PostGIS has been added to the installation of PGSQL. It also provides many functions which implement geometric algorithms, however, the processing was mainly done in Java.

For the visualization of intermediate results and some preprocessing tasks, the GIS tools ArcGIS and one of its open-source alternatives, QGIS, have been used. QGIS was used in addition to ArcGIS, since connecting to the database and loading the data is very easy and intuitive.

4.3 Data

The two main data sources for this research are the crowdsourced GPS traces and the street network. Both data sources are described in detail in the following sections. Furthermore, additional data such as land use classes and digital elevation models (DEMs) is described, which is not needed for the calculation of the incline but is used for the evaluation of the result.

4.3.1 Crowdsourced GPS traces

Crowdsourced, or user-generated, GPS traces are one of the data sources for the determination of street incline. Firstly, this section gives an overview of different platforms, projects or applications in which GPS traces are crowdsourced by volunteers. Secondly, the format for exchanging GPS traces, GPX, is introduced. After that, it is described how GPS traces are handled within the OpenStreetMap project, since the GPS traces of OSM will be used for this research. At the end, typical errors within the data are discussed briefly.

26 http://www.geotools.org, checked on 15/07/2015
27 http://www.vividsolutions.com/jts/JTSHome.htm/, checked on 22/05/2015
28 http://maven.apache.org/, checked on 22/05/2015

4.3.1.1 Platforms and Devices

There are several platforms and applications in which GPS traces are collected for different purposes. One example is sport-tracking apps for smartphones, such as Strava29, Runtastic30 or Runkeeper31, which track the user’s way while training in order to provide statistics about the activity such as distance, average speed, total climb or the elevation profile. Other examples are platforms such as gpsies.com32, whose purpose is to exchange and recommend traveled routes for outdoor activities. The collection of GPS traces within the OpenStreetMap project has the purpose of supporting the map making.

The devices which are usually used to record GPS traces, such as smartphones or handheld GPS devices, have integrated low-cost GPS receivers (Heipke 2010). Depending on the device or smartphone app used, the elevation information of the track points may originate from a different source than GPS. Some devices, especially handheld GPS devices, have built-in barometers, which determine the elevation by measuring the change of air pressure. This can lead to high systematic errors if the barometer is not calibrated properly. Another source of elevation values in crowdsourced GPS traces are elevation databases. Due to the poor vertical accuracy of GPS (cf. section 2.1.2), a plotted elevation graph or the calculated total climb may be wrong. The sport-tracking services Runkeeper and Strava therefore replace the elevation measured by the GPS receiver with values from an elevation database. While Strava uses an elevation database without mentioning the source of the elevation information, Runkeeper uses the third-party service topocoding.com, which is based on elevation information from SRTM33,34. However, neither service specifies whether the measured elevation is only replaced for calculation and visualization or whether the exported GPX files also contain the elevation values from the database rather than the original measurements. GPS traces uploaded to the OpenStreetMap project might have been recorded with the aforementioned apps. Consequently, the GPS traces may contain elevation information which was not measured by GPS but taken from other sources. Depending on the device or smartphone application, the elevation does not necessarily reference the WGS 84 ellipsoid (cf. 2.1), but could also be referenced to the mean sea level.

29 https://www.strava.com/, checked on 23/06/2015
30 https://www.runtastic.com/, checked on 23/06/2015
31 http://runkeeper.com/, checked on 23/06/2015
32 http://gpsies.com/, checked on 23/06/2015
33 https://strava.zendesk.com/entries/20965883-Elevation-for-Your-Activity, checked on 22/05/2015
34 https://support.runkeeper.com/hc/en-us/articles/201109736-How-does-RunKeeper-calculate-elevation-and-climb-, checked on 22/05/2015

4.3.1.2 The GPX Format

For the exchange of GPS traces from the device to one of the aforementioned platforms, the GPX format is commonly used. GPX is the abbreviation of ‘GPS Exchange Format’ and is an XML-based format. Figure 11 shows an example GPX file, which is an instance of the GPX schema of version 1.135. The root element ‘gpx’ contains information about the version and the schema location as attributes. As child elements there are waypoints (‘wpt’) and tracks (‘trk’). Waypoints are points which have been stored separately in order to mark locations, such as points of interest. The track element contains the actual GPS trajectory. Within this work it is referred to as GPS trace or simply trace. A trace may contain several track segments (‘trkseg’) which in turn contain track points (‘trkpt’). The latter must at least be described by the attributes longitude and latitude. The coordinates are given in the geographic reference system WGS84. Additional optional information, such as timestamp or elevation, can be stored as child elements (cf. Ramm & Topf 2010, p. 26).

[GPX listing: a waypoint named ‘Castle’ and a track named ‘example gps track’; elevation values 194.31606 and 250.8594; . . .]

Figure 11: Example GPX file.
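The structure described above can be read with a few lines of Python using the standard library. The following is a minimal sketch and not the import routine of this thesis: the embedded sample document, its coordinates and the function name are hypothetical, while the namespace URI is the one of the GPX 1.1 schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the GPX 1.1 schema.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Hypothetical sample document; the second track point has no elevation.
SAMPLE = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <name>example gps track</name>
    <trkseg>
      <trkpt lat="49.4100" lon="8.7150"><ele>194.3</ele></trkpt>
      <trkpt lat="49.4101" lon="8.7152"></trkpt>
    </trkseg>
  </trk>
</gpx>"""

def read_track_points(gpx_text):
    """Return (lat, lon, elevation or None) for every track point."""
    root = ET.fromstring(gpx_text)
    points = []
    for trkpt in root.iterfind("gpx:trk/gpx:trkseg/gpx:trkpt", GPX_NS):
        ele = trkpt.find("gpx:ele", GPX_NS)  # optional child element
        points.append((
            float(trkpt.get("lat")),
            float(trkpt.get("lon")),
            float(ele.text) if ele is not None else None,
        ))
    return points

points = read_track_points(SAMPLE)
```

Since the elevation is optional in the schema, a reader has to handle track points without an ‘ele’ element. Such points cannot contribute to the derivation of incline.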

35 http://www.topografix.com/gpx/1/1/gpx.xsd, checked on 23/06/2015

4.3.1.3 OpenStreetMap GPS traces

The main input data for this research are the GPS traces collected by OSM contributors. The so-called gpx-planet file36 contains all original GPS traces as GPX files from all over the world. The latest version of this file dates from April 2013. The script to create the dump is available online37, but it needs to be applied to the OpenStreetMap database, to which only OSM administrators have access. Therefore, a new dump cannot be created and the one from April 2013 has to be used.

The data was collected by thousands of users and contains more than 2.5 billion track points. As shown in Figure 12, which depicts the number of GPS track points per grid cell, the majority of points can be found in Europe. Especially in Germany, Austria and Switzerland a higher density than in other European countries, like Spain, can be observed. This may be due to the higher population density or a higher motivation to collaborate in such projects.

Figure 12: Screenshot of grid map, showing the number of GPS points per grid cell.38

There are also regional extracts of the gpx-planet file available39, which make processing of the dataset more convenient if one is only interested in a certain region. Besides the gpx-planet file as data source, there is an Application Programming Interface (API)40, which allows the user to access the track points within a given region using HTTP requests. On the website of OSM, there is a public list showing the uploaded traces. The upload of traces can be done manually using an online form41 or using the API and HTTP POST. In order to supply additional information about the trace, a description must be given and a comma-separated list of keywords can be added. This makes it possible to search and find traces by keywords. A visibility must also be assigned to each uploaded trace. Some users might not want to be linked to the uploaded traces, since one may draw conclusions about the user’s location and movement profile. Table 3 shows the four options ‘identifiable’, ‘public’, ‘trackable’ and ‘private’ and their explanations.

36 http://planet.openstreetmap.org/gps/, checked on 22/05/2015
37 https://github.com/iandees/planet-gpx-dump/, checked on 23/06/2015
38 Screenshot taken from http://resultmaps.neis-one.org/osmgps.html, checked on 22/05/2015
39 http://zverik.osm.rambler.ru/gps/files/extracts/index.html, checked on 22/05/2015
40 http://wiki.openstreetmap.org/wiki/API_v0.6#GPS_traces, checked on 22/05/2015

Visibility     Description
Identifiable   - shown in the public trace list
               - points with timestamp served over the API
               - contained in planet-gpx file
               - linked to trace page via API
               - conclusions about the contributing user can be drawn
               - access to raw GPX file possible via trace page
Public         - shown in the public trace list
               - points with timestamp served over the API
               - contained in planet-gpx file
               - not linked to trace page via API
               - access to raw GPX file only via public trace list
Trackable      - not shown in the public trace list
               - points with timestamp served over the API
               - contained in planet-gpx file
               - not linked to the trace page
               - no access to raw GPX file
Private        - not shown in the public trace list
               - points without timestamp served over the API
               - not contained in planet-gpx file
               - not linked to the trace page
               - no access to raw GPX file

Table 3: Overview of visibility options for the upload of GPS traces42

The gpx-planet file has been imported and the public trace list has been requested to access the traces uploaded after August 2013 (which are not contained in the dump). In total, 4194 GPS traces from the gpx-planet file are within the pilot region. Of these, 86 % (3606) have elevation information and can therefore be used for this research. With additional traces from the public trace list, the number increases to 3842 traces. In total, there are over two million GPS track points in the area of the pilot region (~497 km²). Assuming that the pilot region was a square with a side length of 22,000 m and the points were evenly distributed, there would be roughly one GPS point per cell of 15 m x 15 m.
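The density estimate can be retraced with a short calculation. The point count of exactly two million is a rounded assumption, since the text only states 'over two million'.

```python
import math

side_m = 22_000       # side length of the idealized square pilot region
n_points = 2_000_000  # rounded number of GPS track points (assumption)

area_per_point = side_m ** 2 / n_points  # square metres available per point
spacing_m = math.sqrt(area_per_point)    # side length of one square cell
share_with_elevation = 3606 / 4194       # traces with elevation information
```

This yields 242 m² per point, i.e. a cell with a side length of about 15.6 m, which corresponds to the stated value of roughly 15 m x 15 m. The share of traces with elevation information evaluates to about 0.86.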

41 http://OpenStreetMap.org/traces, checked on 22/05/2015
42 source: http://wiki.openstreetmap.org/wiki/Visibility_of_GPS_traces, checked on 20/07/2015

4.3.1.4 Typical Errors

As already mentioned in 2.1.2, GPS measurements suffer from multiple errors and inaccuracies. Therefore, the elevation profile of a GPS trace always contains noise, meaning that neighboring points on flat terrain will often have different elevation values. Figure 13 shows the elevation profile of a GPS trace on a fairly flat street. It can be seen that the measured elevation fluctuates within a range of ± 2 m. An analysis regarding the elevation accuracy of crowdsourced GPS data is given in section 5.1.

Figure 13: Elevation profile of a GPS trace, recorded on a flat street.

In addition to the noise, it may also happen that the GPS signal is lost and the receiver loses the position fix. The next point of the trace is then recorded only when the signal is received again. This may for example happen in tunnels or in other situations where no signal can be received. This phenomenon results in traces which contain long distances between two adjacent points and which do not represent the course of the street. Figure 14 shows a few examples.

Figure 14: GPS traces with lost GPS-signals in tunnels. (Map: OSM)
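Such gaps can be detected by comparing the distance between adjacent track points with a threshold. The following Python sketch illustrates the idea and is not the preprocessing routine of this thesis: the threshold of 100 m, the spherical earth radius and the function names are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean earth radius (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 positions."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def split_at_gaps(points, max_gap_m=100.0):
    """Split a list of (lat, lon) points wherever two neighbours are too far apart."""
    if not points:
        return []
    parts = [[points[0]]]
    for prev, cur in zip(points, points[1:]):
        if haversine_m(*prev, *cur) > max_gap_m:
            parts.append([])  # start a new part after the gap
        parts[-1].append(cur)
    return parts

# Hypothetical trace with a gap of roughly 1.1 km between the second and third point.
trace = [(49.4100, 8.7100), (49.4101, 8.7100), (49.4200, 8.7100), (49.4201, 8.7100)]
parts = split_at_gaps(trace)
```

Splitting the trace at the gap avoids that the straight connection between the two points, which does not follow the street, is matched to the street network.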


4.3.2 Street Network

The street network will be enhanced with the calculated incline values. For this research, potentially every collection of street geometries may be used; however, OSM was chosen as the data source. OpenStreetMap data is open and its data model is commonly known in the domain of volunteered geographic information. The streets are represented as LineStrings, or in terms of OSM, ways. The geometry specifies the centerline of the street rather than its outline. The streets are classified using different tags. For the pilot area, 57,824 street elements with a total length of around 5336 km were extracted. Table 4 shows the values of the key ‘highway’ which were used to extract the streets from the OpenStreetMap dataset. It furthermore shows the share of each street type in the total length in percent. The street network is composed of different types of streets and paths. This includes ways which are dedicated to cars, pedestrians, cyclists or a combination of the aforementioned. For convenience and to avoid confusion, it has to be noted that the individual parts of the street network are referred to as ‘street’ in this thesis, although this also includes paths which cannot be used by car.

value           Description43                                        share in %
track           agricultural, forestry streets                       43.84
residential     streets within residential areas                     18.38
path            mainly hiking trails and small paths                  9.56
footway         for pedestrians only                                  8.13
secondary       country road of second priority                       4.34
tertiary        country road of third priority                        2.84
cycleway        for cyclists only                                     2.62
living_street   streets, where pedestrians have priority over cars    2.01
motorway        equivalent to autobahn                                1.98
unclassified    roads with minor priority than tertiary               1.91
primary         country road with highest priority                    1.33
others44                                                              3.06

Table 4: Values of highway tag and their share of length in percent.

43 http://wiki.openstreetmap.org/wiki/Map_Features#Highway, checked on 27/05/2015
44 trunk, motorway_link, pedestrian, trunk_link, secondary_link, primary_link, road, tertiary_link

The relatively high share of streets tagged ‘highway=track’ can be explained by the large proportion of forest and fields in the pilot area. The total length of footways and cycleways is relatively small compared to that of residential roads, although residential streets often have adjacent footways. Instead of mapping a footway as an individual way, the information can also be added as a tag to the street geometry. The same holds true for bicycle lanes. As already mentioned in section 2.3.1, streets which are important for pedestrians or wheelchair users make up a large share of the network. This is another argument for using OpenStreetMap data instead of commercial street network data.

4.3.3 Land Use Information

Information regarding the land use is used to classify the results in chapter 5. For this purpose, polygons with certain land uses were also extracted from the OpenStreetMap data. The OpenStreetMap Wiki (2015b) contains an extensive list of tags which indicate different land use classes, such as ‘residential’, ‘forest’ or fields. Since different land uses usually have different characteristics in terms of the visibility of satellites and obstruction by man-made structures, the results are expected to depend on the land use. There are many tags which describe areas with different land uses, but only the seven most common tags have been extracted. Table 5 shows the different land uses and their characteristics. Rural areas like farmlands and allotments are characterized by fields and mainly low buildings. This means there are almost no structures which may influence the GPS signal through multipath effects or shadowing. In OpenStreetMap, two tags are used for farmland (‘landuse=farm’ and ‘landuse=farmland’). According to the wiki, both tags mean the same, although farmland should be preferred over farm. Therefore, both tags are combined and referred to as ‘farmland’. Urban areas such as commercial, industrial or residential areas are characterized by taller buildings and urban canyons (Langley 1999), where multipath and shadowing effects are more likely. Because reflected signals change quickly and shadowing alternates with open view (at crossroads), a heterogeneous GPS quality can be expected. In forested areas, which are mainly covered by a dense tree canopy, it is very likely that the GPS signal is shadowed. In these areas a homogeneously degraded GPS quality can be expected.


value of key ‘landuse’   Characteristics
allotments               - Gardening
                         - Small buildings
commercial               - Office buildings
farm                     - Land used for farming
                         - No buildings, no trees
                         - Tillage and pasture
farmland                 - Same as ‘farm’
                         - Should be used instead of farm
forest                   - High trees
                         - Dense canopy
industrial               - Factories or warehouses
                         - Wide streets for trucks and delivery vehicles
                         - Less obstruction than residential
residential              - Urban environment
                         - Tall buildings
                         - Narrow streets, trees

Table 5: OSM landuse-tags and their characteristics.

4.3.4 Digital Elevation Models

Within this research, digital elevation models (DEMs) are used during the evaluation of the results. A DEM is a representation of the earth’s surface and may be acquired with different methods such as terrestrial surveying or remote sensing techniques like stereo photogrammetry, radar systems or LiDAR (Light Detection and Ranging). Only airborne or spaceborne remote sensing techniques make it possible to acquire data for a larger region in a reasonable time. A DEM can be generated from the measurements using data analysis techniques such as interpolation. ‘Digital elevation model’ is a general term which covers both specifications: ‘digital surface model’ (DSM) and ‘digital terrain model’ (DTM). There are different definitions of the terms (Zhilin et al. 2005; Cartwright et al. 2007); in this research they are used as follows. As shown in Figure 15, DSMs and DTMs are differentiated by the structures which are included in the model. While a DSM represents the earth’s surface including all objects on it, such as trees and buildings, those objects are excluded in a DTM. Remote sensing techniques measure the surface (e.g. top of building, top of tree canopy); consequently, the measurements need to be corrected to obtain a DTM, so that objects located on the terrain are excluded.


Figure 15: Difference of DSM and DTM.45

DEMs can also be classified in terms of horizontal resolution. Czegka et al. (2004) distinguish high-resolution DEMs with a cell size smaller than 10 m, medium-resolution DEMs with a cell size of 30 m to 100 m and low-resolution DEMs with cell sizes greater than 500 m.

For the validation within this research, a high-resolution DEM is used as reference data against which the derived incline values are compared. It is furthermore used for the error analysis of the GPS points. The DEM was computed from LiDAR measurements taken by the ‘Landesamt für Geoinformation und Landentwicklung Baden-Württemberg’ between 2000 and 2005. It represents the terrain without buildings and trees and is consequently a DTM. Areas which could not be measured, such as buildings or very dense forests, were interpolated from neighboring points. The available data covers the entire pilot region with a horizontal resolution of 1 m and a vertical accuracy of 0.5 m. The DTM was delivered as a text file containing evenly distributed XYZ coordinates. With the help of the ArcGIS function ‘Point to Raster’, the DTM was converted to a georeferenced raster file. The coordinates were given in the Gauss-Krüger coordinate system. The elevation values are heights above sea level and thus refer to a quasigeoid. Contrary to a geoid, a quasigeoid is not an equipotential surface; however, it deviates from the geoid only in the mm to cm range (cf. Torge 2001, p. 82). Since the GPS points as well as the street network data are given in the World Geodetic System 1984 (WGS 84), the horizontal datum of the DTM was transformed to WGS 84. This enables the evaluation of the GPS accuracy.

45 Adopted from: http://www.gsi.go.jp/WNEW/TEC-NEWS/2007-tec172.html, checked on 22/05/2015

In addition to the high-resolution DTM, the SRTM-1 DEM is used to compare the calculated inclines with data which is, like crowdsourced GPS points, free and globally available. The SRTM DEM was acquired by the Shuttle Radar Topography Mission (SRTM). Strictly speaking, the SRTM DEM is a DSM, since objects on the earth’s surface are not removed. The SRTM mission is a joint project of NASA, the National Geospatial-Intelligence Agency and the German as well as the Italian space agencies. The DEM is available between 60° north latitude and 54° south latitude and thus covers 80 % of the earth’s land surface. The data uses WGS 84 as horizontal reference system and EGM96 as vertical datum. The elevation therefore refers to the geoid rather than to the ellipsoid. The absolute height error in Eurasia was identified by Farr et al. (2007) as 6.2 m. Another freely available DEM is the ASTER Global DEM. It was compiled from data collected by the “Advanced Spaceborne Thermal Emission and Reflection Radiometer”, mounted on the Terra spacecraft. Like SRTM, it has a horizontal resolution of 30 m, and it covers the land between 83° north and 83° south latitude. Although its coverage is better than that of SRTM, its vertical accuracy is worse at only 9.2 m (Meyer 2011). Therefore, SRTM, as the most accurate and almost globally available data source, was chosen for the evaluation.

4.4 Workflow and Implementation

To obtain the final result, street segments enhanced with incline values, several steps need to be performed, as depicted in Figure 16. After the import of both the street network data and the GPX files, the data sets need to be preprocessed to prepare them for the following steps. The GPS traces must first be linked to the individual street segments. This step is known as map matching and detects the GPS traces which were recorded on a street segment. After that, the incline of each street segment is calculated in turn, using the assigned GPS traces. Then, the calculated incline values are compared with the incline values calculated from the LiDAR DTM and the SRTM DSM. In the following sections, all individual steps are described in more detail.

Figure 16: Process of deriving incline information out of user-contributed GPS traces.

Since the tools developed for this thesis may also be of interest to other users and use-cases, they are planned to be published under an open-source license. Therefore, each step is implemented as an individual tool and all intermediate results are saved in the database, so that the single steps can be used individually. The tools are designed to be as generic as possible, so that they can be used with different data sources. The requirements on the data sources in terms of modelling are kept as small as possible. The latest version of each developed tool can be found on the CD attached to this master’s thesis. Additionally, some of the tools are accessible on the GitHub account of the GISCIENCE Research Group of the University of Heidelberg46.

4.4.1 Data Import

Before starting the actual process of calculating incline information, the data sources need to be imported into a PostgreSQL/PostGIS database running on a local machine. This allows the file-based GPS data and street network to be stored in a relational way and provides fast and easy access from the Java program. The following subsections describe the process of importing the GPS data and the OpenStreetMap dump, from which the street network and land use information are extracted.

4.4.1.1 GPS traces

As mentioned in section 4.3.1, the GPX-planet file in its latest version from August 2013 is used as the data source and extended with the traces uploaded from August 2013 until now, taken from the public trace list. The import is implemented in Java and the source code is published on the GitHub account of the GISCIENCE Research Group47. The GPX-planet file is packed and compressed as a *.tar.xz archive and contains, on the one hand, an XML file with metadata and, on the other hand, all GPX files in a directory structure. The metadata file includes information about the traces, like user, user , number of points, description and tags. The tar.xz archive does not need to be unpacked in advance, since this is done on the fly by a combination of Java classes which handle the unpacking and reading automatically. The first file to be read and parsed is the metadata file. All entries are stored as single objects in a TreeMap which maps the GPX-id to its metadata object. A TreeMap keeps its keys sorted, so that an entry can be located by its ID without scanning all entries. Once the metadata is stored in memory, the GPX files can be read sequentially. After reading a file, it needs to be parsed into Java objects, which is known as unmarshalling48. The corresponding object classes need to be generated in advance from the XML schema of GPX version 1.049.

46 https://github.com/GIScience, checked on 20/07/2015
47 https://github.com/GIScience/osmgpxfilter, checked on 20/07/2015
48 http://en.wikipedia.org/wiki/Marshalling_%28computer_science%29, checked on 20/07/2015
49 http://www.topografix.com/gpx/1/0/gpx.xsd, checked on 20/07/2015

Once the GPX file is read, it needs to be filtered. The workflow of the filter, which checks two conditions, is depicted in Figure 17. First, it is checked whether the trace contains elevation information. For this step it is only necessary to check a single track point, since either all or none of the track points have elevation information. If a track point does not contain elevation information, the trace is skipped and not imported. Otherwise, it is checked whether the trace falls into the bounding box of the pilot region. This condition is met when at least one track point is within the region. Only if no track point intersects the bounding box is the trace not imported. A trace which meets both conditions is written, together with its metadata, into a PostgreSQL/PostGIS database. As mentioned in section 4.3.1, a GPX file may contain several track elements. Hence, each track of a GPX file is stored as a single 3D LineString, together with the GPX-id, track-id and metadata. The track-id is newly introduced and unique within the tracks of one GPX file. Figure 18 shows the columns of the relation ‘gpx_data_line’ and their data types. The columns ‘gpx_id’ and ‘trk_id’ form the primary key. After writing the trace to the database, the ID of the trace is put into an ArrayList. This list is used later to identify whether a trace was already imported.
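The two filter conditions can be illustrated with a minimal Java sketch. It is not the code of the published tool: the class name `TraceFilter`, the representation of a track point as a `{lat, lon, ele}` array and the use of `Double.NaN` for a missing elevation are assumptions made only for this example.

```java
import java.util.List;

/** Hypothetical helper illustrating the two import conditions. */
final class TraceFilter {

    /**
     * Decides whether a trace is imported.
     * Each point is {lat, lon, ele}; ele is Double.NaN when the GPX file
     * carries no elevation (a convention of this sketch only).
     */
    static boolean accept(List<double[]> trace,
                          double top, double left, double bottom, double right) {
        if (trace.isEmpty()) {
            return false;
        }
        // Condition 1: checking one point suffices, since either all or
        // none of the track points carry elevation information.
        if (Double.isNaN(trace.get(0)[2])) {
            return false;
        }
        // Condition 2: at least one track point lies inside the bounding box.
        for (double[] p : trace) {
            boolean latInside = p[0] <= top && p[0] >= bottom;
            boolean lonInside = p[1] >= left && p[1] <= right;
            if (latInside && lonInside) {
                return true;
            }
        }
        return false;
    }
}
```

A call such as `TraceFilter.accept(points, 49.459693, 8.573179, 49.352565, 8.794050)` would then apply the bounding box of the pilot region.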

Figure 17: Filtering and import of GPS traces.

Figure 18: The schema of the relation 'gpx_data_line' for storing the GPS traces.


At this point, all files included in the GPX-planet file are imported. To add the traces which have been uploaded to OSM after August 2013, the public trace list is scraped for the additional traces. Scraping50 is the automatic reading of information from web sites. The public trace list is split into pages of 20 traces and each page can be requested by its page number (e.g. https://www.openstreetmap.org/traces/page/1). The HTML code of the trace list contains the links to the actual trace pages, each of which contains the GPX-id and a link to the GPX file. If the GPX-id is not contained in the ArrayList created during the import of the GPX-planet file, the file is downloaded and unmarshalled. In contrast to the GPX-planet file, these files are in the original version uploaded by the user. Therefore, the traces may be stored in different GPX versions (1.0 or 1.1). This is problematic for the unmarshalling process, because the version must be known. Consequently, the first step is to detect the GPX version; the file is then unmarshalled using the correct GPX schema. Since all following steps use the object classes generated from the GPX version 1.0 schema, all traces in version 1.1 need to be transformed. After the file has been unmarshalled successfully, it follows the same procedure of checking for elevation information and bounding box. After that, the tracks of the file are written into the database.

As already mentioned, the developed tool is published. To make the tool usable for other use-cases as well, all conditions which are checked during the import are adaptable. The arguments must be given when starting the program from the command line. In addition to the export to PostgreSQL, the tool also supports output as an ESRI shapefile or as a database dump in the same format as the input data. It can also be decided whether only the dump is used as input or whether the public trace list is scraped in addition. The following command was used to run the program. It specifies the path to the input file and defines the desired bounding box. The parameter ‘datasource’ with the value ‘both’ means that the dump is imported and the OSM trace list is requested. The options ‘clip’ and ‘elevation’ ensure that GPS points outside the bounding box are not written and that only points with elevation information are imported into the database. Finally, the database parameters and the geometry format are given.

java -jar osmgpxfilter-0.1.jar \
    -bbox top=49.459693 left=8.573179 bottom=49.352565 right=8.794050 \
    --clip \
    --elevation \
    --datasource both \
    --input C:/baden-wuerttemberg.tar.xz \
    --write-pgsql db=osmgpx user=postgres password=XX host=localhost port=5432 \
    geometry=linestring

50 http://en.wikipedia.org/wiki/Web_scraping, checked on 20/07/2015

4.4.1.2 OSM Street Network and Land Use Information

The street network and the land use information are extracted from the OpenStreetMap planet file. The entire planet file contains all map data of the world and therefore has a size of around 30 GB. Since only the data of the pilot region is necessary for this research, the regional extract for the German state of Baden-Württemberg was downloaded51. The import into the database was carried out with the command line Java program Osmosis52. The tool reads the planet file and its regional extracts sequentially and writes the data into relations corresponding to the OpenStreetMap data model. It also provides filter capabilities based on tags and bounding box. Consequently, only ways within the bounding box of the pilot region which contain either the tag ‘highway’ or the tag ‘landuse’ were imported. Relations and nodes were rejected to keep the required space in the database and the time needed for the import as small as possible. The command to read, filter and write the dump to the local database looks as follows:

osmosis \
    --read-pbf C:\baden-wuerttemberg-latest.osm.pbf \
    --bounding-box top=49.5117 left=8.52791 bottom=49.311 right=8.83534 \
    --tag-filter accept-ways highway=* \
    --tag-filter accept-ways landuse=* \
    --tag-filter reject-relations \
    --tag-filter reject-nodes \
    --write-pgsimp host="localhost" database="osm" user="steffen" password="xx"

Prior to the import, the database schema with the necessary relations must be created. SQL scripts containing the required CREATE statements are provided with the program files of Osmosis.

4.4.2 Preprocessing

This section describes the steps which are carried out to prepare the data for the calculation of incline. Preprocessing steps are applied to both the GPS traces and the street network.

4.4.2.1 GPS data

As described in section 4.3.1.4, the GPS data contains noise and other errors which may degrade the quality of the incline values. As a consequence, the traces are preprocessed to lower the impact of such irregularities. The preprocessing is implemented in Java and has two steps. Firstly, traces are split where the distance between two adjacent points or the change in elevation exceeds a certain threshold. Secondly, the split traces are smoothed to reduce the noise in the elevation measurements. Figure 19 shows the workflow of the entire preprocessing.

51 http://download.geofabrik.de/europe/germany.html, checked on 20/07/2015
52 http://wiki.openstreetmap.org/wiki/Osmosis, checked on 20/07/2015

Figure 19: Flowchart of preprocessing the GPS traces

First of all, the ordered list of track points which compose the trace is traversed to detect the points where the trace needs to be split. The distance d and the change in height h are calculated between each point and the adjacent one. If d or h exceeds the given threshold, the trace is split into two parts at the detected break point. The first part, from the start of the trace to the break point, is stored in a list. The second part may still contain errors; therefore, the splitting process starts again with the second part. This continues until the trace does not contain any distance or change in elevation greater than the defined thresholds. Since one trace, previously uniquely identified by ‘gpx_id’ and ‘trk_id’, is split, a new ID, the ‘part_id’, is introduced. The threshold for the maximum distance is set to 300 m as in Zhang et al. (2010), and 10 m seems reasonable for the maximum h. As found in section 5.1.1, the majority of h values in a flat area is below this value; therefore, a change of more than 10 m can be considered an error. These values are highly experimental and changing them may lead to further improvements.
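The splitting step can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the class name `TraceSplitter`, the `{lat, lon, ele}` array representation, the haversine distance on a sphere with a mean radius of 6371 km and the single-pass loop (which yields the same parts as repeatedly restarting on the remaining trace) are all choices made for this example. The index of a part in the returned list plays the role of the ‘part_id’.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits a trace at suspicious jumps. */
final class TraceSplitter {

    static final double EARTH_RADIUS_M = 6371000.0; // mean radius, assumed

    /** Great-circle distance in metres between two {lat, lon, ele} points. */
    static double distance(double[] a, double[] b) {
        double dLat = Math.toRadians(b[0] - a[0]);
        double dLon = Math.toRadians(b[1] - a[1]);
        double s = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(a[0])) * Math.cos(Math.toRadians(b[0]))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(s));
    }

    /** Splits wherever d or |h| between neighbouring points exceeds its threshold. */
    static List<List<double[]>> split(List<double[]> trace,
                                      double maxDistM, double maxDeltaHM) {
        List<List<double[]>> parts = new ArrayList<>();
        List<double[]> current = new ArrayList<>();
        for (double[] p : trace) {
            if (!current.isEmpty()) {
                double[] prev = current.get(current.size() - 1);
                boolean tooFar = distance(prev, p) > maxDistM;
                boolean tooSteep = Math.abs(p[2] - prev[2]) > maxDeltaHM;
                if (tooFar || tooSteep) {
                    parts.add(current);          // close the part before the break
                    current = new ArrayList<>(); // start the next part
                }
            }
            current.add(p);
        }
        if (!current.isEmpty()) {
            parts.add(current);
        }
        return parts;
    }
}
```

With the thresholds named in the text, the call would be `TraceSplitter.split(trace, 300.0, 10.0)`.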

Once a trace is split into parts, each part is smoothed. Methods for smoothing time series were reviewed in section 3.5. A linear smoother is preferred over a non-linear one, since it also smooths abrupt changes in elevation, which usually cannot be expected on streets. Consequently, a weighted moving average algorithm has been implemented. The weights must be defined in advance and the number of weights must be odd. Note that the sum of all weights must also be equal to one. Subject to these conditions, the weights can be defined individually by the user. They do not depend on the distance between the data points, since a GPS trace is usually recorded using a certain distance and time interval. Thus, the time series is considered equidistant. The following set of weights W has been used:

\[ W = \{0.1,\ 0.125,\ 0.15,\ 0.25,\ 0.15,\ 0.125,\ 0.1\} \]

Since the number of weights is seven, each data point is smoothed considering the three preceding and the three following data points. The further away a point is, the smaller its impact on the new smoothed data point. The smoothed value of a point is calculated as follows:

\[ h_n^{*} = (h_{n-3} \cdot w_1) + (h_{n-2} \cdot w_2) + (h_{n-1} \cdot w_3) + (h_n \cdot w_4) + (h_{n+1} \cdot w_5) + (h_{n+2} \cdot w_6) + (h_{n+3} \cdot w_7), \]

where

$h_n^{*}$ = smoothed elevation
$h_{n-i}$ = original elevation of data points
$w_i$ = weights

The drawback of this approach is that it cannot be applied directly to the three values at each end of the series, since the adjacent points on one side are missing there. In order not to dismiss them, they are smoothed using only those adjacent values which exist. For example, the first data point is smoothed using the first four values and the second data point using the first five values of the series. The same applies to the values at the end of the series. After smoothing, the trace parts are written as 3D LineStrings into a new table in the database (Figure 20). The primary key consists of the three ID columns ‘gpx_id’, ‘trk_id’ and ‘part_id’.
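A weighted moving average of this kind can be sketched in Java as follows. The class name `ElevationSmoother` is hypothetical. How the weights are treated at the ends of the series is not specified in the text; the sketch assumes that the weights of the existing neighbours are renormalized so that they sum to one again.

```java
/** Hypothetical helper implementing a weighted moving average. */
final class ElevationSmoother {

    /** The weight set W used in the text (sums to one). */
    static final double[] WEIGHTS = {0.1, 0.125, 0.15, 0.25, 0.15, 0.125, 0.1};

    /**
     * Smooths the elevations h with the centred window w.
     * At both ends only the existing neighbours are used; renormalizing
     * by the sum of the weights actually applied is an assumption.
     */
    static double[] smooth(double[] h, double[] w) {
        if (w.length % 2 == 0) {
            throw new IllegalArgumentException("number of weights must be odd");
        }
        int half = w.length / 2;
        double[] out = new double[h.length];
        for (int n = 0; n < h.length; n++) {
            double sum = 0.0;
            double weightSum = 0.0;
            for (int k = -half; k <= half; k++) {
                int idx = n + k;
                if (idx < 0 || idx >= h.length) {
                    continue; // neighbour does not exist near the ends
                }
                sum += h[idx] * w[k + half];
                weightSum += w[k + half];
            }
            // For a complete window weightSum is one, so this equals the
            // plain weighted sum of the seven values.
            out[n] = sum / weightSum;
        }
        return out;
    }
}
```

Because the window is symmetric, a constant or linearly rising elevation profile is left unchanged in the interior of the series, while isolated spikes are damped.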

Figure 20: Columns of the relation, which stores the preprocessed GPS traces.


4.4.2.2 Street Network

The incline is calculated for each street segment of the OpenStreetMap street network within the pilot region. The length of a street segment is often determined by semantic properties rather than by the geometry. For example, a street segment may be as long as the part of a street which carries one name. Such segments may be quite long and span several valleys and peaks. This is a disadvantage for the calculation of incline, since one average incline value is calculated for each segment. If one segment, for example, contains an ascent and a descent of the same magnitude, the calculated incline would be 0. To overcome this issue, the street segments are split into smaller parts. The parts should not be too short either, since this decreases the number of GPS points which can be used for the calculation. Therefore, the streets are split at the intersection points with other street segments. This was done in QGIS, using the function ‘Split lines with lines’ of the processing toolbox. Due to the splitting, the osm-id can no longer be used as a unique identifier; therefore, a new ID is introduced. The resulting relation in the database is shown in Figure 21.

Figure 21: Schema of relation 'streets'.

After the first step, the split street segments were enhanced with land use information. For this purpose, the QGIS function ‘Join attributes by location’ was used. In OpenStreetMap, the land use polygon may not cover the street. To add the land use which is next to the street, a buffer of 20 meters was applied to the polygons in advance. Figure 22 shows the street network and land use polygons which do not cover the street. The dashed line shows the buffered land use polygon. The land use information is stored as a new tag with the key ‘landuse_incline’.


Figure 22: Enhancement of street network with land use information in cases, where land use polygon does not cover the street segment.

4.4.3 Map Matching

As stated in section 3.4, map matching is an essential step in mining street-attribute information from GPS traces. It is important to know to which street the derived information refers. The map matching approach for this research is implemented in Java and based on the algorithm of Zhang et al. (2010). Figure 23 shows the workflow of the implemented algorithm.

Figure 23: Flowchart of map matching process.


First of all, the street segments are requested from the database. They are processed sequentially, and for each segment the corresponding GPS traces are determined. To find candidate traces in the relation ‘gpx_data_line’, the geometry of the street segment is buffered with a buffer size of 50 m. All GPS traces which intersect the buffer of the street geometry are selected as candidate traces. It is therefore important that the buffer size is not chosen too small, since all possible traces should be selected. Figure 24a depicts the street segment and its buffer in dark and light green, and the GPS traces which intersect the buffer in orange.

Next, GPS traces which were not recorded on the selected street segment must be dismissed. This is often done by analyzing the distance and the direction between the street segment and the GPS trace. Here, no direction or distance is calculated. Instead, for each node of the street segment, a line is created which passes through the node and is perpendicular to the current edge of the street segment. For this step the Java class GeodeticCalculator of the GeoTools library was used. It allows new coordinates to be calculated from a given reference point, a distance and a direction. The length is set to 30 m as in Zhang et al. (2010), considering the horizontal error of GPS measurements and the width of a street. Figure 24b shows the calculated profile lines in dark blue.
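The construction of a profile line can be sketched as follows. The thesis uses the GeoTools class GeodeticCalculator for this step; to keep the example self-contained, the sketch swaps in the standard spherical bearing and destination-point formulas instead, so it illustrates the geometry rather than the actual implementation. The class name `ProfileLines`, the earth radius and the parameter `reachM` (the distance from the node to each end of the profile line) are assumptions; whether the 30 m denote the full length or the reach on each side is not specified here.

```java
/** Hypothetical helper that builds a profile line through a street node. */
final class ProfileLines {

    static final double EARTH_RADIUS_M = 6371000.0; // mean radius, assumed

    /** Initial bearing in degrees (0..360) from a to b, both {lat, lon}. */
    static double bearing(double[] a, double[] b) {
        double phi1 = Math.toRadians(a[0]);
        double phi2 = Math.toRadians(b[0]);
        double dLon = Math.toRadians(b[1] - a[1]);
        double y = Math.sin(dLon) * Math.cos(phi2);
        double x = Math.cos(phi1) * Math.sin(phi2)
                - Math.sin(phi1) * Math.cos(phi2) * Math.cos(dLon);
        return (Math.toDegrees(Math.atan2(y, x)) + 360.0) % 360.0;
    }

    /** Point reached from start {lat, lon} after distM metres along bearingDeg. */
    static double[] destination(double[] start, double bearingDeg, double distM) {
        double delta = distM / EARTH_RADIUS_M;
        double theta = Math.toRadians(bearingDeg);
        double phi1 = Math.toRadians(start[0]);
        double lambda1 = Math.toRadians(start[1]);
        double phi2 = Math.asin(Math.sin(phi1) * Math.cos(delta)
                + Math.cos(phi1) * Math.sin(delta) * Math.cos(theta));
        double lambda2 = lambda1 + Math.atan2(
                Math.sin(theta) * Math.sin(delta) * Math.cos(phi1),
                Math.cos(delta) - Math.sin(phi1) * Math.sin(phi2));
        return new double[]{Math.toDegrees(phi2), Math.toDegrees(lambda2)};
    }

    /**
     * Profile line through node, perpendicular to the edge node -> next.
     * Returns the two end points {left, right}, each {lat, lon}.
     */
    static double[][] profile(double[] node, double[] next, double reachM) {
        double b = bearing(node, next);
        return new double[][]{
            destination(node, b - 90.0, reachM),
            destination(node, b + 90.0, reachM)
        };
    }
}
```

For the last node of a segment, which has no following edge, the bearing of the preceding edge could be reused; this detail is likewise an assumption of the sketch.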

Once the profile lines are created, it is checked for each candidate trace how many of the profile lines it intersects. If a trace intersects most of the profile lines, it follows the direction of the street segment and is generally not further away from the centerline of the street than 30 m. The minimum percentage (threshold) of intersected profile lines is set to 70 % for this research. Lowering the threshold to 50 % would give more matches but also more false positives (incorrect matches), especially for street segments with only two nodes. In this case, traces which intersect only one profile line may also be matched. It can then no longer be assumed that the trace follows the direction of the street. On the other hand, a threshold of 100 % would be too restrictive, especially for street segments with many nodes: a trace may always exceed the distance of 30 m somewhere because of the GPS inaccuracy or the geometric error of the street segment. Therefore, the threshold has been set to 70 %. Figure 24c shows in red the GPS traces which intersect at least 3 out of 4 profile lines.
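The matching decision can be illustrated with a short Java sketch. The class name `ProfileMatcher` is hypothetical. As simplifications, the sketch treats the coordinates as planar {x, y} values, which is only reasonable for the small extent of a single street segment, counts only proper crossings (touching end points are ignored) and accepts a trace when the ratio reaches the threshold.

```java
import java.util.List;

/** Hypothetical helper deciding whether a trace matches a street segment. */
final class ProfileMatcher {

    /** Signed area test: > 0 if c lies left of a -> b, < 0 if right. */
    static double cross(double[] a, double[] b, double[] c) {
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
    }

    /** True if the segments p1-p2 and q1-q2 properly cross each other. */
    static boolean segmentsCross(double[] p1, double[] p2, double[] q1, double[] q2) {
        double d1 = cross(q1, q2, p1);
        double d2 = cross(q1, q2, p2);
        double d3 = cross(p1, p2, q1);
        double d4 = cross(p1, p2, q2);
        return d1 * d2 < 0 && d3 * d4 < 0;
    }

    /** True if any edge of the trace crosses the profile line {end1, end2}. */
    static boolean traceCrosses(List<double[]> trace, double[][] profile) {
        for (int i = 0; i + 1 < trace.size(); i++) {
            if (segmentsCross(trace.get(i), trace.get(i + 1), profile[0], profile[1])) {
                return true;
            }
        }
        return false;
    }

    /** A trace matches if it crosses at least minRatio of the profile lines. */
    static boolean matches(List<double[]> trace, List<double[][]> profiles, double minRatio) {
        if (profiles.isEmpty()) {
            return false;
        }
        int hits = 0;
        for (double[][] profile : profiles) {
            if (traceCrosses(trace, profile)) {
                hits++;
            }
        }
        return (double) hits / profiles.size() >= minRatio;
    }
}
```

For a segment with four nodes and a ratio of 0.7, a trace crossing three of the four profile lines (75 %) is matched, whereas a trace crossing only two (50 %) is dismissed.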


Figure 24: The map matching process: Select candidate traces with buffer (light green) of street (dark green) (a), create profile lines (blue) (b), select traces (red) which intersect at least 70 % of the profile lines (c).

If the percentage of intersected profile lines is greater than the threshold, the GPS trace is considered to have been recorded on the street segment. The approach by Zhang et al. (2010) includes further processing for parallel streets. If two street segments are parallel to each other, it is first checked whether one of them is a one-way street (e.g. a highway where each direction is mapped individually). Then the direction of the traces is calculated, and a trace is only matched with the street segment if its direction is similar to that of the street. For parallel streets which are not one-way streets, a clustering algorithm is used to identify the corresponding traces. This is not necessary for this research, since for the calculation of the incline it does not matter on which of the parallel streets the trace was recorded. It is assumed that parallel streets have the same incline. Consequently, there is an ‘n to m’ relationship between street segments and GPS traces: one street may have multiple GPS traces and one GPS trace may be matched to multiple street segments. With this assumption, more traces can be matched to a street segment if there are two parallel geometries. Examples are streets represented by one geometry per direction, or footways and bicycle lanes which accompany a street and are mapped as individual LineStrings. However, this assumption may also lead to errors when two parallel streets do not have the same incline, such as a driveway leading onto a bridge, as shown in Figure 25.

Figure 25: Example of two parallel streets which do not have the same incline.


After the traces of a street segment have been found, the IDs referencing the street segment and the GPS trace are written to the database. The relation called ‘streets_gpx’ references the relations ‘streets’ and ‘gpx_data_line’ via foreign keys. This is shown in the UML diagram in Figure 26.

Figure 26: The tables 'gpx_data_line', 'streets_gpx' and 'streets' and their relation to each other.

The tool is published and is therefore available to anyone who may need it. The program is kept as generic as possible, so that it also works with other database schemas. The required database relations and columns as well as the input parameters mentioned above are specified in a properties file (Figure 27).


#Properties file for map matching

####### street network #######
# name of street table in database
t_streetName=streets
# column with unique id for each street segment
t_streetIdCol=id
# column with osm_id (must not be unique, in case street segment were split in preprocessing)
t_streetOsmIdCol=osm_id
# column with osm tags, stored in hstore
t_streetTags=tags
# column with geometry. geometry type must be LineString and CRS must be WGS 84
t_streetGeomCol=the_geom

####### gpx input data #######
# name of gpx table in database
t_gpxrawName=gpx_data_line
# unique id for each GPS trace
t_gpxrawIdCol=gpx_id
# unique id for each GPS trace
t_trkrawIdCol=trk_id
# column with geometry. Geometry type must be MultiLineString and CRS must be WGS 84
t_gpxrawGeomCol=geom

# default name for output table
dbMatchingOutputTable=streets_gpx

#buffer in meters (should be equal or bigger than streetProfileLength)
streetBuffer=60

# length of profile lines which are fitted through the nodes of the street segments [m]
streetProfileLength=30

#ratio of profile line of street which need to be intersected by GPS-trace in order to assume a match
streetProfileIntersectionRatio=0.7

Figure 27: Properties file of map matching tool


4.4.4 Calculation of Incline

After the preceding steps, the incline of the street segments can be calculated. As in the map matching process, the street segments are processed sequentially. The workflow is depicted in Figure 28.

Figure 28: Workflow for calculating the incline of street segments.

The first step is to request the street segments from the database. After that, the preprocessed GPS traces (cf. section 4.4.2.1) must be requested from the database. They can easily be requested through the relation ‘streets_gpx’ and a join with the relation ‘gpx_data_line_preprocessed’. Usually, a GPS trace spans more than one street segment. Thus, the traces need to be cut so that only the track points which are relevant for the calculation of the incline are selected. Relevant track points are those which are near the street segment and not before or beyond it. To achieve this, a buffer with a size of approximately 30 m is calculated around the street segment. The corresponding GPS traces are clipped with this buffer. Consequently, only the parts of the GPS traces which lie within the buffer polygon of the street segment are used.


Figure 29: Clipping of assigned GPS traces.

The clipping is performed in the database, since the Java library GeoTools does not support clipping of 3D geometries. For the selection of the GPS traces and the clipping, the following SQL statement is executed from the Java tool:

SELECT g.gpx_id, g.trk_id, g.part_id,
       ST_ASGEOJSON(
         ST_INTERSECTION(g.geom, ST_GeomFromEWKB('buffered_street_geom'))
       ) AS geom
FROM streets_gpx sg
LEFT JOIN gpx_data_line_preprocessed g
       ON sg.gpx_id = g.gpx_id AND sg.trk_id = g.trk_id
WHERE sg.street_id = 'current_street_id'

Besides the IDs which uniquely identify the GPS trace, this SQL statement returns the clipped geometry. It has to be noted that the PostGIS function ST_INTERSECTION returns the clipped geometries of the GPS trace parts as MultiLineStrings. This has to be considered later, when the incline is calculated.
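How the returned GeoJSON could be split into individual LineStrings before the incline calculation may be sketched as follows. This is only an illustration in Python and not the Java tool of this thesis; the function name `line_strings_from_geojson` is hypothetical. The text above states that MultiLineStrings are returned; the handling of the other geometry types and of empty results is a defensive assumption.

```python
import json


def line_strings_from_geojson(geojson_text):
    """Return a list of LineString coordinate lists from a GeoJSON geometry."""
    if not geojson_text:
        return []  # assumption: no matching or no intersecting trace yields NULL
    geometry = json.loads(geojson_text)
    kind = geometry["type"]
    if kind == "LineString":
        parts = [geometry["coordinates"]]
    elif kind == "MultiLineString":
        parts = geometry["coordinates"]
    elif kind == "GeometryCollection":
        parts = []
        for member in geometry["geometries"]:
            parts.extend(line_strings_from_geojson(json.dumps(member)))
    else:
        parts = []  # points or polygons carry no usable edges
    # An edge needs two points, so degenerate parts are dropped.
    return [part for part in parts if len(part) >= 2]
```

Each returned part is a list of [longitude, latitude, elevation] positions, which can then be processed edge by edge.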


Once all corresponding traces with clipped geometry have been requested from the database, the incline in percent is calculated for each trace. Since a clipped GPS trace is a MultiLineString, it needs to be split into LineString objects. The incline, here denoted by m, is the weighted average of the inclines of all edges of the LineString. The weight is the horizontal length of an edge, normalized by the full horizontal length of the LineString. Since the elevation is given in meters and the GPS coordinates are geographic coordinates, the distance in meters is calculated considering the earth as a sphere. The calculation of the incline can be expressed using the following formula:

$$ m_t = \sum_{i=1}^{n-1} \left( \frac{h_i - h_{i+1}}{d_{i,i+1}} \right) \left( \frac{d_{i,i+1}}{l} \right), $$

where

$m_t$ = incline of GPS trace segment in percent
$n$ = number of track points
$h$ = elevation of track point
$d$ = horizontal distance between two track points
$l$ = horizontal length of GPS trace segment
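The weighted average described above can be illustrated with a minimal Python sketch. It is not the implementation of this thesis. The earth radius of 6371 km, the haversine formula as the spherical distance, the factor 100 for the conversion to percent and the function names are assumptions made for this illustration. The elevation difference is ordered as in the printed formula.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean radius of the spherical earth model


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS 84 positions."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trace_incline_percent(points):
    """Length-weighted mean edge incline of one LineString.

    points: list of (lat, lon, elevation_m) tuples in recording order.
    Returns (incline in percent, horizontal length l in meters).
    """
    edges = []
    for (lat1, lon1, h1), (lat2, lon2, h2) in zip(points, points[1:]):
        d = haversine_m(lat1, lon1, lat2, lon2)
        if d > 0:  # skip duplicate positions to avoid a division by zero
            edges.append((h1 - h2, d))  # ordered as printed: h_i - h_(i+1)
    length = sum(d for _, d in edges)
    if length == 0:
        return None, 0.0
    # Each edge incline (dh / d) is weighted by d / l.
    incline = sum((dh / d) * (d / length) for dh, d in edges)
    return 100.0 * incline, length  # factor 100 is an assumption (percent)
```

Because the weight of each edge is its own length, the edge distances cancel, and for edges of nonzero length the result equals the total elevation difference divided by the horizontal length l.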

In OSM, the street segments are directed from the first node of the segment to the last one. Consequently, it has to be checked whether the GPS trace was recorded in the same direction as the street or in the opposite one. This is done by comparing the average bearing of the street segment with that of the GPS trace. If the difference in bearing is greater than a certain threshold, it is assumed that the GPS trace follows the opposite direction and the sign of the calculated incline value is inverted. Individual samples have shown that, given the geometric accuracy of both the GPS trace and the street segment, a threshold of 40° is reasonable.
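The direction check can be sketched as follows. The sketch is an illustration only: the function names are hypothetical, and the bearing between the first and the last point is used here as a simple stand-in for the average bearing, which is an assumption. The threshold of 40° is taken from the text.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def bearing_difference_deg(a, b):
    """Smallest angle between two bearings, in degrees [0, 180]."""
    diff = abs(a - b) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def align_incline(incline, street_bearing, trace_bearing, threshold_deg=40.0):
    """Invert the trace incline if the trace runs against the street direction."""
    if bearing_difference_deg(street_bearing, trace_bearing) > threshold_deg:
        return -incline
    return incline
```

The difference is wrapped to the range of 0° to 180°, so that bearings of 350° and 10° are treated as 20° apart and not as 340°.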


The next step is to combine the incline values of the individual traces into one value which represents the incline of the street segment. Since the length of the traces may vary, a weighted average based on the length of the traces was chosen here as well. The following formula has been used for averaging the incline values of the individual traces:

$$ m_s = \sum_{a=0}^{k} \left( m_{t_a} \right) \left( \frac{l_a}{\sum_{a=0}^{k} l_a} \right), $$

where

$m_s$ = incline of street segment in percent
$m_t$ = incline of GPS trace in percent
$k$ = number of corresponding traces
$l$ = horizontal length of trace
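The averaging over the traces of one street segment can be sketched in the same way. The function name and the input structure are hypothetical; the sketch only illustrates the length-weighted mean given above.

```python
def street_incline_percent(trace_results):
    """Length-weighted mean of trace inclines.

    trace_results: list of (incline_percent, horizontal_length_m) per trace.
    Returns (street incline in percent, number of traces used).
    """
    usable = [(m, l) for m, l in trace_results if m is not None and l > 0]
    total_length = sum(l for _, l in usable)
    if total_length == 0:
        return None, 0
    incline = sum(m * (l / total_length) for m, l in usable)
    return incline, len(usable)
```

For example, a trace of 100 m with 2 % and a trace of 300 m with 4 % result in a street incline of 3.5 %, since the longer trace has three times the weight.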

After the calculation, the result is written into a new relation called ‘streets_incline’. Besides the street ID and the calculated incline, the number of traces used for the calculation is added to the table. This supports the estimation of the accuracy, which is described in section 0.

4.5 Validation

In chapter 5, the results will be validated to estimate the achieved accuracy. This will be done by a comparison with reference incline values. Such reference values could be calculated from terrestrial measurements, but acquiring test data for a reasonable number of streets in this way would be very costly and time-consuming. Therefore, it was decided to use incline values calculated from a high-accuracy DTM acquired from LiDAR measurements (cf. section 4.3.4). To avoid confusion, these incline values will in the following be referred to as DTM incline, whereas the incline calculated from GPS traces will be referred to as GPS incline. With the use of the DTM, it is possible to calculate an incline value with a reasonable accuracy for all street segments. Moreover, the results of the thesis shall be compared to incline values calculated from the SRTM-1 DSM (SRTM incline). This will show how crowdsourced GPS traces perform in comparison to other globally and freely available data.

The first step of calculating the incline from a DEM which is available in raster format is to densify the street geometry in order to increase the number of nodes per street segment. The street segments get additional nodes, at most three meters apart. For each node of the densified street geometry, the absolute elevation is taken from the DEM. The Java library GeoTools provides classes and functions to easily load georeferenced raster files and to return the elevation for a specific location from them. Once each node has an elevation value, the incline of the street segment can be calculated. This is done by averaging the incline values of all edges of the street segment. Contrary to the calculation of the GPS incline, a non-weighted average is calculated, since the nodes of the street segment are equidistant. The DTM incline and the SRTM incline are then added to the relation ‘streets_incline’, which was already created during the calculation of the GPS incline.
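The densification and the non-weighted averaging can be sketched as follows. The sketch is an illustration in Python and does not use GeoTools. It assumes a single straight edge in a projected coordinate system with meter units, and `elevation_at` is a hypothetical function standing in for the raster lookup. The elevation difference is ordered as in the formula for the GPS incline.

```python
import math


def densify(start, end, max_spacing_m=3.0):
    """Insert equidistant nodes on a straight edge so that spacing <= max_spacing_m.

    start, end: (x, y) in a projected CRS with meter units (assumption).
    """
    length = math.hypot(end[0] - start[0], end[1] - start[1])
    pieces = max(1, math.ceil(length / max_spacing_m))
    return [
        (start[0] + (end[0] - start[0]) * i / pieces,
         start[1] + (end[1] - start[1]) * i / pieces)
        for i in range(pieces + 1)
    ]


def dem_incline_percent(nodes, elevation_at):
    """Unweighted mean of the edge inclines along densified nodes.

    elevation_at: hypothetical callable (x, y) -> elevation in meters,
    standing in for the raster lookup.
    """
    inclines = []
    for (x1, y1), (x2, y2) in zip(nodes, nodes[1:]):
        d = math.hypot(x2 - x1, y2 - y1)
        if d > 0:
            dh = elevation_at(x1, y1) - elevation_at(x2, y2)  # h_i - h_(i+1), as for m_t
            inclines.append(100.0 * dh / d)
    return sum(inclines) / len(inclines) if inclines else None
```

An edge of 10 m is split into four pieces of 2.5 m, which keeps the nodes equidistant while respecting the maximum spacing of three meters.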

For the calculation of the SRTM incline, the elevation of each node of the street segment is looked up from the SRTM-1 DSM. The horizontal resolution of SRTM-1 is approximately 30 m. With a street node every 3 m, up to 10 consecutive nodes would receive the same elevation value from the DSM. This would affect the quality of the SRTM incline. To avoid this, the SRTM-1 DSM was resampled to a horizontal resolution of 1 m by 1 m using the ArcGIS function ‘Resample’. This results in an interpolated elevation value for each 1 m by 1 m pixel.


5 Discussion of Results

The methods implemented in the previous chapter were applied to the GPS and street network data of the pilot region. The outcome is discussed in the following sections. First, the accuracy of the GPS raw data is assessed. After that, the calculated GPS incline values are validated using a high-accuracy DTM and compared to the incline values derived from the SRTM-1 DEM.

5.1 Analysis of Crowdsourced GPS traces

As already stated in section 0, a total of 3842 GPS traces with over two million track points with elevation information are used for this research. In this section, these 3D points, visualized in Figure 30, are analyzed with regard to vertical accuracy as well as coverage and density.

Figure 30: Screenshot of visualized GPS track points, colorized according to their elevation. (green=low, red=high)53

5.1.1 Vertical Absolute and Relative Accuracy

In order to judge the accuracy of the data source for the calculation of incline values, the measured GPS elevation is compared to the elevation from the LiDAR DTM. First, the absolute error of the GPS track points is calculated; second, the relative accuracy within the GPS traces is evaluated. The relative accuracy refers to the difference in elevation between two adjacent points. It has to be noted that the elevation coming from the LiDAR DTM may also suffer from inaccuracy. The DTM reflects the terrain of the earth; consequently, structures on the earth’s surface, such as buildings, trees or bridges, are not contained. The areas where no ground elevation could be measured are interpolated. This leads to an error when transferring the DTM elevation to a GPS track point which is located on a bridge. Furthermore, the horizontal inaccuracy of the GPS points influences the elevation values taken from the DTM. If a GPS trace is recorded on a street directly next to a very steep slope, a GPS track point may fall up to 10 m beside the street, depending on the horizontal error. As a consequence, a wrong elevation value is transferred from the DTM.

53 Screenshot taken from: http://cap4route.geog.uni-heidelberg.de/hd-osm-gps-webgl/hd-osm-gps-webgl.html, checked on 20/07/2015

5.1.1.1 Absolute Accuracy

For the assessment of the absolute accuracy, the root-mean-square error (RMSE) has been calculated overall and for each land use class. The RMSE (the square root of the average of the squared residuals) is commonly used in spatial analysis and provides a measure of the differences between GPS and DTM elevations. The residual is the difference between the GPS measurement and the reference value (DTM). The diagram in Figure 31 shows the RMSE in meters, overall and differentiated by land use class. For the calculation of the RMSE, only 90 % of the residuals have been used, while the other 10 % have been excluded as outliers. The outliers may come from traces which were not recorded on the earth’s surface (e.g. from an airplane) or from wrongly calibrated devices with a barometric elevation measurement unit.
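An RMSE over a reduced share of the residuals can be sketched as follows. How the 10 % of outliers were selected is not specified here; the sketch assumes that the residuals with the largest magnitude are dropped. The function name `trimmed_rmse` is hypothetical.

```python
import math


def trimmed_rmse(residuals, keep_share=0.9):
    """RMSE over the keep_share of residuals with the smallest magnitude."""
    if not residuals:
        return None
    ordered = sorted(residuals, key=abs)
    keep = max(1, int(len(ordered) * keep_share))  # assumption: drop the largest magnitudes
    kept = ordered[:keep]
    return math.sqrt(sum(r * r for r in kept) / len(kept))
```

A single residual of 100 m among nine residuals of ± 1 m raises the untrimmed RMSE to over 30 m, whereas the RMSE of the remaining 90 % is 1 m. This illustrates why a few traces recorded far above the ground would otherwise dominate the measure.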

[Bar chart, RMSE [m] by land use class: farmland 21, industrial 22, forest 22, residential 25, overall 27, grass 30, commercial 34, allotments 35]

Figure 31: Vertical accuracy of crowdsourced GPS traces, distinguished by land use class.

Depending on the land use class, the error ranges from 21 m (farmland) to 35 m (allotments). The overall RMSE lies approximately in the middle with 27 m. According to Liu et al. (2014), the vertical error can be up to 2.5 times larger than the horizontal one. Considering a horizontal accuracy of 6 to 10 m (cf. Zhang et al. 2010), the vertical accuracy can be assumed to be 15 m to 25 m. The RMSE of the data evaluated in this study is within this range only for some land use classes; overall, the error is higher. Several reasons may lead to these errors. User-generated traces are likely not to be recorded under very good conditions. One may store the GPS device in a car, in a pants pocket or in a backpack while hiking. Under these conditions, an additional error may occur. Furthermore, the GPS data may contain traces which have not been recorded on the earth’s surface. Traces have been found which were obviously recorded from a flying object. An additional reason may be the diversity of devices. Some elevations may originate from either barometric measurements or elevation databases, as stated in section 4.3.1. While barometers may be wrongly calibrated and consequently introduce a systematic error, elevation databases may be referenced to a geoid instead of the WGS 84 ellipsoid, or the ellipsoidal height from the GPS may be transformed to a geoidal height. This would introduce a systematic error as well and affect the measure of the absolute vertical accuracy.

The differentiation by land use has been undertaken to evaluate whether the RMSE depends on it. Different land use classes have different characteristics with regard to the obstruction of the GPS signal. Especially in forested or residential areas, a larger error would be expected, since a dense tree canopy or high buildings obstruct the view from the GPS receiver to the satellites. In addition, multipath effects may occur when the GPS signal is reflected by the façades or windows of buildings. In contrast, areas of land use classes such as ‘farmland’, ‘allotments’ or ‘grass’ were expected to have a smaller absolute error. In those areas, one generally finds small houses, just a few trees and wide streets. Consequently, there are not many structures which potentially influence the GPS signal, so that a wide and unobstructed view to the sky and the satellites is possible. As can be seen in the diagram in Figure 31, this assumption cannot be completely confirmed in the case of OSM GPS traces. The land use classes ‘allotments’ and ‘grass’, which were expected to be less erroneous, show some of the largest errors, whereas GPS track points in the class ‘forest’ have one of the smallest errors. This is very surprising, but a reason may be the sample size of the land use classes. ‘Forest’ is one of the land use classes with the most GPS track points, whereas ‘allotments’ and ‘grass’ have fewer.

Figure 32 depicts a histogram of the differences between GPS and DTM elevations that have been used to calculate the aforementioned RMSE. The vertical axis shows the number of points which fall into the bins represented on the horizontal axis. The width of the bins is two meters. It can be seen that the data is mainly normally distributed around zero meters. Since the elevation of the DTM is referenced to the quasigeoid (cf. section 4.3.4) and the GPS elevation is usually referenced to the WGS 84 ellipsoid, an offset equivalent to the geoid undulation was expected for most of the points. The geoid undulation between the WGS 84 ellipsoid and the Earth Gravitational Model 96 (EGM96) in Heidelberg is approximately 48 m54. Since the main peak lies at zero rather than at this offset, it can be assumed that most of the GPS elevations are referenced to a geoid. Most likely, the elevation measurements are internally transformed by the software of the device. In the histogram, a second peak at around 48 m can be found. This represents those GPS elevations which are referenced to the WGS 84 ellipsoid.

54 Calculation done by online geoid calculator under: http://geographiclib.sourceforge.net/cgi-bin/GeoidEval, checked on 20/07/2015

[Histogram: frequency in thousands (0 to 200) over the differences of GPS and DTM elevation [m], from -30 m to 78 m]

Figure 32: Histogram with the differences of GPS and DTM elevation

5.1.1.2 Relative Accuracy

With approximately 27 meters, the absolute error appears to be very large, especially compared to the vertical accuracy of the SRTM-1 DSM, which is 6.2 m (cf. section 4.3.4). Since only the difference in elevation between two adjacent track points is used for the calculation of incline, the relative accuracy is evaluated in this section. For each GPS track point, the change of elevation $\Delta h^{GPS}$ to the next track point of the trace has been calculated. This value reflects the actual change in elevation of the terrain plus an error caused by the GPS measurement. Since the absolute error is not observed here, the occurring errors can be considered as noise. To remove the influence of the terrain, the high-accuracy DTM has been used to calculate the change in elevation $\Delta h^{DTM}$. The difference of $\Delta h^{GPS}$ and $\Delta h^{DTM}$ indicates the actual GPS error $e_{\Delta h}$. Therefore, the following formula has been used:

$$ e_{\Delta h} = \Delta h^{GPS} - \Delta h^{DTM} $$
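The following Python sketch illustrates the formula for one trace. The function name is hypothetical, and the change of elevation is taken from a track point to its successor, which is an assumption about the order of the difference.

```python
def relative_errors(gps_elevations, dtm_elevations):
    """Error of the elevation change for each pair of adjacent track points.

    Both lists hold one elevation in meters per track point, in trace order.
    """
    errors = []
    pairs = list(zip(gps_elevations, dtm_elevations))
    for (g1, t1), (g2, t2) in zip(pairs, pairs[1:]):
        errors.append((g2 - g1) - (t2 - t1))
    return errors
```

A constant offset between the GPS and the DTM elevations, for example a trace that is shifted by 48 m as a whole, cancels out in this difference, so only the noise between adjacent points remains.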

For the assessment of the relative accuracy, the preprocessed and smoothed GPS traces have been used. $e_{\Delta h}$ has been calculated for approximately 1.7 million track points. The box-and-whisker plot in Figure 33 shows the distribution of $e_{\Delta h}$ overall and differentiated by land use class. The bottom and top of the boxes represent the first and third quartile, whereas the ends of the whiskers show the 5th and the 95th percentile. These percentiles were chosen because of a few but very large outliers present in the data set.


[Box-and-whisker plot: $e_{\Delta h}$ [m] (axis from -2 to 2) by land use]

Figure 33: Relative accuracy of crowdsourced GPS track points, overall and distinguished by land uses.

Overall, the error in the change of elevation is within a range of ± 0.16 m for 50 % of the data. The whiskers extend to approximately ± 1.00 m. Contrary to the absolute error, it is obvious here that land use classes which do not suffer from obstructions by buildings and trees perform better than others. For areas with mainly grass and farmland, the range of the box is approximately ± 0.12 m (grass) and ± 0.10 m (farmland). For 90 % of the data, $e_{\Delta h}$ falls into an interval of ± 0.7 m (grass) and ± 0.5 m (farmland). For land use classes which suffer more from obstructions, like ‘commercial’, ‘industrial’, ‘residential’ and ‘allotments’, 50 % of the points are affected by an error within a range of ± 0.13 m to ± 0.15 m. Although the first and third quartiles of the aforementioned land uses are similar, the extents of the whiskers vary. While the whiskers of the land use class ‘industrial’ are within a range of ± 0.6 m, the range increases to approximately ± 0.7 m for ‘commercial’ as well as for ‘allotments’ and even to over ± 0.8 m for the land use ‘residential’. The land use class with the largest extent of errors in this investigation is ‘forest’. The error for 50 % of the data falls into the range of ± 0.26 m, which is more than twice as high as for farmland. Also, the 5th and 95th percentiles show a large deviation of $e_{\Delta h}$, with approximately ± 1.5 m. This reflects the obstructions which are present in forest areas due to the dense tree canopy.

For the calculation of incline, not only the difference in elevation is important but also the distance between the two adjacent points. Therefore, it cannot be judged from the error $e_{\Delta h}$ examined above how much it influences the incline value. To get an idea of it, all values of $e_{\Delta h}$ were aggregated by land use class and the RMSE was calculated with 90 % of the data. Furthermore, the average distance between two adjacent points within the areas of the specific land use classes has been used to calculate the effect of the error on the incline. Table 6 shows that the calculated incline between two adjacent GPS track points contains an overall error of 2.4 %. The result depends on the land use class. The smallest error occurs in areas of the land use classes ‘grass’ and ‘farmland’, with 1 % or less. In forested areas, an error of over 4 % can be expected. The other land use classes are in a range of 2.2 % to 2.7 %.

Land Use      RMSE $e_{\Delta h}$ [m]   Avg. distance [m]   ≙ incline [%]
overall       0.3                       14                  2.4
allotments    0.2                       11                  2.2
commercial    0.2                       10                  2.4
farmland      0.2                       17                  1.0
forest        0.5                       11                  4.3
grass         0.2                       29                  0.8
industrial    0.2                       10                  2.4
residential   0.3                       10                  2.7

Table 6: The effect of the relative accuracy on the calculated incline.

To summarize, it can be said that the relative accuracy depends on the land use class and on its characteristics regarding obstructions by buildings, trees or other structures. The error sources which are mainly responsible for this result are shadowing and multipath (cf. section 2.1.2). Other error sources like ionospheric effects or clock inaccuracies do not affect the results as much as shadowing and multipath. Furthermore, it can be seen from the box-and-whisker plot in Figure 33 that the error occurs more or less equally in positive and negative direction. Therefore, it may be assumed that, due to the large number of points which are used for the calculation of the incline of one street segment, the noise will largely cancel out when the average is calculated.

5.1.2 Coverage and density

In this section, the coverage and density of the OSM GPS traces are examined. This gives an idea of how many streets are actually covered by data and how dense the GPS track points are. The coverage is investigated here using the result of the map matching algorithm implemented for this research. It has to be noted that, due to the n-to-m relationship, a GPS trace may be matched to more than one street.

Figure 34 shows a street map of the city of Heidelberg indicating the coverage of streets with GPS traces. The street segments are colorized according to the number of traces, from green (few traces) to red (many traces). Street segments visualized in blue do not have any matched GPS traces. A large share of the street segments is covered with at least one trace; however, many streets have no matching GPS traces. Streets with over 20 corresponding traces are relatively rare. Streets with a high traffic volume have the best coverage. These are the motorways (German Autobahn) in the upper left corner, the streets on both banks of the river Neckar (upper right corner) and the two primary streets in the center of the figure running in north-south direction.

Figure 34: Map, showing the coverage of the streets with GPS traces. (Map: OSM)

The coverage of GPS traces is now investigated in more detail. Later, it will be investigated whether the number of traces has an effect on the accuracy. Therefore, it is interesting to know how many corresponding traces can theoretically be used to calculate the incline. Figure 35 shows the share of street length by street type for different minimum numbers of assigned traces. The street types ‘motorway’, ‘primary’, ‘secondary’ and ‘cycleway’ have a coverage with at least one GPS trace of almost 100 %. Other street types such as ‘residential’, ‘path’ and ‘footway’ are covered with at least one trace in 42 % to 61 % of the cases. This shows that the coverage with GPS traces depends on the traffic volume and the street type. For example, the motorway, which is probably the street type with the highest traffic volume, is completely (100 %) covered with at least 25 traces, followed by the street type ‘primary’, which is below ‘motorway’ in the hierarchy of street types. All primary streets have at least 5 GPS traces; the share decreases to 70 % for at least 20 traces. With 30 traces, still 20 % of the primary streets are covered. Of the streets of the types ‘secondary’, ‘tertiary’ and ‘cycleway’, around 95 % are covered with at least 1 trace and approximately 80 % with at least 5 traces. The street types with the lowest coverage are ‘residential’, ‘footway’ and ‘path’. Still, 42 % to approximately 60 % of them have at least 1 trace, whereas this share decreases significantly to 15 % for at least 5 traces. Less than 5 % of the streets of these types are covered with 30 or more traces. Generally, it can be said that streets of higher priority have more matched GPS traces. Streets which can be used by bicycles perform as well as secondary and tertiary streets. Paths which are dedicated to pedestrians are comparable to residential streets.

[Line chart: share of street length (0 % to 100 %) over the minimum number of traces (1, 5, 10, 15, 20, 25, 30) for the street types cycleway, footway, motorway, path, primary, residential, secondary and tertiary]

Figure 35: The coverage with GPS traces for different street types.

Besides the coverage, the density of GPS track points within a GPS trace is also examined, using the preprocessed data set. The term density refers to the distance between two adjacent GPS track points. A shorter distance between two track points also means that there are more GPS track points per street segment to calculate the incline. Due to the noise in the GPS data, more points make the result more robust. The interval at which GPS track points are recorded can usually be set in the settings of the device. It can be either time-dependent (e.g. every second) or location-dependent (e.g. every 30 m). Figure 36 shows a diagram of the average distance between two consecutive track points, distinguished by street type. ‘Motorway’ is the street type with the largest average distance between two points, with approximately 35 m, followed by ‘primary’ and ‘secondary’ streets (approximately 17 m). In the middle, the street types ‘cycleway’ and ‘tertiary’ have an average distance of almost 14 m. This value also reflects the overall distance. The street types with the shortest distance are ‘residential’, ‘footway’ and ‘path’. As in the investigation of the coverage, a dependency between the distance on the one hand and the priority of the street type and the average speed on the other hand can be seen. Whereas a motorway is likely to be the street type with the highest speed, a footway or path is the one with the lowest, since they are used only by pedestrians. This leads to the assumption that the main part of the track points is recorded with a time-dependent interval, which results in different distances between two adjacent GPS track points.

With an average distance of 14 m between two adjacent GPS track points, about two track points fall into one pixel of the SRTM-1 DEM, which has a horizontal resolution of 30 m by 30 m.


[Bar chart, average distance between two adjacent track points (in meters) by street type: motorway 36, primary 18, secondary 16, overall 14, cycleway 14, tertiary 14, residential 10, footway 9, path 8]

Figure 36: Average distance of two adjacent GPS track points differentiated by street type.

5.2 Analysis of Calculated Incline

The GPS incline was calculated for a total length of 3064 km of the street network in the pilot region. Due to the incomplete coverage of the OSM GPS traces, this corresponds to approximately 57 % of the complete street network, which has a total length of 5338 km. The map in Figure 37 visualizes the calculated GPS incline for the street network within the pilot region. Streets for which no GPS incline was calculated are not shown. It can be seen that the western part of the region is mainly flat, whereas the eastern part is mountainous. In section 5.2.2, the achieved accuracy of the calculated GPS incline is examined using the DTM incline. In addition, the accuracy of the GPS incline will be compared to the incline derived from the SRTM-1 DSM in section 5.2.3. This will show how the approach of this research performs in comparison to other open-licensed and globally available data.

Figure 37: Visualization of the GPS incline. Streets with no coverage are not displayed. (Map: OSM)


5.2.1 Exclusion of data from the evaluation

The error of incline may be influenced by irregularities of the DTM. Figure 38 shows a motorway junction with the underlying DTM for which the DTM incline is calculated erroneously, with a value of 30 %. This phenomenon especially happens at bridges or underpasses, since bridges are partly removed from the DTM. Due to the characteristics of the pilot region, streets with an incline of more than 20 % are likely to exist only rarely. However, for over 20 km of the street segments, the DTM incline is above 20 % and in some cases even over 100 %. 20 km corresponds to less than 1 % of the entire street network, but still influences the result due to the high magnitudes. Therefore, all street segments which have a DTM incline greater than 20 % are excluded from the evaluation to make sure that the result is not distorted by wrong reference data. Furthermore, street segments with a GPS incline and SRTM incline of over 35 % are not considered for this evaluation. The steepest street in the world, Baldwin Street in Dunedin (), has a maximum incline of approximately 35 %55. Apart from a few small paths in mountainous regions, streets with inclines over 35 % practically do not exist; such values are consequently considered wrong calculations and are not used for this evaluation.

Figure 38: Erroneously calculated DTM incline, due to irregularities of the LiDAR DTM.
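The exclusion rules of this section can be sketched as a simple filter. The sketch is an illustration only; the function name is hypothetical. It assumes that the magnitude of the incline is compared with the thresholds and that a segment is dropped as soon as either the GPS incline or the SRTM incline exceeds 35 %.

```python
def keep_for_evaluation(dtm_incline, gps_incline, srtm_incline,
                        dtm_limit=20.0, other_limit=35.0):
    """Return True if a street segment passes the plausibility thresholds.

    All inclines in percent. Magnitudes are compared (assumption), so steep
    descents are treated like steep ascents.
    """
    if abs(dtm_incline) > dtm_limit:
        return False  # reference value is considered unreliable
    if abs(gps_incline) > other_limit or abs(srtm_incline) > other_limit:
        return False  # assumption: either value above the limit excludes the segment
    return True
```

The motorway junction of Figure 38, with a DTM incline of 30 %, would be excluded by the first rule regardless of its GPS incline.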

5.2.2 Accuracy of GPS incline

The GPS incline and the DTM incline are compared by calculating their difference. This results in the incline error $e_m$, which is given in percent. The following formula was used:

$$ e_m = m_{GPS} - m_{DTM}, $$

where

$e_m$ = incline error in %
$m_{GPS}$ = incline, derived from GPS traces
$m_{DTM}$ = incline, derived from DTM

55 https://de.wikipedia.org/wiki/Baldwin_Street, checked on 20/07/2015

In the following sections, the error will be evaluated overall, differentiated by land use classes as well as by different types of terrain such as flat and mountainous. Furthermore, it will be investigated whether the number of traces which has been used to derive the GPS incline affects the accuracy.

5.2.2.1 Overall error

Figure 39 shows the street network colorized by the incline error. There are 5 error classes, from smaller than 1 % up to greater than 5 %, shown in a gradient from green (small error) to red (large error). Due to the incomplete coverage of GPS traces, the incline could not be calculated for all streets. Those streets are not shown in the map. It can be seen from the map that street segments with a medium or large error are not equally distributed in this area. It is noticeable that in the western part only a few short street segments are colored in yellow or red. As stated in section 4.1, the western part is mainly characterized by flat terrain and farmland. In the eastern part, by contrast, more streets with a larger error can be found. This area is part of the “Odenwald”, a mountainous region with mountains up to 600 m and mainly forested areas.

Figure 39: Visualization of the error of GPS incline in the pilot region. (Map: OSM)


In the following, the distribution of the incline error is investigated. The histogram in Figure 40 shows the distribution of the incline error in percent within a range of ± 13 %. The vertical axis represents the relative frequency of error values falling into bins with a width of 0.5 %.

Errors with a magnitude greater than 13 % are too few to be visualized and are therefore not shown in the histogram. The error of the incline appears to be normally distributed around the mean of -0.05 %. The standard deviation is σ = 3.97 %, which means that approximately 68 % of the incline errors are within the range of ± σ. When the standard deviation is recalculated with only 95 % of the data, it results in σ = 2.31 %, which is roughly half of the value calculated with 100 % of the data. This means that some GPS incline values have been calculated with a large error.

[Histogram: relative frequency (0 % to 18 %) over the incline error [%], from -13 to 13]

Figure 40: Histogram of the overall incline error in percent and the bell-curve (red).


Table 7 shows the absolute length in kilometers of the street segments for which the incline was calculated with an error below the given threshold. In addition, it gives the corresponding percentage of the total length of street segments for which it was possible to derive incline information (3064 km). Depending on the application which uses the incline information, the acceptable error may differ. The higher the acceptable error, the more street segments are available. For almost two-thirds (61 %) of the street network, the GPS incline can be derived with an error smaller than 1 %. This increases to 85 % considering an error of up to 3 % or even to 92 % if an error smaller than 5 % can be accepted.

Incline error $e_m$   Absolute length of street segments   Share of length of street network with GPS incline
< 1 %                 1872 km                              61 %
< 2 %                 2370 km                              77 %
< 3 %                 2607 km                              85 %
< 4 %                 2731 km                              89 %
< 5 %                 2817 km                              92 %

Table 7: The length of street segments for different incline error classes.
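The shares in Table 7 follow directly from the absolute lengths and the total length of 3064 km, as the following short Python check illustrates. The variable names are chosen for this illustration only.

```python
TOTAL_KM = 3064  # street length with a GPS incline, from the text

# (upper bound of the incline error in percent, street length in km), from Table 7
LENGTH_BELOW_ERROR = [(1, 1872), (2, 2370), (3, 2607), (4, 2731), (5, 2817)]

# Share of the street length per error bound, rounded to whole percent.
shares = {limit: round(100 * km / TOTAL_KM) for limit, km in LENGTH_BELOW_ERROR}
```

The rounded values reproduce the last column of the table, for example 1872 km / 3064 km ≈ 61 %.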

5.2.2.2 By Land Use Classes

Like in section 5.1.1, the influence of the land use classes on the accuracy of the GPS incline is investigated. The standard deviation has been calculated for each land use class and is shown in Table 8. In addition, the length in kilometers and the percentage of the street segments within the land use classes are given. The land use class ‘forest’ has the highest standard deviation with σ = 5.6 %, whereas for street segments running through farmland and industrial areas the GPS incline could be calculated with the best accuracy (σ = 2.2 % / 2.3 %). The standard deviations of the other land use classes are in the middle, from σ = 3.0 % for ‘allotments’ to σ = 3.8 % in residential areas. The reason may be the different characteristics of the land use classes with regard to the obstruction of the signals from the satellites to the GPS receiver. Like in the previous section, the standard deviation decreases for all land use classes when only 95 % of the data is used. This shows that GPS incline values with errors of high magnitude can be found in all land use classes.


Land Use Class   Std Dev. [%]   Std Dev. 95 % [%]   length [km]   length [%]
overall          4.0            2.3                 3064          100
forest           5.4            3.6                 1072          35
residential      3.8            2.1                 844           28
farmland         2.2            1.1                 560           18
grass            3.7            2.0                 187           6
allotments       3.0            1.7                 102           3
commercial       3.1            1.6                 53            2
industrial       2.3            1.7                 48            2

Table 8: The achieved accuracy of GPS incline differentiated by land use classes.

The achieved accuracies of the GPS incline differentiated by land use classes reflect the relative accuracy of the GPS track points evaluated in section 5.1.1.2. Forested areas perform worst, whereas ‘farmland’ is one of the land use classes with the highest relative accuracy. Differences compared to the relative accuracy of the GPS raw data can be found for ‘grass’ and ‘industrial’. For the land use class ‘grass’, the relative accuracy of the GPS track points is the best, whereas the incline could only be determined with medium accuracy. In industrial areas, the opposite can be observed: the GPS incline could be calculated with one of the best accuracies, although the relative accuracy of the GPS traces within industrial areas is not as good. The reason may be the lack of data. Street segments within the land use classes ‘grass’, ‘allotments’, ‘commercial’ and ‘industrial’ make up only a small part of all street segments with a calculated GPS incline. This may be due to missing GPS information in these areas or due to fewer streets. The result would be more reliable and robust if more data were available.

5.2.2.3 By Terrain Classes (mountainous / flat)

Besides the influence of the land use class on the achieved accuracy, it is also evaluated whether the terrain affects the accuracy of the GPS incline. A distinction is made between street segments in flat and in mountainous areas. A street segment is considered flat if the DTM incline is smaller than 2 %. Streets with an incline of over 5 % are considered as being in a mountainous area. The results are shown in Table 9. The length of the flat street segments sums up to 2018 km, which represents 66 % of all street segments for which the GPS incline could be calculated. The length of streets in mountainous areas amounts to 603 km, which corresponds to 20 %. Consequently, the remaining 14 % are street segments with a DTM incline between 2 % and 5 %. These street segments are not considered in this section.


As already stated in sections 5.2.2.1 and 5.2.2.2, the overall standard deviation is σ = 4.0 %. Within flat areas the standard deviation is 2.8 %, whereas in mountainous areas a standard deviation of more than 7 % is calculated. Using only 95 % of the data, it improves to σ = 1.4 % (flat) and σ = 5.7 % (mountainous). The standard deviation of the GPS incline in mountainous areas is thus roughly 2.6 times larger than in flat areas for all data and about four times larger for 95 % of the data, so the incline can be determined with higher accuracy in flat areas than in mountainous areas. However, it has to be noted that the majority (73 %) of the street segments in mountainous areas run through forested areas, whereas only 19 % of the streets in flat areas fall in forested areas. Thus, the result may also be influenced by the poor accuracy of incline values within forests (cf. section 5.2.2.2).

Terrain Class   Std Dev. [%]   Std Dev. 95 % [%]   length [km]   length [%]
overall         4.0            2.3                 3064          100
flat            2.8            1.4                 2018          66
mountainous     7.2            5.7                 603           20

Table 9: The achieved accuracy of GPS incline differentiated by terrain classes.
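The terrain classification and the two standard deviation columns of Table 9 can be sketched as follows. This is an illustration under stated assumptions, not the procedure of this thesis: the thresholds (2 % and 5 %) come from the text, but taking the magnitude of the DTM incline, discarding the largest absolute errors to obtain "95 % of the data", and using the population standard deviation are assumptions made here. All function names are hypothetical.

```python
import statistics


def terrain_class(dtm_incline_percent):
    """Classify a street segment by its reference (DTM) incline.

    Assumption: the magnitude of the incline is used, so downhill counts too.
    """
    magnitude = abs(dtm_incline_percent)
    if magnitude < 2.0:
        return "flat"
    if magnitude > 5.0:
        return "mountainous"
    return None  # 2 % to 5 %: not considered in this section


def std_dev(errors, keep_share=1.0):
    """Standard deviation of incline errors, optionally on a trimmed sample.

    Assumption: 'using 95 % of the data' is modelled by dropping the errors
    with the largest absolute value (keep_share=0.95).
    """
    ranked = sorted(errors, key=abs)
    n = max(2, round(len(ranked) * keep_share))
    return statistics.pstdev(ranked[:n])


# Toy errors, NOT thesis data: one gross outlier among 20 values.
errors = [0.0] * 19 + [10.0]
print(terrain_class(1.5), terrain_class(-7.0), terrain_class(3.0))
print(round(std_dev(errors), 2), std_dev(errors, keep_share=0.95))
```

The toy example shows why the trimmed value drops so strongly in every class: a few errors of high magnitude dominate the untrimmed standard deviation.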

5.2.2.4 Effect of Number of GPS Traces on Overall Accuracy

In the previous sections, all street segments for which the GPS incline could be calculated were considered, regardless of the number of traces used to calculate the incline. As described in section 0, the GPS incline is calculated from the elevation differences of the track points of a GPS trace. If a street segment has multiple traces, the incline is calculated individually for each trace. The incline values of the individual traces are then aggregated by calculating the average. It is now evaluated whether the accuracy increases with the number of GPS traces per street segment. The diagram in Figure 41 shows, in blue, the percentage of the street network with an incline error smaller than 2 % for a given minimum number of GPS traces. Here, 100 % is equivalent to the sum of the lengths of all street segments having at least this number of matched GPS traces. The red line shows the share of the entire street network, including those street segments for which the GPS incline could not be derived.

It was possible to derive the GPS incline with an error smaller than 2 % for 2370 km (77 %) of the street length considering street segments with at least one GPS trace. This is equivalent to 44 % of the total street network. When neglecting street segments with fewer than 5 GPS traces, 1128 km of the street network are covered. Out of these 1128 km, 87 % of the street length has a GPS incline with an error smaller than 2 %. Compared to the usage of at least 1 trace, the share of accurately derived street length increases; however, the coverage with respect to the total street network decreases significantly to less than 20 %. This trend continues the more traces are required. When considering street segments with at least 30 traces, the percentage of street length with an error smaller than 2 % increases to 98 %, although with only 133 km (corresponding to only 2.5 % of the total street network in the pilot region) there are not many street segments covered with at least 30 GPS traces.

[Figure 41: line chart. x-axis: Minimum Number of Traces (1 to 30); y-axis: percentage (0 % to 100 %); series: share of street length with GPS incline error < 2 % [%], share of total street network [%]]

Figure 41: The percentage of streets, with an incline error smaller than 2 % and their share with respect to the entire street network.
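The two curves of Figure 41 differ only in their denominator. The sketch below illustrates this under the assumption that each street segment is available as a tuple of length, number of matched traces and incline error; the function name and the data layout are hypothetical. As a plausibility check, the example feeds in the aggregated lengths reported in the text (2370 km with an error below 2 % out of 3064 km with GPS incline, and a total network length of 5338 km) as two pseudo-segments.

```python
def shares_by_min_traces(segments, total_network_km, min_traces, max_error=2.0):
    """Return (share of eligible length, share of total network), both in percent.

    segments: list of (length_km, n_traces, incline_error_percent)
    eligible: segments with at least `min_traces` matched GPS traces
    """
    eligible = [(length, err) for length, n, err in segments if n >= min_traces]
    eligible_km = sum(length for length, _ in eligible)
    accurate_km = sum(length for length, err in eligible if abs(err) < max_error)
    share_of_eligible = 100.0 * accurate_km / eligible_km if eligible_km else 0.0
    share_of_network = 100.0 * accurate_km / total_network_km
    return share_of_eligible, share_of_network


# Aggregated lengths from the text, modelled as two pseudo-segments;
# the error values 0.5 and 3.0 are placeholders for "< 2 %" and ">= 2 %".
pseudo_segments = [(2370.0, 1, 0.5), (3064.0 - 2370.0, 1, 3.0)]
blue, red = shares_by_min_traces(pseudo_segments, 5338.0, min_traces=1)
print(f"{blue:.0f} % of street length with GPS incline, {red:.0f} % of total network")
```

Raising `min_traces` shrinks the eligible length, which is why the first share can rise while the second one falls.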

5.2.3 Comparison GPS incline and SRTM incline

The GPS incline derived with the approach presented in this thesis is now compared to the incline from the SRTM-1 DSM, as this is an alternative data source for deriving incline information. As described previously in section 4.3.4, the DSM is freely available with a horizontal resolution of approximately 1 arcsecond, which is equivalent to approximately 30 m at the equator.


The SRTM incline of the street network was calculated as described in section 4.5, and its standard deviation was calculated in the same way as for the GPS incline. This gives a comparable measure that indicates how the accuracy of the GPS incline performs in comparison to the SRTM incline. Table 10 gives an overview of the share of the street network for which the incline can be determined with an error smaller than 2 % for both data sources (GPS and SRTM). The numbers for the GPS incline are taken from the diagram in section 5.2.2.4.

                   Percentage of street network   coverage
                   with incline error < 2 %
SRTM               73 %                           100 %
GPS, ≥ 1 trace     77 %                           44 %
GPS, ≥ 5 traces    86 %                           18 %

Table 10: Comparison of SRTM and GPS incline in terms of amount of street network with an incline error smaller than 2 %.

Using SRTM, it is possible to derive the incline with an error smaller than 2 % for 2236 km out of 3064 km (73 %) of the street network. When using GPS traces, this share depends on the minimum number of GPS traces used for the determination of the incline. Considering street segments with at least 1 trace, the incline can be determined with an error smaller than 2 % for 77 % of the street network. When neglecting streets with fewer than 5 GPS traces, the percentage increases to 86 %. However, this requires sufficient coverage with GPS traces, which is the case for only 18 % of the entire street network in the pilot region. To summarize, the GPS incline performs slightly better than the SRTM incline, but the coverage is more complete with SRTM. It has to be noted as well that further GPS traces can always be collected by volunteers, so the coverage may increase.

Besides the street length for which the incline was derived within a certain error range, the standard deviations are compared in the following sections. Firstly, the standard deviations of the GPS and SRTM incline are differentiated by land use classes and secondly by terrain classes.

5.2.3.1 By Land Use Classes

Table 11 shows the comparison of the standard deviations by land use classes. The standard deviations have been calculated using 95 % of the data. Besides the standard deviation of the SRTM incline σSRTM, the standard deviation of the GPS incline σGPS (cf. section 5.2.2.2) and their difference are shown. Additionally, the standard deviation has been calculated from the error values of the GPS incline considering only those street segments which have at least 5 GPS traces (σGPS 5T). The difference between σGPS 5T and σSRTM is given in the last column. Overall, σSRTM is 3.1 %, which is 0.8 percentage points larger than σGPS and even 1.5 percentage points larger than σGPS 5T. This means that the GPS incline can be derived with less uncertainty, especially when neglecting street segments with fewer than 5 GPS traces. This holds true within all land use classes. Considering the GPS incline derived from at least 5 traces, the difference of the standard deviations is larger than 1 percentage point for all land use classes, reaching almost 2 percentage points in the land use class ‘grass’.

Land Use Class   σSRTM [%]   σGPS [%]   σGPS - σSRTM [%]   σGPS 5T [%]   σGPS 5T - σSRTM [%]
overall          3.1         2.3        -0.8               1.6           -1.5
forest           4.2         3.6        -0.6               3.1           -1.1
farmland         2.0         1.1        -0.9               0.9           -1.1
residential      2.8         2.1        -0.7               1.5           -1.3
commercial       2.9         1.6        -1.3               1.5           -1.4
allotments       2.6         1.7        -0.9               1.3           -1.3
grass            3.3         2.0        -1.3               1.5           -1.8
industrial       2.6         1.7        -0.9               1.0           -1.6

Table 11: Comparison of the standard deviations of the incline error, overall and differentiated by land use classes.

The SRTM incline may perform worse for several reasons. Firstly, the SRTM-1 DEM is a digital surface model (DSM), which means that structures on the earth's surface, such as buildings and vegetation, have not been removed from the elevation information. This would normally not matter, since streets, apart from those in forests or under bridges, are hardly covered by trees or man-made structures. Due to the low horizontal resolution of 30 m, however, it becomes a problem: if one cell (pixel) of the DSM covers not only the street but also buildings and trees right next to it, the value of this cell is an average of the elevation of the ground and of the other structures. In contrast, the GPS track points are recorded on the earth's surface or, more precisely, at a constant height above the ground (in the car or in a backpack). In addition, the SRTM data has a vertical accuracy of 6.2 m, which is a relatively large error compared with the relative accuracy of GPS of 0.6 m within a distance of 30 m (cf. section 5.1.1.2).


5.2.3.2 By Terrain Classes

The standard deviations are now differentiated between flat areas (DTM incline < 2 %) and mountainous areas (DTM incline > 5 %). As shown in Table 12, the SRTM incline performs better in flat areas (σSRTM = 2.5 %) than in mountainous areas (σSRTM = 5.1 %). The GPS incline performs better than the SRTM incline in flat areas, whereas in mountainous areas the SRTM incline is slightly better. This is surprising, since the GPS incline could be derived more accurately overall, within all land use classes and in flat areas. The likely reason is not that the SRTM incline is determined particularly accurately in mountainous regions, but that the GPS incline performs exceptionally badly there.

Terrain       σSRTM [%]   σGPS [%]   σGPS - σSRTM [%]   σGPS 5T [%]   σGPS 5T - σSRTM [%]
overall       3.1         2.3        -0.8               1.6           -1.5
flat          2.5         1.4        -1.1               1.0           -1.5
mountainous   5.1         5.7        0.6                5.4           0.3

Table 12: Comparison of the standard deviations of the incline, overall and differentiated by terrain classes.

5.3 Limitations of Approach

As described in section 0, the approach of deriving incline information from user-generated GPS traces results in an incline with a reasonable accuracy. However, the methodology also has limitations, which are discussed critically in this section.

In the OpenStreetMap Wiki (2015e), a convention regarding the incline of streets is given. When adding incline information to OSM-Ways, the street segment shall be split at the beginning and at the end of the inclined part. The value which is then added to the key ‘incline’, should represent the maximum value which can be found within this part of street rather than the average incline. But using the approach of this thesis, the average incline is calculated per street segment. In the preprocessing, the street segments were split at their intersection points. However, those parts of the streets in which there is an incline are not detected. Thus, the geometry objects cannot be split at the beginning and at the end of the inclined part of the street.

Calculating the average incline per street segment without detecting the steepest parts is not a problem as long as the incline is constant over the length of the street segment. In reality, however, there are situations in which this approach leads to wrong results. Figure 42 shows two examples, (a) and (b). In (a), the street segment contains 3 parts with different inclines. Two of them are flat, whereas the one in the middle is inclined. The average incline calculated for this street segment is lower than the incline in reality. This leads to a problem if a person expects, for example, a 5 % incline along a distance of 100 m and in reality faces a 10 % incline within a distance of 50 m. At least in this case it is known that there is an incline. The example in Figure 42(b) shows a situation in which the average incline is 0 %, since the street segment contains two inclined parts with the same magnitude but in opposite directions.

(a) (b)

Figure 42: Situations where the calculated incline differs from the steepest incline.56
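The two situations of Figure 42 can be reproduced numerically. The sketch below is an illustration only; it assumes that a street segment is described as a list of parts with a horizontal length and a constant incline, which is a hypothetical representation and not the data model of this thesis.

```python
def average_and_steepest_incline(parts):
    """Compare the segment-wide average incline with the steepest part.

    parts: list of (horizontal_length_m, incline_percent) in travel direction
    Returns (average_incline_percent, steepest_incline_percent).
    """
    total_length = sum(length for length, _ in parts)
    total_rise = sum(length * incline / 100.0 for length, incline in parts)
    average = 100.0 * total_rise / total_length
    steepest = max(parts, key=lambda part: abs(part[1]))[1]
    return average, steepest


# (a) flat, inclined, flat: 10 % over 50 m inside a 100 m segment
case_a = [(25.0, 0.0), (50.0, 10.0), (25.0, 0.0)]
# (b) uphill then downhill with the same magnitude
case_b = [(50.0, 10.0), (50.0, -10.0)]

print(average_and_steepest_incline(case_a))  # average 5 %, steepest 10 %
print(average_and_steepest_incline(case_b))  # average 0 %, steepest 10 %
```

In both cases the value recommended by the OSM convention (the maximum) is 10 %, while the segment average reports 5 % or 0 %.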

This problem was not addressed in this thesis as the main focus was on the examination of user- generated GPS traces with regard to their feasibility of deriving incline information as well as the development of a method and tools which handle the GPS data and process them to derive incline information.

56 Bicycle pictogram: © Pixabay-User: ‘ClkerFreeVectorImages‘, Source of image: https://pixabay.com/de/fahrrad-piktogramm-sport-307977/, checked on 20/07/2015

6 Conclusion and Outlook

6.1 Conclusion

Different user groups may benefit from route planning that considers the incline of a street network. Examples are mobility-restricted people, such as wheelchair users, people with walking aids or parents with pushchairs, for whom streets or paths may be inaccessible if the incline exceeds a certain magnitude. If the incline is known in advance, a route can be planned that avoids steep streets. The chosen route may be longer, but not as steep as the shortest one. Furthermore, incline information is useful for the route planning of electricity-powered vehicles or bicycles.

The data of the OpenStreetMap project, which is a freely available source of street network data and often used by routing engines, provides incline information for only 0.2 % of the street network. Therefore, the automatic derivation of incline values may fill the gap. One source of elevation information for deriving the incline of a street network is digital elevation models (DEMs). DEMs acquired from LiDAR measurements are very accurate; however, they are usually also very expensive and not globally available. Alternatively, low-cost DEMs like the SRTM-1 DEM or ASTER are freely and (almost) globally available but are limited by their horizontal resolution of 30 m and their vertical accuracy of 9 m (ASTER GDEM) and 6 m (SRTM-1 DEM). Another source of elevation data, which is freely available and at least theoretically globally available, is the user-generated GPS traces of the OpenStreetMap project. Initially collected for the purpose of map making, the data might also serve other purposes. In contrast to SRTM-1 and ASTER, and depending on the coverage, many GPS track points may fall within a square of 30 m, which is the horizontal resolution of the SRTM-1 DEM and the ASTER GDEM. Therefore, there is more information about the elevation, which potentially results in incline values of higher accuracy, although the absolute vertical accuracy of GPS is known to be fairly poor. However, only the difference in elevation between two adjacent points is relevant, not the absolute elevation. The relative accuracy is assumed to be better than the absolute one.


The following aims for this thesis have been formulated:

- Creation and implementation of a workflow to calculate the incline of streets, using user-contributed GPS traces.
- Assessment of the quality of voluntarily collected GPS traces in terms of
  o vertical accuracy (absolute and relative)
  o coverage of GPS traces
- Assessment of the achieved quality of the incline information, compared to LiDAR and SRTM-1 DEM.
- Publication of the developed software as Open Source and provision to the OpenStreetMap community.

The steps to fulfill the aforementioned aims will be discussed in the following.

Before calculating the GPS incline for the segments of the street network, several steps have to be undertaken to prepare the two main input data sets, the GPS traces and the street network. The GPS traces are downloaded from the OpenStreetMap (OSM) project and include over 4000 traces in the pilot region (Heidelberg Area / Germany). Not all of the traces have the optional elevation information; therefore, only 3842 traces with over 2 million GPS track points remain to derive the incline. To import the GPS traces, which can be downloaded as a compressed file archive (*.tar.xz), a Java tool has been developed. It reads the file archive, filters the GPS traces by bounding box, rejects all traces without elevation information and stores the traces in the database. Since the elevation information of the GPS traces suffers from noise and other irregularities, the traces have to be preprocessed in the following step. The street network, which is to be enhanced with incline information, has also been taken from OSM. The data has been imported into the database using the Java tool Osmosis. For the pilot region, the street network has a total length of 5338 km and contains different types of streets. It includes, for example, residential streets (18 %) and motorways (2 %) but also paths which are exclusively dedicated to pedestrians or cyclists (together 20 %). Like the GPS traces, the street network has been preprocessed. The streets have been split at their intersection points with other streets to avoid long streets which may span several valleys and hills.
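The filtering step of the import can be illustrated with a few lines of Python. This is a sketch only and not the Java tool developed in this thesis: the function names, the sample trace and the bounding box values are hypothetical, and it is an assumption made here that a trace is rejected when no track point inside the bounding box carries an elevation. The element and attribute names (`trkpt`, `ele`, `lat`, `lon`) are those of the GPX format.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}trkpt'."""
    return tag.rsplit("}", 1)[-1]


def filter_trace(gpx_text, bbox):
    """Return [(lat, lon, ele)] inside bbox, or None if nothing usable remains.

    bbox: (min_lat, min_lon, max_lat, max_lon) in degrees
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    points = []
    for element in ET.fromstring(gpx_text).iter():
        if local_name(element.tag) != "trkpt":
            continue
        lat, lon = float(element.get("lat")), float(element.get("lon"))
        ele = next((c.text for c in element if local_name(c.tag) == "ele"), None)
        if ele is None or not ele.strip():
            continue  # track point without the optional elevation
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            points.append((lat, lon, float(ele)))
    return points or None


# Hypothetical sample: one usable point, one without elevation, one outside.
SAMPLE_GPX = """<gpx xmlns="http://www.topografix.com/GPX/1/0" version="1.0">
<trk><trkseg>
<trkpt lat="49.41" lon="8.69"><ele>115.0</ele></trkpt>
<trkpt lat="49.42" lon="8.70"></trkpt>
<trkpt lat="52.50" lon="13.40"><ele>40.0</ele></trkpt>
</trkseg></trk></gpx>"""
PILOT_BBOX = (49.3, 8.5, 49.5, 8.9)  # illustrative values, not the thesis bounding box

print(filter_trace(SAMPLE_GPX, PILOT_BBOX))
```

Matching on the local tag name keeps the sketch independent of the GPX version declared in the file.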

It is assumed that the GPS traces were recorded while traveling on a street, which is important for the next step. For the incline calculation, the assignment of the GPS traces to the street segments (map matching) is an essential step. A further assumption is that streets which are parallel to each other (e.g. a street with two separate lanes, or a footpath next to a street) have the same incline. Consequently, GPS traces which were recorded on one of the parallel streets can also be used for the incline calculation of the other one. This increases the number of traces per street; however, it may also lead to errors if two parallel streets have different inclines.


The GPS incline calculation was done for each street segment individually. First of all, the previously assigned GPS traces of the street are selected. A buffer of the street segments with a size of 15 m is then used to clip the selected traces. Only those parts of the traces which fall into the buffer shall be used to calculate the incline. After that, the incline for each GPS trace is calculated by averaging all inclines derived from every two adjacent GPS track points. If there are multiple traces per street segment, the procedure is repeated for the other ones as well. At the end, the incline values of all traces are averaged to the final incline of the street segment.
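The averaging scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this thesis: track points are assumed to be given in a projected coordinate system in metres, to be already clipped to the 15 m buffer, and to be ordered along the direction of the street segment. The function names are hypothetical.

```python
import math


def trace_incline(points):
    """Mean incline in percent over all pairs of adjacent track points.

    points: [(x_m, y_m, elevation_m)], assumed clipped and ordered along the segment
    Returns None if no pair with a positive horizontal distance exists.
    """
    inclines = []
    for (x0, y0, h0), (x1, y1, h1) in zip(points, points[1:]):
        distance = math.hypot(x1 - x0, y1 - y0)
        if distance > 0:
            inclines.append(100.0 * (h1 - h0) / distance)
    return sum(inclines) / len(inclines) if inclines else None


def segment_incline(traces):
    """Average of the per-trace inclines of one street segment."""
    values = [v for v in map(trace_incline, traces) if v is not None]
    return sum(values) / len(values) if values else None


# Toy traces, NOT thesis data. The second trace is offset by 100 m in absolute
# elevation, which does not matter because only differences are used.
traces = [
    [(0.0, 0.0, 100.0), (10.0, 0.0, 100.5), (20.0, 0.0, 101.0)],
    [(0.0, 0.0, 200.0), (20.0, 0.0, 200.5)],
]
print(segment_incline(traces))  # 3.75
```

The example also shows why a poor absolute elevation accuracy is tolerable here: a constant offset per trace cancels out in the elevation differences.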

The second aim of this thesis is the evaluation of the GPS raw data with regard to the absolute and relative accuracy, using a LiDAR DTM as reference. Overall, the RMSE (using 90 % of the data) of the GPS elevation is 27 m, and depending on the land use class it ranges from 21 m for GPS track points in farmland to 35 m in the land use class ‘allotments’. This is worse than stated in the literature in which the accuracy of low-cost GPS receivers has been assessed. It is possible that smartphone apps use elevation databases which rely on DEMs such as SRTM-1, and some handheld devices have a barometric measurement unit. Furthermore, the GPS elevation refers in many cases to the mean sea level, although it should be given as the height above the WGS 84 ellipsoid. Only some GPS track points refer to the ellipsoid. This means that many smartphone applications or handheld GPS devices internally transform the ellipsoidal height to a geoidal height with the help of a geoid model.

To judge the relative accuracy, the RMSE of the elevation differences between two adjacent GPS track points has been calculated. Overall, the RMSE is 0.3 m; depending on the land use class, it ranges from 0.2 m to 0.5 m. Land use classes which are characterized mainly by fields and almost no buildings, like ‘farmland’ or ‘allotments’, perform better (RMSE = 0.2 m) than classes characterized by tall buildings or a dense tree canopy, such as residential areas or forests (RMSE = 0.3 m / 0.5 m). Combining the RMSE with the average distance between the points results in an incline error of 2.4 % overall, 1.0 % for farmland and 4.3 % for forested areas. With an incline error of 2.4 %, it is possible to derive the incline from GPS traces with a reasonable accuracy, especially considering that streets are often covered by several traces.
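The combination of RMSE and point distance can be written as a simple ratio. The following is a sketch under the assumption, not stated explicitly in the text, that the incline error is the RMSE of the elevation differences divided by the average point distance; the symbols are introduced here for illustration only. With the rounded overall values reported in this chapter (0.3 m and an average point distance of 14 m), the ratio gives about 2.1 %, which is of the same order as the 2.4 % stated above; the remaining difference is presumably due to the rounding of the inputs.

```latex
e_{\mathrm{incline}} \approx \frac{\mathrm{RMSE}_{\Delta h}}{\bar{d}} \cdot 100\,\%
  = \frac{0.3\ \mathrm{m}}{14\ \mathrm{m}} \cdot 100\,\% \approx 2.1\,\%
```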

Besides the absolute and relative accuracy, the coverage of GPS traces and the density of GPS track points have been evaluated. The coverage was investigated by street type. Considering all street types which are used by cars, streets of higher priority in the pilot region also have a higher coverage. With at least one GPS trace, 100 % of the motorways, primary and secondary streets are covered, while only 60 % of the residential streets are covered. Considering a coverage of at least 5 GPS traces, still almost 100 % of the motorways and primary streets, 82 % of the secondary and only 14 % of the residential streets are covered. The street types ‘path’ and ‘footway’, which are used by pedestrians and also by mobility-restricted people, have a coverage comparable to that of residential streets. Cycle ways have an even better coverage, which is comparable to that of secondary streets. This shows that the contributors of OSM are not only traveling by car; many paths dedicated to pedestrians are covered as well. This is of benefit considering the motivation of this thesis of calculating the incline for mobility-restricted people.

The overall average distance between two adjacent points of the GPS traces is 14 m. Compared to the 30 m horizontal resolution of the SRTM-1 DEM, this point spacing is about half as large, so that theoretically two GPS track points fall into one pixel of the DEM. This results in a higher horizontal resolution, especially if a street is covered by multiple traces. The distance between two GPS track points depends on the average speed on the street type. For example, the average distance on motorways is 36 m, while on footways it is only 9 m. This implies that most devices record the GPS track points at a fixed time interval.

The third aim of this thesis is to validate the result, using the incline derived from the high-accuracy DTM as reference and comparing it with the SRTM-1 DSM. The incline was calculated for 3064 km of street length, which is equivalent to 57 % of the entire street network within the pilot region. Out of this street length, 61 % have an incline error smaller than 1 %, which is probably a sufficient accuracy for most use-cases. For 85 % (2607 km) of this street length, the incline could be calculated with an error below 3 %, which may still be reasonable for some use-cases. The normally distributed incline error has a standard deviation of σ = 2.3 % (with 95 % of the data), but depending on the land use class σ ranges from 1.1 % (farmland) to 3.6 % (forest). It is noticeable that the incline is more accurate within land use classes which do not suffer from obstructions of the satellite signal. Differentiating the incline accuracy by terrain classes showed that the incline can be determined with higher accuracy in flat areas (σ = 1.4 %), whereas in mountainous areas the accuracy is worse with σ = 5.7 %. However, the main part of the mountainous area is characterized by forests, which also perform worse than other land use classes.

The accuracy can generally be improved by considering only street segments which are covered by multiple traces. For example, if all street segments with a GPS incline are considered, 77 % of the inclines are determined with an error smaller than 2 %. With an increasing minimum number of traces, the percentage of streets with an incline error below 2 % increases to 87 % (≥ 5 traces) resp. 92 % (≥ 10 traces). However, the coverage gets significantly worse.

The GPS incline was compared to the incline derived from the SRTM-1 DSM to see how user-generated GPS traces perform in comparison to other freely available data. The evaluation has shown that the GPS incline performs slightly better than the SRTM incline. Using SRTM-1, the incline could be determined with an error smaller than 2 % for 73 % of the street network. With GPS traces this increases to 77 %, considering all street segments with at least 1 GPS trace, and to 86 % of the streets with at least 5 GPS traces. However, the coverage of GPS cannot keep up with the SRTM-1 DEM, which is almost globally available. The share of the entire street network with a GPS incline error smaller than 2 % is only 44 % (≥ 1 GPS trace) resp. 18 % (≥ 5 GPS traces).

To conclude, it is possible to derive incline values of a street network with a reasonable accuracy if the streets are covered by multiple GPS traces. In comparison to the SRTM-1 DSM in particular, the GPS incline performs better, although the coverage is significantly lower. By introducing other sources of user-generated GPS traces, the coverage can be improved. The result shows that it is nowadays possible to achieve comparable or even slightly better results with user-generated data than with data collected by a research satellite. User-generated GPS traces also require satellites, but the data was not primarily collected for the purpose of incline calculation.

6.2 Outlook

This approach has been tested in the area around Heidelberg. The advantage of this area is its diversity of land use classes as well as of flat and mountainous areas. However, the mountainous area is mainly covered by forests and the residential areas are often in flat terrain. To further test how the accuracy of the GPS incline depends on mountainous and flat terrain, the approach could be applied to other pilot regions, for example Zürich in Switzerland, where a high-accuracy DTM is available as open data.

Furthermore, future work could try to improve the results of this approach. One way to achieve better results is the introduction of other sources in addition to OpenStreetMap. The data of sport tracking platforms and projects such as Strava or gpsies.com could be combined with the GPS traces from OSM. Unfortunately, some projects do not offer public access to the GPS traces, but by requesting the data for a specific reason or setting up a cooperation, it might be possible to get anonymized data. The larger amount of data would result in a higher coverage of GPS traces, which leads to a higher completeness of streets with GPS incline and to higher accuracies, since there will be more streets with multiple corresponding traces. If the area of interest is limited and relatively small, it is also possible to have volunteers collect data specifically for this purpose. Furthermore, a smartphone can be given to people who regularly drive or walk through the area (e.g. couriers, taxi drivers) to record their location during the entire work day.

An additional field of further research would be to compute a digital terrain model from user-generated GPS traces. Massad & Dalyot (2015) have already investigated the generation of a DTM using GPS traces recorded with a smartphone. They tested their approach, which includes a 2D Kalman filtering, on a university campus with data collected exclusively for this purpose and under good conditions. The measured GPS track points are relatively equally distributed, since points have been measured not only on paths but also on lawns and parking spaces. The approach of Massad & Dalyot (2015) could be tested using the GPS data of OSM. It offers a large amount of data; however, it also involves some challenges. In the case of OSM GPS traces, it is not known which devices were used or to which vertical datum the elevation refers, and the traces are mainly recorded on streets or paths. The latter would lead to data gaps in the areas between the streets. Furthermore, streets often have multiple traces, which leads to a high density of points that may differ considerably in elevation due to their poor absolute accuracy. Because of these challenges, it would be interesting to find out whether the OSM GPS traces are suitable for deriving a digital terrain model.


7 Bibliography

Bachofer, F. (2011): Einfluss der vertikalen Genauigkeit von DGM auf das EcoRouting von Elektrofahrzeugen. In J. Strobl, T. Blaschke, G. Griesebner (Eds.): Angewandte Geoinformatik 2011. Beiträge zum 23. AGIT-Symposium Salzburg. Berlin, Offenbach: Wichmann, pp. 338–346.

Bauer, C. A. (2010): User Generated Content – Urheberrechtliche Zulässigkeit nutzergenerierter Medieninhalte. In H. Große Ruse-Khan, N. Klass, S. von Lewinski (Eds.): Nutzergenerierte Inhalte als Gegenstand des Privatrechts, vol. 15. Berlin, Heidelberg: Springer, pp. 1–42.

Boucher, C. (2013): Fusion of GPS, OSM and DEM Data for Estimating Road Network Elevation. In: Fifth International Conference on Computational Intelligence, Communication Systems and Networks (CICSyN). Madrid, Spain, pp. 273–278.

Cartwright, W.; Gartner, G.; Meng, L.; Peterson, M. P.; Peckham, R. J.; Jordan, G. (2007): Digital Terrain Modelling. Berlin, Heidelberg: Springer.

Conley, R.; Cosentino, R.; Hegarty, C. J.; Kaplan, E. D.; Leva, J. L.; Uijt de Haag, M.; Van Dyke, K. (2006): Performance on Stand-Alone GPS. In E. D. Kaplan, C. Hegarty (Eds.): Understanding GPS. Principles and applications. 2nd edition. Boston: Artech House, pp. 301–378.

Cosentino, R. J.; Diggle, D. W.; Uijt de Haag, M.; Hegarty, C. J.; Milbert, D.; Nagle, J. (2006): Differential GPS. In E. D. Kaplan, C. Hegarty (Eds.): Understanding GPS. Principles and applications. 2nd edition. Boston: Artech House, pp. 379–458.

Czegka, W.; Braune, S.; Behrends, K. (2004): Die Qualität der SRTM-90m Höhendaten und ihre Verwendbarkeit in GIS. 24. Wissenschaftlich-Technische Tagung der DGPF. Halle, 2004.

Ding, D.; Parmanto, B.; Karimi, H. A.; Roongpiboonsopit, D.; Pramana, G.; Conahan, T.; Kasemsuppakorn, P. (2007): Design considerations for a personalized wheelchair navigation system. In Conference proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference 2007, pp. 4790–4793.

empirica Gesellschaft für Kommunikations- und Technologieforschung mbH (2015): Welcome to cap4access. Available online at http://cap4access.eu/intro/, checked on 1/15/2015.

European Space Agency (2015): What is Galileo? Available online at http://www.esa.int/Our_Activities/Navigation/The_future_-_Galileo/What_is_Galileo, checked on 4/23/2015.


Farr, T. G.; Rosen, P. A.; Caro, E.; Crippen, R.; Duren, R.; Hensley, S. et al. (2007): The Shuttle Radar Topography Mission. In Reviews of Geophysics 45 (2). DOI: 10.1029/2005RG000183.

Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P. (1996): From Data Mining to Knowledge Discovery in Databases. In U. M. Fayyad (Ed.): Advances in knowledge discovery and data mining. Menlo Park: AAAI Press, pp. 1–34.

Feairheller, S.; Clark, R. (2006): Other Systems. In E. D. Kaplan, C. Hegarty (Eds.): Understanding GPS. Principles and applications. 2nd edition. Boston: Artech House, pp. 595–634.

Franke, D.; Dzafic, D.; Baumeister, D.; Kowalewski, S. (2012): Energieeffizientes Routing für Elektrorollstühle. In: 13. Aachener Kolloquium Mobilität und Stadt (AMUS/ACMOTE): RWTH Aachen, pp. 65–68. Available online at http://publications.embedded.rwth-aachen.de/file/51, checked on 7/21/2015.

Goodchild, M. F. (2007): Citizens as sensors: the world of volunteered geography. In GeoJournal 69 (4), pp. 211–221. DOI: 10.1007/s10708-007-9111-y.

Hahmann, S. (2014): Zur Beziehung von Raum und Inhalt nutzergenerierter geographischer Informationen. Dissertation. Technische Universität Dresden, Dresden. Institut für Kartographie.

Haining, R. P. (2003): Spatial data analysis. Theory and practice. Cambridge, UK: Cambridge University Press.

Haklay, M.; Weber, P. (2008): OpenStreetMap. User-Generated Street Maps. In IEEE Pervasive Computing 7 (4), pp. 12–18. DOI: 10.1109/MPRV.2008.80.

Han, J.; Kamber, M. (2006): Data mining. Concepts and techniques. 2nd ed. Boston, San Francisco, CA: Elsevier; Morgan Kaufmann (The Morgan Kaufmann series in data management systems).

Han, S.; Rizos, C. (1999): Road Slope Information from GPS-Derived Trajectory Data. In Journal of Surveying Engineering 125 (2), pp. 59–68.

Harriehausen-Mühlbauer, B. (2014): Mobile Navigation for Limited Mobility Users. In D. Hutchison, T. Kanade, J. Kittler, J. M. Kleinberg, A. Kobsa, F. Mattern et al. (Eds.): Digital Human Modeling. Applications in Health, Safety, Ergonomics and Risk Management, vol. 8529. Cham: Springer International Publishing (Lecture Notes in Computer Science), pp. 535–545.


Heipke, C. (2010): Crowdsourcing geospatial data. In ISPRS Journal of Photogrammetry and Remote Sensing 65 (6), pp. 550–557. DOI: 10.1016/j.isprsjprs.2010.06.005.

Hofmann-Wellenhof, B.; Lichtenegger, H.; Wasle, E. (2008): GNSS - Global Navigation Satellite Systems. GPS, GLONASS, Galileo, and more. Wien, New York: Springer.

Jokar Arsanjani, J. (2014): Case study I: VGI platforms and data generalization. In D. Burghardt, C. Duchêne, W. Mackaness (Eds.): Abstracting Geographic Information in a Data Rich World. Cham: Springer International Publishing (Lecture Notes in Geoinformation and Cartography), pp. 131–138.

Karussel (2014): Digitalizing GPX Points or How to Track Vehicles With GraphHopper. Available online at https://karussell.wordpress.com/2014/07/28/digitalizing-gpx-points-or-how-to-track-vehicles-with-graphhopper/, updated on 7/28/2014, checked on 5/8/2015.

Kono, T.; Fushiki, T.; Asada, K.; Nakano, K. (2008): Fuel Consumption Analysis and Prediction Model for “Eco” Route Search. In: 15th World Congress on Intelligent Transport Systems and ITS America's 2008 Annual Meeting.

Kurihara, M.; Nonaka, H.; Yoshikawa, T. (2004): Use of highly accurate GPS in network-based barrier free street map creation system. In: IEEE International Conference on Systems, Man and Cybernetics. The Hague, Netherlands, Oct. 10-13, 2004, pp. 1169–1173.

Langley, R. B. (1999): Dilution of Precision. In GPS World 10 (5), pp. 52–59.

Liu, G.; Hossain, K. M. A.; Iwai, M.; Ito, M.; Tobe, Y.; Sezaki, K.; Matekenya, D. (2014): Beyond horizontal location context: Measuring Elevation Using Smartphone’s Barometer. In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication. New York, USA, pp. 459–468.

Marchal, F.; Hackney, J.; Axhausen, K. (2005): Efficient Map Matching of Large Global Positioning System Data Sets: Tests on Speed-Monitoring Experiment in Zürich. In Transportation Research Record 1935 (1), pp. 93–100. DOI: 10.3141/1935-11.

Massad, I.; Dalyot, S. (2015): Towards the production of digital terrain models from volunteered GPS trajectories. In Survey Review. DOI: 10.1179/1752270615Y.0000000010.

Menkens, C.; Sussmann, J.; Al-Ali, M.; Breitsameter, E.; Frtunik, J.; Nendel, T.; Schneiderbauer, T. (2011): EasyWheel - A Mobile Social Navigation and Support System for Wheelchair Users. In: Eighth International Conference on Information Technology: New Generations (ITNG). Las Vegas, NV, USA, pp. 859–866.


Mennis, J.; Guo, D. (2009): Spatial data mining and geographic knowledge discovery—An introduction. In Computers, Environment and Urban Systems 33 (6), pp. 403–408. DOI: 10.1016/j.compenvurbsys.2009.11.001.

Meyer, D. J. (2011): ASTER Global Digital Elevation Model Version 2 – Summary of Validation Results. Available online at https://www.jspacesystems.or.jp/ersdac/GDEM/ver2Validation/Summary_GDEM2_validation_report_final.pdf, checked on 1/15/2015.

Müller, A.; Neis, P.; Auer, M.; Zipf, A. (2010): Ein Routenplaner für Rollstuhlfahrer auf der Basis von OpenStreetMap-Daten - Konzeption, Realisierung und Perspektiven. In J. Strobl, T. Blaschke, G. Griesebner (Eds.): Angewandte Geoinformatik 2010. Beiträge zum 22. AGIT-Symposium Salzburg. Berlin, Offenbach: Wichmann.

Neis, P.; Zielstra, D. (2014a): Generation of a tailored routing network for disabled people based on collaboratively collected geodata. In Applied Geography 47, pp. 70–77. DOI: 10.1016/j.apgeog.2013.12.004.

Neis, P.; Zielstra, D. (2014b): Recent Developments and Future Trends in Volunteered Geographic Information Research. The Case of OpenStreetMap. In Future Internet 6 (1), pp. 76–106. DOI: 10.3390/fi6010076.

Neis, P.; Zielstra, D.; Zipf, A. (2012): The Street Network Evolution of Crowdsourced Maps. OpenStreetMap in Germany 2007–2011. In Future Internet 4 (1), pp. 1–21. DOI: 10.3390/fi4010001.

Open Knowledge Foundation (2015): ODC Open Database License (ODbL) Summary. Available online at http://opendatacommons.org/licenses/odbl/summary/, checked on 4/20/2015.

OpenStreetMap Foundation Wiki (2015a): About. Available online at http://wiki.osmfoundation.org/w/index.php?title=About&oldid=3201, updated on 4/1/2015, checked on 4/20/2015.

OpenStreetMap Foundation Wiki (2015b): License/We Are Changing The License. Available online at http://wiki.osmfoundation.org/w/index.php?title=License/We_Are_Changing_The_License&oldid=1813, updated on 4/1/2015, checked on 4/20/2015.

OpenStreetMap Foundation Wiki (2015c): Working Groups. Available online at http://wiki.osmfoundation.org/w/index.php?title=Working_Groups&oldid=2220, updated on 4/1/2015, checked on 4/20/2015.


OpenStreetMap Wiki (2015a): Bing. Available online at http://wiki.openstreetmap.org/w/index.php?title=Bing&oldid=1117458, updated on 4/16/2015, checked on 4/18/2015.

OpenStreetMap Wiki (2015b): Map Features. Available online at http://wiki.openstreetmap.org/w/index.php?title=Map_Features&oldid=1178564, checked on 5/27/2015.

OpenStreetMap Wiki (2015c): Stats. Available online at http://wiki.openstreetmap.org/w/index.php?title=Stats&oldid=1145799, updated on 4/9/2015, checked on 4/15/2015.

OpenStreetMap Wiki (2015d): User:Ikonor/DE:SRTM Alternativen / DGM – OpenStreetMap Wiki. Edited by OpenStreetMap Wiki. Available online at http://wiki.openstreetmap.org/w/index.php?title=User:Ikonor/DE:SRTM_Alternativen_/_DGM&oldid=1160583, checked on 7/11/2015.

OpenStreetMap Wiki (2015e): Key:incline. Available online at http://wiki.openstreetmap.org/w/index.php?title=Key:incline&oldid=1148320, checked on 5/27/2015.

Quddus, M. A.; Ochieng, W. Y.; Noland, R. B. (2007): Current map-matching algorithms for transport applications: State-of-the art and future research directions. In Transportation Research Part C: Emerging Technologies 15 (5), pp. 312–328. DOI: 10.1016/j.trc.2007.05.002.

Ramm, F.; Topf, J. (2010): OpenStreetMap. Die freie Weltkarte nutzen und mitgestalten. 3. Auflage. Berlin: Lehmanns Media.

Ramm, F.; Topf, J.; Chilton, S. (2011): OpenStreetMap. Using and enhancing the free map of the world. English ed. Cambridge: UIT Cambridge.

Resch, B. (2013): People as Sensors and Collective Sensing - Contextual Observations Complementing Geo-Sensor Network Measurements. In J. M. Krisp (Ed.): Progress in location-based services. Heidelberg, New York: Springer (Lecture Notes in Geoinformation and Cartography), pp. 391–406.

Sachenbacher, M.; Leucker, M.; Artmeier, A.; Haselmayr, J. (2011): Efficient Energy-Optimal Routing for Electric Vehicles. In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence and the Twenty-Third Innovative Applications of Artificial Intelligence Conference, 7-11 August 2011, San Francisco, California, USA. Menlo Park, Calif.: AAAI Press, pp. 1402–1407.


Santerre, R.; Pan, L.; Cai, C.; Zhu, J. (2014): Single Point Positioning Using GPS, GLONASS and BeiDou Satellites. In Positioning 05 (04), pp. 107–114. DOI: 10.4236/pos.2014.54013.

Sester, M.; Jokar Arsanjani, J.; Klammer, R.; Burghardt, D.; Haunert, J.-H. (2014): Integrating and Generalising Volunteered Geographic Information. In D. Burghardt, C. Duchêne, W. Mackaness (Eds.): Abstracting Geographic Information in a Data Rich World. Cham: Springer International Publishing (Lecture Notes in Geoinformation and Cartography), pp. 119–155.

Shekhar, S.; Zhang, P.; Huang, Y.; Vatsavai, R. R. (2004): Trends in Spatial Data Mining. In H. Kargupta (Ed.): Data mining. Next generation challenges and future directions. Menlo Park, Calif., London, Cambridge, Mass.: AAAI Press; Copublished and distributed by MIT Press.

Sui, D. Z. (2008): The wikification of GIS and its consequences: Or Angelina Jolie’s new tattoo and the future of GIS. In Computers, Environment and Urban Systems 32 (1), pp. 1–5. DOI: 10.1016/j.compenvurbsys.2007.12.001.

Torge, W. (2001): Geodesy. 3rd completely rev. and extended ed. Berlin, New York: W. de Gruyter.

Tukey, J. W. (1977): Exploratory data analysis. Reading, Mass.: Addison-Wesley Pub. Co (Addison-Wesley series in behavioral science).

van Winden, K. (2014): Automatically Deriving and Updating Attribute Road Data from Movement Trajectories. Master's Thesis. Delft University of Technology.

Völkel, T.; Weber, G. (2008): RouteCheckr. In S. Harper, A. Barreto (Eds.): the 10th international ACM SIGACCESS conference. Halifax, Nova Scotia, Canada, p. 185.

Zhang, L.; Thiemann, F.; Sester, M. (2010): Integration of GPS traces with road map. In: Computational Transportation Science, pp. 17–22.

Zhilin, L.; Qing, Z.; Gold, C. (2005): Digital terrain modeling. Principles and methodology. New York, USA: CRC-Press.
