The Development of a Predictive Hiking Travel Time Model Accounting for Terrain Variations

Peter WITT

Abstract

An increasing amount of GPS-collected travel data is being shared via the internet. This is especially true of outdoor adventure travel, including hiking. This research seeks to ascertain whether these data can be used to develop a predictive hiking speed model that accounts for terrain variations. Fifty-two hikes represented by global positioning system (GPS) data were downloaded from an outdoor adventure travel web site for use in the study. After gross data errors were identified, several hikes were eliminated, leaving 40 hikes which were used in the development of a hiking model. Regression analysis was carried out using slope and speed as the independent and dependent variables respectively. Two exponential curves were identified to model the slope – speed relationships in the GPS data, one for up slope travel and one for down slope travel. The study model and other slope – speed models were loosely compared to observed travel times in nine additional GPS collected hikes. Spatial scale and spatial dependency in the GPS dataset were also examined in the course of the study. A geographically weighted regression (GWR) was performed on the GPS data of a single hike, and the results indicated a better fit than classical regression analysis. A spatial autocorrelation analysis of the GWR residuals for that hike showed that they are spatially dependent, which implies that other factors, such as GPS accuracy, affect the slope – speed model. The study shows that, with some additional research and changes to the data scrubbing routines used, there is a possibility of developing a better, more robust hiking slope – speed model using this type of GPS data.

1 Introduction

This study explores the potential of using hiking data collected via hand held Global Positioning System (GPS) units as a basis for developing a travel time model that considers terrain variations. Over the last seven to ten years, more and more GPS collected hiking information has been made available to the public. These data are being loaded onto a number of websites used by outdoor enthusiasts who are interested in planning a hike of their own. Some of these websites have amassed tens of thousands of GPS datasets that represent hikers’ travels. A predictive travel time model built on such data could be used in adventure travel planning, wild land accessibility evaluation, and emergency response evacuation planning, to name a few applications.

Jekel, T., Car, A., Strobl, J. & Griesebner, G. (Eds.) (2012): GI_Forum 2012: Geovizualisation, Society and Learning. © Herbert Wichmann Verlag, VDE VERLAG GMBH, Berlin/Offenbach. ISBN 978-3-87907-521-8.

Several efforts have been made to develop travel time models for overland travel. Naismith’s (NAISMITH 1892) rule, “an hour for every three miles on the map with an additional hour for every 2000 ft of ascent”, has been and still is widely used to model travel over hilly terrain. Several corrections, including Langmuir’s (LANGMUIR 1984), have also been suggested to account for terrain variations and other factors that impact travel speed. Tobler’s (TOBLER 1993) slope – speed model is also used to determine travel time (see Fig. 1). These travel models are typically used as part of a larger study or analysis.

Fig. 1: Hiking speed as a function of slope (TOBLER 1993)
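As a rough illustration of the two models cited above, the following sketch computes a Naismith travel time and a Tobler walking speed. The Tobler expression used here, W = 6 exp(-3.5 |S + 0.05|) km/hr with S as rise over run, is the commonly cited form of the hiking function and is not reproduced in this paper; the function names and the metric unit conversions are illustrative assumptions.

```python
import math

MILE_KM = 1.609344   # kilometres per statute mile
FOOT_M = 0.3048      # metres per foot


def naismith_hours(distance_km, ascent_m):
    """Naismith's rule: 1 h per 3 miles on the map plus 1 h per 2000 ft of ascent."""
    horizontal = distance_km / (3 * MILE_KM)
    climbing = max(ascent_m, 0.0) / (2000 * FOOT_M)
    return horizontal + climbing


def tobler_speed_kmh(slope):
    """Commonly cited Tobler hiking function (assumed form).

    slope is rise over run, e.g. 0.10 for a 10% grade; result is in km/hr.
    """
    return 6.0 * math.exp(-3.5 * abs(slope + 0.05))
```

Under this form, the Tobler speed peaks at 6 km/hr on a gentle downhill slope of -5% and falls off on either side, which matches the asymmetric shape sketched in Fig. 1.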

Two studies (WING et al. 2005, WING 2006) which examined the positional and elevation accuracies of hand held GPS devices are representative of the GPS data and units used in this study. First, the hand-held GPS units tested in the Wing studies are similar to, and in many cases the same as, the GPS units used by the hikers who collected and submitted their tracks. In addition, the Wing studies tested the GPS units in three different landscape classes; in most cases these classes are similar to what would be expected in the trails and terrain covered by the GPS hiking data. Results from the tests showed that four of the six units had similar accuracies, with the remaining two a little less accurate. The average horizontal errors of the four units in the three settings were: open sky 3.2 meters; young forest 5 meters; closed canopy 10 meters. For elevation, the closed canopy setting provided the worst accuracy at 9 meters, with the open sky and young forest settings at 4 and 5 meters respectively.

2 Data and Analysis

2.1 Study Data

The GPS data used in this research were downloaded from one of the several available outdoor enthusiasts’ web sites. EveryTrail (www.everytrail.com) allows its registered users to post and download GPS tracks that they have collected from their hike, bike, or other mode of transport. The GPS units used to collect the trail data are varied; however, most are handheld units which typically record a location at a given time interval. For each trail or hike that a user wants to upload to the website, the user can upload a textual description of the trail, photos, and their GPS collected data in GPX format.

GPS data were downloaded from the site manually. Each listed trail had a download link for the GPX file. Two criteria were used when searching for and downloading a hike. The first was total travel time: hikes shorter than two hours or longer than 10 hours were not downloaded. The second was not to select any hike that had camping listed in its description. The reasoning behind these criteria was to exclude data where heavy backpacks may reduce travel speed, thereby skewing the analysis toward slower speeds, and also to limit the potential of including posted hikes that were actually run rather than walked, which would skew the analysis toward higher speeds. A total of 52 hikes were initially downloaded and subsequently processed in order to prepare the data for analysis. Once the GPS data were imported into an Access table, a data review was performed to identify any potential problems with the dataset. Several hikes had missing or incomplete elevation or time data; these were removed from the study. The resulting dataset comprised 40 effective hikes and a total of 51,538 individual GPS points. Figure 2 shows the geographical distribution of the GPS data used in the study.

Fig. 2: Distribution of GPS hike data

In addition, a Comparison GPS dataset was created from ten additional hikes that were downloaded from the EveryTrail web site. The data cleaning and processing of the ten hikes followed the same steps as the GPS dataset used in the study. One of the ten hikes had erroneous elevation data and the hike was deleted from the Comparison GPS dataset leaving nine hikes.

2.2 Dataset Creation and Distance Analysis

At this point the dataset was made up of individual points. To begin analyzing the data to meet the objectives of this research, it was necessary to calculate the differences in time, distance, and elevation between consecutive GPS points. Table 1 presents a statistical summary of the time interval and distance between collected GPS points.
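The differencing step described above can be sketched as follows. The paper does not state how distances were derived from the GPS coordinates, so the haversine formula on a spherical Earth is an assumption here, as are the `TrackPoint` record layout and the function names.

```python
import math
from typing import NamedTuple

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; assumed spherical model


class TrackPoint(NamedTuple):
    time_s: float   # seconds since the start of the hike
    lat: float      # decimal degrees
    lon: float      # decimal degrees
    elev_m: float   # metres


def haversine_m(a, b):
    """Great-circle distance in metres between two track points."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def segment(a, b):
    """Time, distance and elevation differences between two points, plus slope and speed."""
    dt = b.time_s - a.time_s
    dist = haversine_m(a, b)
    dz = b.elev_m - a.elev_m
    slope = dz / dist if dist > 0 else 0.0          # rise over run
    speed = (dist / dt) * 3.6 if dt > 0 else 0.0    # m/s converted to km/hr
    return {"dt_s": dt, "dist_m": dist, "dz_m": dz, "slope": slope, "speed_kmh": speed}


def segments(track):
    """Pair each point with its successor."""
    return [segment(a, b) for a, b in zip(track, track[1:])]
```

For two points 100 seconds and 0.001° of latitude apart, this yields a distance of roughly 111 meters and a speed of about 4 km/hr.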

Table 1: GPS Dataset Statistics

Statistic          Time (s)    Distance (m)
N (Valid)          51538       51538
N (Missing)        0           0
Mean               10.8496     10.7697
Std. Deviation     16.34258    12.57556
Variance           267.080     158.145
Range              884.00      449.00
Minimum            1.00        0.09
Maximum            885.00      449.99
25th Percentile    2.0000      2.9786
50th Percentile    7.0000      7.4340
75th Percentile    14.0000     14.8622

If the 25th percentiles in Table 1 are examined, it becomes clear that a large part of the GPS points were collected at a small time interval of 2 seconds or less, which corresponds to a distance between points of less than 3 meters. This is significant given the typical accuracies of the hand held GPS units used in this study. The location accuracies that can be expected with these units are +/- 3 meters in the best of conditions, yet more than a quarter of the data points in this dataset lie within this distance of the preceding point and half lie within 7.5 meters. To facilitate the calculation of the differences between GPS points and to test and reduce the effect of GPS accuracy on the analysis, an application was created that places the information from two GPS points into a single data record. This application was built using VB.Net and Windows Forms. Fig. 3 shows the application interface, which was developed with the intent of ascertaining the optimum distance between GPS points for the study.

Fig. 3: Minimum Distance Application Interface

The data from two consecutive records are stored in the form controls. The application then builds a new data table in which each new record is populated with the values of the two GPS records. For the first data table created, the application populates the new dataset with every consecutive pair of records as described above, so this dataset holds all of the GPS data points. As can be seen in Figure 3, a built-in option allows the user to enter a minimum distance between the points that populate a newly created data table. This is accomplished by loading the values of two sequential data points into the form and calculating the distance between them. If the calculated distance is less than the minimum distance specified by the user, the application fetches the next GPS point and repeats the distance comparison, continuing until the minimum distance is met. Once the minimum distance is achieved, the application writes the two respective GPS points to a single record in the created data table. Several datasets were built using this approach. The first, as mentioned above, was a dataset of sequential GPS points; the others used minimum distances between GPS points of 5, 10, 30, 50, 80, and 100 meters respectively.
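The pairing logic described above can be sketched as follows. The original application was written in VB.Net and is not reproduced here; this Python version, the tuple layout, and the use of projected metric coordinates are all illustrative assumptions.

```python
import math


def thin_by_min_distance(points, min_dist_m):
    """Pair points so that each pair is at least min_dist_m apart in a straight line.

    points: list of (time_s, x_m, y_m, elev_m) tuples in travel order, with
    x/y in a projected metric coordinate system (an assumption of this sketch).
    Returns a list of (start_point, end_point) records.
    """
    if not points:
        return []
    records = []
    anchor = points[0]
    for candidate in points[1:]:
        d = math.hypot(candidate[1] - anchor[1], candidate[2] - anchor[2])
        if d >= min_dist_m:
            records.append((anchor, candidate))
            anchor = candidate  # the next record starts where this one ended
        # otherwise the candidate is skipped and the next point is fetched
    return records
```

With a minimum distance of zero every consecutive pair is written, which corresponds to the dataset of sequential GPS points. Note that the time difference within a record still spans any skipped points, while the distance is measured only between the two retained points.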

2.3 Flat terrain statistics

Variations in hiking speeds at near zero slopes were evaluated. The dataset was queried and records with slope values close to 0°, in the interval <-0.03°, 0.03°>, were selected. The same selection was performed on the 5, 10, 30, 50, 80 and 100 meter minimum distance datasets created with the application developed in the study. Table 2 presents the resulting statistics for each dataset.

Table 2: Minimum Distance Datasets – Flat terrain speed (km/hr) statistics

                   No         5 Meter    10 Meter   30 Meter   50 Meter   80 Meter   100 Meter
Statistic          Minimum    Minimum    Minimum    Minimum    Minimum    Minimum    Minimum
N (Valid)          7966       4672       3107       1442       885        588        482
N (Missing)        0          0          0          0          0          0          0
Mean               4.5159     4.8186     4.9088     4.6734     4.5537     4.2265     4.1697
Std. Deviation     2.41158    2.40200    2.31279    2.24180    2.03839    1.79916    1.91168
Variance           5.816      5.770      5.336      5.026      4.155      3.237      3.655
Range              25.11      25.11      25.06      23.57      21.33      14.76      21.59
Minimum            0.28       0.29       0.33       0.32       0.55       0.41       0.29

One way envisioned to deal with the inherent GPS error in the dataset was to examine the speed variations at near zero slopes in datasets with different minimum distances. The idea is that as the GPS dataset is generalized, the inherent GPS errors contribute a smaller portion of the calculated distance values and elevation differences between points, so a reduced variability of speed values should be apparent. As can be seen in Table 2, this holds true in most cases. As the minimum distance increases, the standard deviation of speed values is reduced, except for the 100 meter minimum distance dataset, whose standard deviation goes up slightly compared to the 80 meter dataset. This does not fit with the expected impact of spatially generalizing the GPS data, but could be due to specific data point artifacts in the dataset. On the basis of speed variability alone, the 80 meter minimum distance dataset would seem the optimal choice for subsequent analysis.

However, one interesting aspect of the statistics presented in Table 2 is the change in the mean speed values across the minimum distance datasets. The mean speed increases until the 10 meter minimum is reached and then decreases steadily. Two phenomena are thought to explain this. The first, which can be seen in the no minimum distance and 5 meter datasets, is the small distance between consecutive points. Given these small distances, the inherent GPS error is thought to have a significant impact on the calculated slope and distance values. The second is an issue of spatial scale, which can be seen in the minimum distance datasets from 10 meters to 100 meters. As GPS points that define the hike are removed, vertices of the hike are essentially being removed.

To explain further, the application developed only calculates a Euclidean distance between two points. As the routine moves from one GPS point below the minimum distance threshold to the next, it in essence deletes the point (vertex) that does not meet the minimum distance. The net effect is that as “vertices” of the hike are deleted, the hike representation becomes generalized. The straight-line distance calculated between the two retained points can therefore never exceed, and is usually less than, the length of the route actually traversed. This impacts the speed values because the time difference between the points still reflects the route traversed with all of the original vertices. This spatial scale effect can be clearly seen in the decrease of the mean speed values as the minimum distance increases through 30, 50, 80 and 100 meters.

Given the factors mentioned above, the 10 meter minimum distance dataset was chosen for continued study. This dataset was then refined by limiting slope and speed values. Specifically, slope values above 83% or below -83% were eliminated, as these values define the maximum and minimum slopes that can be walked (SCHARENBROICH 2006). Upper speed values were limited to 25.6 km/hr, as this represents the maximum human locomotion speed (MCKENZIE 2007). Speed values below 0.28 km/hr were culled from the dataset as well. This lower limit was defined by taking 80% of the Tobler (TOBLER 1993) function at 83% slope (0.5 km/hr).
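The refinement of the 10 meter dataset can be expressed as a simple record filter. The thresholds below are the ones stated in the text; the record layout (dictionaries with `slope` as rise over run and `speed_kmh`), the function names, and the choice to keep values that fall exactly on a limit are assumptions of this sketch.

```python
MAX_ABS_SLOPE = 0.83    # 83% grade, walkable limit (SCHARENBROICH 2006)
MAX_SPEED_KMH = 25.6    # maximum human locomotion speed (MCKENZIE 2007)
MIN_SPEED_KMH = 0.28    # lower speed limit used in the study


def keep_record(slope, speed_kmh):
    """True if a slope/speed pair lies inside the study limits (limits inclusive, assumed)."""
    return abs(slope) <= MAX_ABS_SLOPE and MIN_SPEED_KMH <= speed_kmh <= MAX_SPEED_KMH


def refine(records):
    """Drop records whose slope or speed falls outside the study limits."""
    return [r for r in records if keep_record(r["slope"], r["speed_kmh"])]
```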

2.4 Regression

To get a view of the relationship between slope and speed, a scatter diagram of the GPS dataset was created (see Fig. 4).

Fig. 4: Slope/Speed Scatter Plot

Examining the plot, it seems clear that the slope – speed relationship is not linear, but it is also difficult to discern any form. Given this, several approximating curves were tested: linear, quadratic, cubic, compound, logistic, growth, and exponential. It should be noted that inverse, S-curve, logarithmic and power curves could not be fitted due to zero slope values in the dataset. Using the SPSS 13 curve fitting routines, slope – speed regression models were examined. Four separate curve fitting runs were performed using slope as the independent variable and speed as the dependent variable: first, on the complete dataset of GPS points; second, on the complete dataset using the absolute value of slope; third, on a dataset of zero and positive slope values (Positive Slope Data); and finally, on a dataset of negative slope values (Negative Slope Data). Each run was made with the regression model options listed above. Table 3 presents the results of the curve fitting analysis.

Table 3: Curve Fitting Results

Curve Fitting    All Slope    Absolute Slope    Positive Slope    Negative Slope
Results          Values       Values            Values            Values
Equation         Cubic        Exponential       Exponential       Exponential
R2               0.0401       0.0806            0.1028            0.0673
F                368.40       2317.28           1640.87           874.01
Constant         4.620        4.628             4.478             4.851
b1               -1.867       -1.377            -1.577            1.232
b2               -6.859
b3               4.515
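For readers without access to SPSS, an exponential model of the form speed = b0 · exp(b1 · slope) can be fitted by ordinary least squares on the natural logarithm of speed. Whether SPSS 13 estimates its exponential curve in exactly this way is an assumption; the sketch below shows the technique only and is not the routine used in the study.

```python
import math


def fit_exponential(slopes, speeds):
    """Fit speed = b0 * exp(b1 * slope) by least squares on ln(speed).

    Returns (b0, b1, r_squared); r_squared is computed on the log scale.
    All speeds must be strictly positive.
    """
    n = len(slopes)
    logs = [math.log(v) for v in speeds]
    mean_x = sum(slopes) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in slopes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(slopes, logs))
    syy = sum((y - mean_y) ** 2 for y in logs)
    b1 = sxy / sxx                          # slope of the log-linear regression
    b0 = math.exp(mean_y - b1 * mean_x)     # back-transformed intercept
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return b0, b1, r_squared
```

Applied to noise-free points generated from the Positive Slope model in Table 3, the fit recovers the constant 4.478 and b1 = -1.577. Real GPS data are far noisier, as the low R2 values in the table indicate.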

2.5 Travel model comparisons

The minimum distance application developed previously was used to create a 10 meter minimum distance Comparison GPS dataset. The Study model developed herein and Tobler’s model (TOBLER 1993) were plotted along with the Comparison GPS data (see Fig. 5).

Fig. 5: Scatterplot showing Comparison GPS data and slope speed models

As can be seen in Figure 5, both the Tobler model and the Study model seem to overestimate speeds at all slope values when plotted with the Comparison GPS data, especially at near zero slope. In addition, Tobler’s model approximates the GPS data better than the Study model at higher positive and negative slopes. The regression models developed in this study were also used to calculate speeds from slope in the 10 meter minimum distance Comparison GPS dataset. Two exponential equations were used. The Positive Slope Equation, y = 4.478 exp(-1.577 x), was applied to all GPS records with slopes equal to or greater than zero. The Negative Slope Equation, y = 4.851 exp(1.233 x), was applied to all GPS records with a slope less than zero. In both equations x is slope and y is speed in km/hr.
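The two equations combine into a single piecewise speed function, sketched below. The coefficients are those given in the text; treating slope as rise over run (0.10 for a 10% grade) rather than as a percentage is an assumption, chosen because it yields plausible walking speeds at steep slopes.

```python
import math


def study_speed_kmh(slope):
    """Study model speed in km/hr; slope assumed to be rise over run."""
    if slope >= 0:
        # Positive Slope Equation
        return 4.478 * math.exp(-1.577 * slope)
    # Negative Slope Equation (the text gives 1.233; Table 3 lists b1 as 1.232)
    return 4.851 * math.exp(1.233 * slope)
```

Under this assumption the model predicts about 1.2 km/hr at the 83% uphill limit. The two branches do not meet at zero slope: the uphill branch gives 4.478 km/hr and the downhill branch approaches 4.851 km/hr.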

Langmuir’s model (LANGMUIR 1984) is based on a constant speed of 5 km/hr with adjustments for uphill elevation gain (add 60 minutes per 600 meters of gain) and downhill elevation loss. The downhill correction depends on the slope range: for slopes between 9% and 21%, 20 minutes per 600 meters of descent should be subtracted; for slopes greater than 21%, 20 minutes should be added. Given Langmuir’s time adjustments, summary statistics were developed for each hike for total elevation gain, elevation loss at slopes of 9% – 21%, and elevation loss at slopes greater than 21%. These elevation gains and losses were then converted to time units and summed for each hike. The Tobler and Study speed models were also expressed in time units by dividing the distance travelled in each segment by the speed predicted by each respective model. Figure 6 presents the results of the models as expressed in time.
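The conversion to time units can be sketched as follows. Two points are assumptions of this sketch rather than statements from the paper: the 20 minute penalty on steep descents is applied per 600 meters of descent, mirroring the gentle-descent correction, and each segment's distance is treated as horizontal distance. The speed models convert to time as distance over speed.

```python
def langmuir_seconds(segs):
    """Langmuir travel time for segs, an iterable of (dist_m, dz_m) tuples."""
    total = 0.0
    for dist_m, dz_m in segs:
        if dist_m <= 0:
            continue
        total += dist_m / 5000.0 * 3600.0        # base pace of 5 km/hr
        grade = abs(dz_m) / dist_m
        if dz_m > 0:
            total += dz_m / 600.0 * 3600.0       # +60 min per 600 m gained
        elif dz_m < 0 and 0.09 <= grade <= 0.21:
            total -= -dz_m / 600.0 * 1200.0      # -20 min per 600 m lost
        elif dz_m < 0 and grade > 0.21:
            total += -dz_m / 600.0 * 1200.0      # +20 min per 600 m lost (assumed rate)
    return total


def model_seconds(segs, speed_kmh):
    """Sum of distance / speed over segments; speed_kmh maps slope to km/hr."""
    total = 0.0
    for dist_m, dz_m in segs:
        if dist_m <= 0:
            continue
        v = speed_kmh(dz_m / dist_m)
        total += (dist_m / 1000.0) / v * 3600.0
    return total
```

For example, a flat 5 km segment takes 3600 seconds under Langmuir's base pace, and the same segment with 600 meters of ascent takes 7200 seconds.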

Fig. 6: Plot of Model Time Results (Seconds)

As can be seen in Figure 6, the three models in general show similar results, and for most hikes the observed time was similar to the modeled times. Hikes 8 and 9, however, show a large difference between observed and modeled time. One characteristic of both of these hikes is that they included relatively more downhill travel at slopes greater than 21% than the other hikes. This could indicate that the models do not reflect travel speeds on steeper downhill slopes well.

2.6 Exploratory Spatial Analysis

Given the first law of geography (TOBLER 1970), an exploratory spatial analysis was performed to examine inherent spatial relationships within the GPS dataset. ArcGIS 9.3 was used for all of the spatial analysis and results presentation. A single hike was chosen at random and separated from the GPS dataset, and a Moran’s I spatial autocorrelation analysis was performed on it. Measures of spatial autocorrelation are constructed by combining a measure of the similarity of a variable’s values with an indicator of spatial proximity. The results of the analysis show that the GPS data of the selected hike were spatially autocorrelated and that the chance of the spatial clustering being a random occurrence was less than 1%. In general terms, GPS points that were closer together had similar speed values. This means that the observations are not independent and that the regression analysis performed may have bias in its results. Intuitively this seems reasonable, given that the speed values would reflect the topography traversed during the hike. GPS points collected closer together would have similar slopes and other trail characteristics; this in turn would be reflected in similar speed values. Because the hike GPS data exhibit clustering, Geographically Weighted Regression (GWR) was applied to them. The GWR analysis showed that geographically weighted regression coefficients result in a better model fit than the statistical regression model previously developed. The GWR overall R2 value of 0.5255 implies that slope accounts for 52.55% of the variation in speed values in the GWR model. There could be several reasons for the better model fit. One is that the relationship between slope and speed varies over space and is not stationary. Another is that GWR takes several local samples and weights the speed values based on the distance between sample points. This in effect is a smoothing operation that would reduce outlier values.
ESRI (2009) and others suggest that a spatial autocorrelation analysis should be performed on the residual values of the GWR. This in essence ascertains whether the errors in the GWR model are clustered. The idea is that if they are not clustered, then the errors in the GWR are random and the GWR model is correctly specified. If the GWR residuals are clustered, it suggests that the GWR was not specified correctly and that other processes contribute to the model that are not accounted for. The results of the spatial autocorrelation analysis on the residuals of the GWR model for the example hike show that they are spatially dependent and that the chance of the spatial clustering being a random occurrence is less than 1%. This implies that other factors not accounted for in the Study model play an important role in slope – speed relationships.
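The study used ArcGIS 9.3 for the autocorrelation analysis. For illustration only, the global Moran's I statistic can be computed directly as sketched below. The inverse-distance weighting is an assumption, since the paper does not state which spatial weights were used, and no significance test is included.

```python
import math


def morans_i(coords, values):
    """Global Moran's I with inverse-distance weights.

    coords: list of (x, y) locations; values: one observation per location
    (for example speed values or GWR residuals). Coincident points are given
    zero weight, which is an assumption of this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0     # weighted sum of cross products of deviations
    w_sum = 0.0     # sum of all spatial weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if dist == 0:
                continue
            w = 1.0 / dist
            w_sum += w
            cross += w * dev[i] * dev[j]
    sum_sq = sum(e * e for e in dev)
    return (n / w_sum) * (cross / sum_sq)
```

Values above the expected value of -1/(n-1) point to clustering of similar values, and values below it to dispersion. Six points on a line with values 1, 1, 1, 5, 5, 5 give a positive I, while the alternating sequence 1, 5, 1, 5, 1, 5 gives a negative one.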

3 Conclusions

One limitation in this study was that GPS accuracy information was not included in the GPX files of the hikes. Although the PDOP field and other accuracy data are available and tracked within the GPS units, most manufacturers do not facilitate the output of this information. Another issue is that there are no guidelines as to how the GPS data collection should be performed. This has the potential to have a major impact on the speed calculations where stops and starts in any given hike are concerned. If GPS units are left on, collecting points while the hiker is at rest, errors can be propagated into the speed calculations and hence the model development. Another reason for the spatial clustering could be related to the non-stationarity of the GPS data. In this study it was originally assumed that the slope – speed relationship would be constant throughout. As seen in the analysis performed herein, that may not be the case. Hikers could start out faster and slow as they become more tired. They could also walk faster after taking breaks and then begin to slow down. There could also be a “Horse to the Barn” effect, where hikers move faster when approaching a peak or the end of the hike. Finally, one or more factors that could possibly influence the slope – speed relationship are not accounted for in the study. The first that comes to mind is the trail type or roughness; the second is the hikers themselves. Hiking speeds vary from person to person and from day to day. Hikers may also walk slower on hot days than on colder ones. Recommendations for changes to the study framework and potential future study points are listed below:

• Many GPS units that are used by hikers and other outdoor enthusiasts can export GPS accuracy information into their export GPX file. It may be useful to suggest that, whenever possible, these hikers export this information as well.
• More robust data scrubbing techniques could be used in the development of speed models based on GPS collected data. These include error smoothing techniques, trip stop identification, and outlier removal methods.
• The application of GWR model results should be further examined. This would be especially significant if GPS accuracy data were available. PDOP or other accuracy measures could be used as a weighting variable.
• It may be interesting to develop probabilistic models of travel speeds with the data based on hiker type and slope.

References

ESRI (2009), ArcGIS 9.3 DeskTop Help: Interpreting GWR Results. http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Interpreting_GWR_results (accessed August 22 2010).

LANGMUIR, E. (1984), Mountaincraft and leadership. The Scottish Sports Council MLIB Cordee, Leicester.

MCKENZIE, J. (2007), The use of GPS to predict energy expenditure for outdoor walking. PhD Thesis, Department of Health and Human Development, Montana State University Bozeman.

NAISMITH, W. W. (1892), Untitled. Scottish Mountaineering Club Journal, 2, 135.

SCHARENBROICH, C. (2006), Classifying Access on Whitewater Wildlife Management Area Callahan Unit using GIS. Volume 8, Papers in Resource Analysis. 11 pp. Saint Mary’s University of Minnesota Central Services Press. Winona, MN. http://www.gis.smumn.edu/Pages/GraduateProjects.htm (accessed April 5 2010).

TOBLER, W. (1970), A computer Movie Simulating Urban Growth in the Detroit Region. Economic Geography, 46 (2), 234-240.

TOBLER, W. (1993), Non-isotropic geographic modeling (Technical Report No. 93-1). Santa Barbara, CA: National Center for Geographic Information and Analysis.

WING, M. & EKLUND, A. (2006), Elevation Measurement Capabilities of Consumer-Grade Global Positioning System (GPS). Journal of Forestry, 3/2007.

WING, M., EKLUND, A. & KELLOG, L. (2005), Consumer-Grade Global Positioning System (GPS) accuracy and reliability. Journal of Forestry, 6/2005.