This paper distinguishes between routes that are drawn over maps and tracks that are recorded with GPS devices. Both routes and tracks are polygonal path representations of the real paths that we travel over the earth. Tracks are very handy for obtaining distance and elevation gain. They can be downloaded from sites where users share the tracks from their trips. They have the advantage over routes in being an actual record of the path of the device while the user was hiking a trail. Routes are still necessary when there is no previously-recorded track.
Routes have practical limitations on the number of route points that can be drawn, so they are generally subject to bias from undersampling. Tracks are subject to undersampling error to a lesser extent than routes, because tracks usually contain many more points. However, tracks are also subject to satellite error on the order of 5 to 10 meters per track point. If the device records at too high a sample rate, the accumulated satellite errors cause the track to overestimate distance and cumulative elevation gain.
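The two sources of error can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the "true" path is a sine wave on a flat plane, position errors are independent Gaussian draws with a 5-meter standard deviation, and the helper names path_length and sampled_length are hypothetical. Real GPS errors are correlated from point to point, so the overestimate shown here is exaggerated.

```r
# Hypothetical illustration: how sampling density and position noise
# affect the measured length of a polygonal path (all numbers are made up).
set.seed(42)

# Length of a polyline given planar x/y coordinates in meters.
path_length <- function(x, y) sum(sqrt(diff(x)^2 + diff(y)^2))

# A winding "true" path: a sine wave sampled very finely.
true_x <- seq(0, 1000, length.out = 100001)
true_y <- 50 * sin(2 * pi * true_x / 200)
true_len <- path_length(true_x, true_y)

# Sample the path at n evenly spaced points, optionally adding
# independent Gaussian position error with standard deviation noise_sd.
sampled_length <- function(n, noise_sd = 0) {
  x <- seq(0, 1000, length.out = n)
  y <- 50 * sin(2 * pi * x / 200)
  path_length(x + rnorm(n, sd = noise_sd), y + rnorm(n, sd = noise_sd))
}

coarse_route <- sampled_length(11)                 # few points, no noise
dense_noisy  <- sampled_length(201, noise_sd = 5)  # many points, 5 m noise

c(true = true_len, coarse_route = coarse_route, dense_noisy = dense_noisy)
```

In this toy setting the coarse route cuts across every bend and comes out shorter than the true path, while the dense noisy track comes out longer.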
In summary, there is no reason to believe that routes and tracks will give the same unbiased estimate of distance and elevation gain. The purpose of this research is to assess the magnitude by which route-based estimates differ from track-based estimates for a sample of data that has been used for planning or could be used for planning. Using paired comparisons, we subtract the length measured from a route from the length measured from a track, where both represent the same earth path. We find that length measurements from tracks average 0.65 miles longer than those from routes. This difference is about 12% of the average length of the routes in the data set. Using the same methodology, we find that elevation gain measurements do not differ significantly between routes and tracks.
The R code in this document depends on the following packages:
library('knitr')
library('dplyr')
library('ggplot2')
library('lmtest')
library('sandwich')
For this investigation, routes were existing designated routes on gaiagps.com or were drawn using gaiagps.com tools. Each route is compared with a corresponding track that had been recorded and uploaded by a GaiaGps user. There are 11 “designated” routes, 13 “free-drawn” routes and 10 “assisted-drawn” routes. Designated routes are routes already stored at gaiagps.com and available to users for information about established trips. For “free-drawn” routes, I sketched cross-country routes over terrain using gaiagps.com drawing tools. For “assisted-drawn” routes, I drew the routes using the GaiaGps snap-to-trail feature. This data set is a convenience sample of past or planned backpacking trips in the Sierra Nevada and Arizona.
For each route, I measured the length in miles (route.length), cumulative elevation gain in feet (route.gain) and number of points (route.points). I measured the length, gain and number of points for the corresponding track (track.length, track.gain, track.points). See (Ahrens 2018) for definitions and formulas for these variables. The route and track attributes were measured using GpsPrune software (Workshop 2018).
The format and listing of the data set are given in the section “Route versus track data set details” below.
# load the data
rvst <- read.csv('route_vs_track_data.csv')
We analyze the routes and tracks as paired comparisons. The difference in length (length_diff) is track.length - route.length.
Virtually all tracks are longer than their corresponding routes. The length difference is larger and more variable for designated routes than for the other two route types.
The simplest model treats each of the groups as independent random samples. We can test for a dependence of the length difference on route type. This uses a heteroskedasticity-robust estimator of covariance and tests the route.type coefficients as a group.
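# Assumption: rvst_diff is not defined in the code shown; it is built here by analogy with rvst_gain below.
rvst_diff <- rvst %>% mutate(length_diff=track.length - route.length)
lm_ld_routetype <- lm(length_diff ~ route.type, rvst_diff)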
waldtest(lm_ld_routetype, length_diff ~ 1, vcov=vcovHC)
ct <- coeftest(lm_ld_routetype, vcov=vcovHC)
ct
##
## t test of coefficients:
##
##                      Estimate Std. Error t value  Pr(>|t|)
## (Intercept)           0.64800    0.10584  6.1225 8.664e-07 ***
## route.typedesignated  0.34564    0.26068  1.3259    0.1946
## route.typefree_drawn  0.24354    0.22639  1.0757    0.2904
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Wald test did not find sufficient evidence for a dependence of length difference on route type.
## 95% confidence interval for intercept:
## 0.4321392 0.8638608
## Intercept as percent of average route length:
## 12.41309
## Standard error of regression: 0.6311838
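The intercept coefficient is significant. Because route.type enters the model as a factor with assisted_drawn as the reference level, the intercept is the estimated mean difference for assisted-drawn routes; the coefficients for the other two types are offsets from it and are not significant. There is sufficient evidence of a positive difference between track lengths and route lengths. The point estimate of the difference is 0.65 miles with a 95% confidence interval of (0.43, 0.86) miles. As a percentage of the average route length, the 0.65-mile difference is about 12%.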
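The interval and the percentage can be reproduced from the reported estimate and robust standard error. The sketch below assumes a t interval with 31 residual degrees of freedom (34 observations minus 3 estimated coefficients) and uses the mean route length of 5.22 miles from the data set summary; the chunk that produced the original output is not shown, so this is a reconstruction rather than the original code.

```r
# Values copied from the coeftest() output for lm_ld_routetype.
est <- 0.64800   # intercept estimate (miles)
se  <- 0.10584   # heteroskedasticity-robust standard error

# Assumption: 34 observations minus 3 estimated coefficients.
df <- 34 - 3

# Two-sided 95% t interval: estimate +/- t quantile * standard error.
ci <- est + c(-1, 1) * qt(0.975, df) * se
ci

# Intercept as a percentage of the mean route length (5.22 miles,
# taken from the data set summary).
pct <- 100 * est / 5.220
pct
```

Small discrepancies in the last digits relative to the printed output are expected, because the inputs here are rounded.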
The standard error of this regression is 0.63 miles, almost as large as the assessed mean difference.
Controlling for route length and number of points could change results. Here is a general model that includes route.length, route.points and track.points as explanatory variables.
lm_ld <- lm(length_diff ~ route.type + route.length + route.points + track.points, rvst_diff)
coeftest(lm_ld, vcov=vcovHC)
##
## t test of coefficients:
##
##                         Estimate  Std. Error t value Pr(>|t|)
## (Intercept)           0.17209101  0.26686817  0.6449 0.524271
## route.typedesignated -0.37319934  0.26967334 -1.3839 0.177324
## route.typefree_drawn  0.14981754  0.24615608  0.6086 0.547677
## route.length          0.04245285  0.06881196  0.6169 0.542259
## route.points         -0.00078005  0.00027768 -2.8092 0.008954 **
## track.points          0.00061272  0.00019229  3.1865 0.003523 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
waldtest(lm_ld, . ~ . - route.points - track.points, vcov=vcovHC)
Now we find that the difference between the length of a route and a track can be explained by the numbers of route points and track points. When these variables are introduced, the intercept is no longer significant. However, the variables route.points and track.points as a group have an F-test significance of p < 0.001.
Holding other variables fixed, an increase of 100 in the number of points in the route is associated with a decrease in length difference of 0.08 miles. An increase of 100 in the number of points in the track is associated with an increase in length difference of 0.06 miles. Indeed, the similar magnitudes and opposite signs of these coefficients suggest that track.points - route.points might be a suitable predictor by itself.
We will try substituting track.points - route.points for track.points in the last model.
rvst_diff2 <- rvst_diff %>% mutate(delta.points=track.points - route.points)
lm_deltap <- lm(length_diff ~ route.type + route.length + route.points + delta.points, rvst_diff2)
coeftest(lm_deltap, vcov=vcovHC)
##
## t test of coefficients:
##
##                         Estimate  Std. Error t value Pr(>|t|)
## (Intercept)           0.17209101  0.26686817  0.6449 0.524271
## route.typedesignated -0.37319934  0.26967334 -1.3839 0.177324
## route.typefree_drawn  0.14981754  0.24615608  0.6086 0.547677
## route.length          0.04245285  0.06881196  0.6169 0.542259
## route.points         -0.00016733  0.00033899 -0.4936 0.625434
## delta.points          0.00061272  0.00019229  3.1865 0.003523 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
delta.points is the only significant term in this model. An increase of 100 points in the difference between numbers of track points and route points results in an increase in length difference of 0.06 miles.
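Replacing track.points with delta.points is a reparametrization of the same model, which is why every other row of the table is unchanged. Writing track.points = delta.points + route.points shows that delta.points inherits the track.points slope, while the new route.points slope is the sum of the two original slopes. The sketch below checks this on hypothetical toy data (the data frame toy and its generating coefficients are invented for illustration).

```r
# Hypothetical toy data; only the algebra matters here.
set.seed(1)
n <- 40
route.points <- round(runif(n, 50, 500))
track.points <- round(runif(n, 500, 3000))
length_diff  <- 0.2 - 0.0008 * route.points + 0.0006 * track.points +
  rnorm(n, sd = 0.3)
toy <- data.frame(length_diff, route.points, track.points)
toy$delta.points <- toy$track.points - toy$route.points

fit_a <- lm(length_diff ~ route.points + track.points, data = toy)
fit_b <- lm(length_diff ~ route.points + delta.points, data = toy)

# Same fitted values: both sets of predictors span the same column space.
same_fit <- all.equal(fitted(fit_a), fitted(fit_b))

# delta.points inherits the track.points slope ...
same_slope <- all.equal(unname(coef(fit_b)["delta.points"]),
                        unname(coef(fit_a)["track.points"]))

# ... and the new route.points slope is the sum of the original two.
sum_slope <- all.equal(unname(coef(fit_b)["route.points"]),
                       unname(coef(fit_a)["route.points"] +
                              coef(fit_a)["track.points"]))

c(same_fit = same_fit, same_slope = same_slope, sum_slope = sum_slope)

# The same identity applied to the estimates reported above.
reported_sum <- -0.00078005 + 0.00061272
reported_sum
```

Applied to the reported estimates, the sum matches the route.points coefficient of -0.00016733 in the second table. The loss of significance for route.points therefore reflects the change in what the coefficient measures, not a change in fit.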
A practical model for estimation leaves out all of the explanatory variables except delta.points.
lm_deltap2 <- lm(length_diff ~ delta.points, rvst_diff2)
summary(lm_deltap2)
##
## Call:
## lm(formula = length_diff ~ delta.points, data = rvst_diff2)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.27927 -0.24177 -0.06035  0.29640  1.49575
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.3504808  0.1526144   2.297 0.028341 *
## delta.points 0.0005454  0.0001346   4.051 0.000304 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5184 on 32 degrees of freedom
## Multiple R-squared: 0.3389, Adjusted R-squared: 0.3183
## F-statistic: 16.41 on 1 and 32 DF, p-value: 0.0003038
coeftest(lm_deltap2, vcov=vcovHC)
##
## t test of coefficients:
##
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.35048076 0.13142401  2.6668 0.011910 *
## delta.points 0.00054539 0.00013773  3.9599 0.000392 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
An increase of 100 points in the difference between numbers of track points and route points results in an increase in length difference of 0.05 miles, and this slope is very significant (p < 0.0004).
This model suggests that we might make length corrections, based on number of points, with the idea of standardizing length measurements to some fixed density of track points. However, after controlling for the difference in number of points, tracks still average 0.35 miles longer than routes (p < 0.012). Moreover, the residual standard error of the model is still 0.52 miles.
Here are the residual plots for the last model.
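As a rough sketch of what such a correction could look like, the fitted line can be turned into a small helper. The function names expected_length_diff and adjust_route_length are hypothetical and are not part of the analysis; the coefficients are copied from the summary output of lm_deltap2.

```r
# Coefficients copied from summary(lm_deltap2).
b0 <- 0.3504808   # intercept (miles)
b1 <- 0.0005454   # slope (miles per point of track.points - route.points)

# Hypothetical helper: expected track-minus-route length difference.
expected_length_diff <- function(delta_points) b0 + b1 * delta_points

# Hypothetical helper: route length adjusted toward what a track with
# `track_points` points might be expected to measure.
adjust_route_length <- function(route_length, route_points, track_points) {
  route_length + expected_length_diff(track_points - route_points)
}

expected_length_diff(c(0, 500, 1000))
adjust_route_length(route_length = 5.0, route_points = 100, track_points = 1100)
```

Given the residual standard error of about half a mile, any single adjusted length from this sketch would carry considerable uncertainty.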
plot(lm_deltap2)
Referring to the plot of standardized residuals versus fitted values, the variance of the residuals may be smaller for small length differences than for others. The distribution of these residuals is consistent with the normal distribution.
We repeat the above analysis, but using track.gain - route.gain as the response variable.
Elevation gain difference is gain_diff = track.gain - route.gain.
rvst_gain <- rvst %>% mutate(gain_diff=track.gain - route.gain)
Here is a boxplot of the gain difference.
# box and whisker of differences.
ggplot(rvst_gain, aes(x=route.type, y=gain_diff)) +
geom_boxplot()
Elevation gain for tracks tends to be higher than for routes, but it is not uniformly so. The data exhibits variation that depends on route type.
This is the one-way ANOVA on the elevation gain response.
lm_gain_routetype <- lm(gain_diff ~ route.type, rvst_gain)
coeftest(lm_gain_routetype, vcov=vcovHC)
##
## t test of coefficients:
##
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)            -0.300     16.148 -0.0186   0.9853
## route.typedesignated   32.391     50.009  0.6477   0.5219
## route.typefree_drawn   26.838     58.190  0.4612   0.6479
We did not detect a difference in elevation gain between routes and tracks.
lm_gain <- lm(gain_diff ~ route.type + route.length + route.points + track.points, rvst_gain)
coeftest(lm_gain, vcov=vcovHC)
##
## t test of coefficients:
##
##                         Estimate  Std. Error t value Pr(>|t|)
## (Intercept)          -119.417321   79.978579 -1.4931   0.1466
## route.typedesignated  -57.175043   66.617285 -0.8583   0.3980
## route.typefree_drawn   55.284721   89.295780  0.6191   0.5408
## route.length           36.997371   32.102799  1.1525   0.2589
## route.points           -0.087173    0.083175 -1.0481   0.3036
## track.points           -0.029150    0.108333 -0.2691   0.7898
We detect no intercept or trend in elevation gain difference when controlling for other explanatory variables.
Tracks are longer than their corresponding routes. The only external factor that we found that significantly explains the variation in length difference is the relative number of track and route points.
Tracks and routes did not show any net difference with regard to elevation gain.
This analysis does not settle which method has less bias relative to the length of the real earth path. The data set does not have a reference with which to compare measured length. Moving forward, we need to assume that both sources of measurement have some bias. The analysis does show that methods for estimating path length can differ by 0.65 miles or about 12% of the estimated length.
The rvst data set consists of 34 observations on 8 variables. Format is .csv.
Segment.name: A name assigned to this test unit based on the start and destination of a hike.
route.type: Short character string that identifies whether the route was “designated”, “free_drawn” or “assisted_drawn”. These types are defined below.
route.length: The sum of all route polygon edge lengths in miles.
route.gain: The sum of all route polygon edge positive increases in altitude in feet. An edge that decreases in altitude contributes zero gain.
route.points: Number of points in the route polygon.
track.length: The sum of all track polygon edge lengths in miles.
track.gain: The sum of all track polygon edge positive increases in altitude in feet. An edge that decreases in altitude contributes zero gain.
track.points: Number of points in the track polygon.
Designated routes are routes already stored at gaiagps.com and available to users for information about established trips. “Free-drawn” routes are sketched cross-country routes over terrain using gaiagps.com drawing tools. “Assisted-drawn” routes are drawn using the GaiaGps snap-to-trail feature.
See (Ahrens 2018) for formulas for the numeric variables.
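The length and gain definitions above can be sketched in a few lines of R. This is an illustration under stated assumptions, not the formulas from the cited documentation or from GpsPrune: edge lengths are computed with the haversine formula on a spherical earth of radius 3958.8 miles, and the function names are hypothetical.

```r
# Cumulative elevation gain: sum of positive altitude changes only;
# edges that lose altitude contribute zero.
cumulative_gain <- function(alt) sum(pmax(diff(alt), 0))

# Great-circle distance in miles between paired points (haversine).
# Assumption: spherical earth of radius 3958.8 miles; the cited
# documentation may use a different formula.
haversine_miles <- function(lat1, lon1, lat2, lon2, radius = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Path length: sum of edge lengths over consecutive polygon vertices.
path_length_miles <- function(lat, lon) {
  n <- length(lat)
  if (n < 2) return(0)
  sum(haversine_miles(lat[-n], lon[-n], lat[-1], lon[-1]))
}

# Made-up example: three points on the equator and four altitude readings.
path_length_miles(lat = c(0, 0, 0), lon = c(0, 1, 2))
cumulative_gain(c(100, 150, 120, 200))   # 50 + 80 = 130
```

The number of points is simply the number of vertices passed in, so a denser polygon gives more edges over which both sums accumulate.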
# load and look at the data
rvst <- read.csv('route_vs_track_data.csv')
#kable(rvst, caption = "Route versus Track Data Set")
rvst
summary(rvst %>% transmute(route.length=route.length, track.length=track.length))
##   route.length     track.length
##  Min.   : 1.990   Min.   : 2.480
##  1st Qu.: 2.910   1st Qu.: 3.630
##  Median : 3.975   Median : 5.275
##  Mean   : 5.220   Mean   : 6.073
##  3rd Qu.: 7.530   3rd Qu.: 8.803
##  Max.   :10.800   Max.   :13.000
Note that the mean length of a route is 5.22 miles, while the mean length of a track is 6.07 miles.
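These two means imply an overall mean paired difference of about 0.85 miles. The sketch below recovers the same figure from the one-way model of length_diff on route.type by weighting each group's fitted mean by its sample size. It assumes R's default treatment contrasts, with assisted_drawn as the reference level (inferred from the coefficient names), and uses the group sizes stated in the data description.

```r
# Coefficients copied from the one-way model of length_diff on route.type.
b_intercept  <- 0.64800   # fitted mean for assisted_drawn (reference level)
b_designated <- 0.34564   # offset for designated routes
b_free_drawn <- 0.24354   # offset for free_drawn routes

# Group sizes stated in the data description.
n <- c(assisted_drawn = 10, designated = 11, free_drawn = 13)

group_means <- c(assisted_drawn = b_intercept,
                 designated     = b_intercept + b_designated,
                 free_drawn     = b_intercept + b_free_drawn)

# Sample-size-weighted mean of the group means = overall mean difference.
overall <- sum(n * group_means) / sum(n)
overall

# Difference of the overall means from the summary table.
mean_gap <- 6.073 - 5.220
mean_gap
```

Under these assumptions the two quantities agree to within rounding, and the 0.65-mile intercept corresponds to the assisted-drawn group specifically.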
Ahrens, F. 2018. “Documentation of the Fredsempirical Formula for Estimating Travel Times from Route Distance.”
Workshop, Activity. 2018. “GpsPrune.” Available at http://activityworkshop.net/software/gpsprune/ (2019/11/10).