Documentation of the FredsEmpirical formula for estimating travel times from route distance

Fred Ahrens

23 September 2018

The FredsEmpirical formula is a linear function of route distance, cumulative elevation gain and terrain type that estimates the backpacking travel time. This document provides the data sources and statistical regression model for the FredsEmpirical formula.

Nomenclature

$\boldsymbol{x}_{i},i=0,\ldots,n$ Successive positions listed in test route.

$d\left(\boldsymbol{x},\boldsymbol{y}\right)$ A function that returns the great circle distance between two positions.

$C_{i},i=1,\ldots,n$ A list of binary variables: $C_{i}=1$ if the corresponding segment is cross country, and $C_{i}=0$ if it is on trail.

$S=\sum_{i=1}^{n}d\left(\boldsymbol{x}_{i},\boldsymbol{x}_{i-1}\right)\left(1-C_{i}\right)$ Total estimated trail distance of test route.

$R=\sum_{i=1}^{n}d\left(\boldsymbol{x}_{i},\boldsymbol{x}_{i-1}\right)C_{i}$ Total estimated cross country distance of test route.

$z_{i},i=0,\ldots,n$ Elevation estimates for each position in test route.

$Z=\sum_{i=1}^{n}\max\left(0,z_{i}-z_{i-1}\right)$ Cumulative elevation gain for test route.

$\tau_{0},\tau_{f}$ Time stamps of the start and finish times for actual backpack of test route.

$T=\tau_{f}-\tau_{0}$ Measured travel time for test route.
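The definitions of $S$, $R$ and $Z$ can be sketched in a few lines of Python. This is only an illustration, not the code used to build the data set: the function names `great_circle_miles` and `route_summary` are hypothetical, positions are assumed to be (latitude, longitude) pairs in degrees, the haversine formula stands in for $d$, and the Earth radius constant is an assumed mean value in miles.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, not from the original data


def great_circle_miles(p, q):
    """Haversine distance in miles between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def route_summary(positions, elevations, cross_country):
    """Return (S, R, Z) for a route.

    positions     -- n+1 (lat, lon) pairs, x_0 .. x_n
    elevations    -- n+1 elevation estimates, z_0 .. z_n
    cross_country -- n flags, C_1 .. C_n (1 = cross country, 0 = trail)
    """
    S = R = Z = 0.0
    for i in range(1, len(positions)):
        d = great_circle_miles(positions[i - 1], positions[i])
        if cross_country[i - 1]:
            R += d  # segment i is cross country
        else:
            S += d  # segment i is on trail
        Z += max(0.0, elevations[i] - elevations[i - 1])  # only climbs count
    return S, R, Z


# Made-up three-point route: one trail segment, then one cross-country segment.
S, R, Z = route_summary([(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],
                        [100.0, 300.0, 250.0],
                        [0, 1])
print(S, R, Z)
```

Note that the descent from 300 to 250 in the example contributes nothing to $Z$, which is what the $\max(0,\cdot)$ term in the definition implies.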

Route Travel Time Data

I collected travel time, trail distance, cross country distance and elevation gain for 31 days of backpacking spread over four backpacking trips. In this paper, there is a lengthy discussion on what makes these variables good predictors of travel time.

Here, I just plot the data. A table of the data appears at the end of this article.

In [5]:
# plot the data as scatter diagrams, total hours vs miles on trail, miles on CC, elevation gain
import matplotlib.pyplot as plt
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "last"

miles_trail = getMilesTrail()
miles_CC = getMilesCC()
elevation_gain_feet = getElevGainFeet()
total_hours = getTotalHours()

fig = plt.figure(1, figsize=(8,8))
ax0 = fig.add_subplot(211)
ax0.plot(miles_trail, total_hours, linestyle='None', marker=u'D', color='dodgerblue', label='Trail');
ax0.plot(miles_CC, total_hours, linestyle='None', marker=u'D', color='coral', label='Cross country');
ax0.legend()
ax0.grid()
ax0.set_xlabel('Route miles');
ax0.set_ylabel('Total hours');
ax1 = fig.add_subplot(212)
ax1.plot(elevation_gain_feet, total_hours, linestyle='None', marker=u'D', color='dodgerblue');
ax1.grid()
ax1.set_xlabel('Elevation gain (feet)');
ax1.set_ylabel('Total hours');

Scatter plots of travel time data

Here are scatter plots of total travel hours ($T$) versus each of the three variables: route miles on trail ($S$), route miles cross country ($R$) and elevation gain ($Z$). We can see a relationship between each of the three variables and total travel time. We also see that cross country miles have a stronger effect on travel time than trail miles.

Statistical model of travel time

The statistical model for the route travel time is

$$T=\beta_{S}S+\beta_{R}R+\beta_{Z}Z+\varepsilon,$$

where $T$ is the travel time in hours, $S$ is the trail distance, $R$ is the cross country distance, $Z$ is the cumulative elevation gain of the route, $\varepsilon$ is the prediction error, which absorbs all uncertain effects, and $\beta_{S},\beta_{R},\beta_{Z}$ are the unknown regression coefficients. As defined in the nomenclature, the predictors are computed from the route as follows:

$$S=\sum_{i=1}^{n}d\left(\boldsymbol{x}_{i},\boldsymbol{x}_{i-1}\right)\left(1-C_{i}\right)$$

$$R=\sum_{i=1}^{n}d\left(\boldsymbol{x}_{i},\boldsymbol{x}_{i-1}\right)C_{i}$$

$$Z=\sum_{i=1}^{n}\max\left(0,z_{i}-z_{i-1}\right).$$

A multiple linear least squares regression gives us estimates of the coefficients $\beta_{S},\beta_{R},\beta_{Z}$. They appear in the output of the next cell with the labels miles_trail, miles_CC and elevation_gain_feet.

In [25]:
# multiple linear least squares regression of the backpacking data
import pandas as pd
import statsmodels.api as sm
df_X = pd.DataFrame({'miles_trail': getMilesTrail(), 'miles_CC':getMilesCC(), 
                     'elevation_gain_feet': getElevGainFeet()})
df_Y = pd.DataFrame({'total_hours': getTotalHours()})
model = sm.OLS(df_Y, df_X).fit()
model.params
Out[25]:
elevation_gain_feet    0.001156
miles_CC               1.334724
miles_trail            0.379191
dtype: float64

Estimated coefficients of the model

$\hat{\beta}_{S} = 0.379191$ hours per trail mile

$\hat{\beta}_{R} = 1.334724$ hours per cross country mile

$\hat{\beta}_{Z} = 0.001156$ hours per foot of elevation gain

$\hat{\beta}_{S}$ is equivalent to about 2.6 miles per hour on trail. Cross-country travel is much slower on average: $\hat{\beta}_{R}$ is equivalent to about 0.75 miles per hour. The slower speed reflects the time consumed in route finding and in negotiating rough terrain. $\hat{\beta}_{Z}$ is equivalent to 1 hour for every 865 feet of climbing.
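As a worked illustration, the fitted coefficients can be wrapped in a small function. The function name `estimate_hours` and the example route are made up for this sketch; the coefficients are the ones reported above, and the units (miles, feet, hours) are taken from the column names of the data.

```python
# Fitted coefficients reported above.
BETA_S = 0.379191  # hours per trail mile
BETA_R = 1.334724  # hours per cross country mile
BETA_Z = 0.001156  # hours per foot of elevation gain


def estimate_hours(miles_trail, miles_cc, elevation_gain_feet):
    """Point estimate of travel time in hours: T = bS*S + bR*R + bZ*Z."""
    return (BETA_S * miles_trail
            + BETA_R * miles_cc
            + BETA_Z * elevation_gain_feet)


# Hypothetical day: 5 trail miles, 2 cross country miles, 1500 feet of gain.
hours = estimate_hours(5.0, 2.0, 1500.0)
print(round(hours, 2))  # about 6.3 hours

# The equivalent speeds quoted in the text are the reciprocals of the coefficients.
print(round(1 / BETA_S, 2), round(1 / BETA_R, 2), round(1 / BETA_Z))
```

In this example the 2 cross country miles contribute more time (about 2.7 hours) than the 5 trail miles (about 1.9 hours), which matches the pattern seen in the scatter plots.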

Prediction Error

This model has a high level of unexplained variance due to other hidden factors. Cross-country travel, in particular, has a high degree of uncertainty. This section estimates the prediction error of the model.

In [33]:
# root mean squared error of residuals
import numpy as np
np.sqrt(model.mse_resid)
Out[33]:
1.0512823291522229
In [42]:
# plot the residuals versus total hours
fig = plt.figure(1, figsize=(8,8))
ax0 = fig.add_subplot(211)
ax0.plot(df_Y['total_hours'], np.abs(model.resid), linestyle='None', marker=u'D', color='dodgerblue');
ax0.grid()
ax0.set_xlabel('Total hours');
ax0.set_ylabel('Model prediction error (hours)');
# plot residual as percentage of total hours
ax1 = fig.add_subplot(212)
pctError = np.divide(np.abs(model.resid), df_Y['total_hours'])*100
ax1.plot(df_Y['total_hours'], pctError, linestyle='None', marker=u'D', color='dodgerblue');
ax1.grid()
ax1.set_xlabel('Total hours');
ax1.set_ylabel('Model prediction error (%)');
plt.show()
# calculate percentage error for just those observations greater than 2 hours
isGt2 = np.greater(df_Y['total_hours'], 2.0)
print('Average prediction error (%):', np.mean(pctError[isGt2]))
('Average prediction error (%):', 17.286621394127128)

Prediction error is heteroscedastic

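The model error grows with the length of the hike. A reasonably good margin for uncertainty is 17 to 34 percent of the total estimated travel time, that is, roughly one to two times the average percentage error computed above for hikes longer than 2 hours.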
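One possible way to apply this margin is to pad the point estimate by 17 and 34 percent. The sketch below assumes that reading; the function name `padded_range` is hypothetical, and treating the margin as extra time added to the estimate is an interpretation rather than something the regression output prescribes.

```python
def padded_range(estimated_hours, low_margin=0.17, high_margin=0.34):
    """Return the estimate padded by the lower and upper uncertainty margins."""
    return (estimated_hours * (1.0 + low_margin),
            estimated_hours * (1.0 + high_margin))


# Hypothetical 6-hour estimate: plan for roughly 7.0 to 8.0 hours.
low, high = padded_range(6.0)
print(round(low, 2), round(high, 2))
```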

Tabulated data set

This is the backpacking data in tabulated form.

In [47]:
df = pd.DataFrame({'trip': getDataColumn(0), 'route_segment': getDataColumn(1), 'miles_trail': getMilesTrail(), 
              'miles_CC':getMilesCC(), 'elevation_gain_feet': getElevGainFeet(), 'total_hours': getTotalHours()})
df[['trip','route_segment','miles_trail','miles_CC','elevation_gain_feet','total_hours']]
Out[47]:
    trip                   route_segment  miles_trail  miles_CC  elevation_gain_feet  total_hours
0   mammoth crest                      0         3.26      0.00                 1437         2.28
1   mammoth crest                      1         9.21      0.00                 2776         7.13
2   mammoth crest                      2         0.00      3.86                 1863         7.65
3   mammoth crest                      3         0.00      2.74                 1611         6.93
4   mammoth crest                      4         0.00      2.50                 1393         6.82
5   mammoth crest                      5         4.10      5.71                 2619        11.08
6   mammoth crest                      6         2.49      1.47                 2222         8.26
7   mammoth crest                      7         8.59      1.39                 1969         5.88
8   brewer loop                        0         4.41      0.00                 1884         2.64
9   brewer loop                        1         3.47      0.00                 2722         4.01
10  brewer loop                        2         0.00      2.91                 2180         5.40
11  brewer loop                        3         0.00      1.00                  495         1.01
12  brewer loop                        4         0.00      2.21                 1225         4.30
13  brewer loop                        5         0.00      2.57                 1431         4.38
14  brewer loop                        6         0.00      2.29                 1049         3.69
15  brewer loop                        7         0.00      0.66                  104         0.65
16  brewer loop                        8         1.35      0.92                  471         2.70
17  brewer loop                        9         2.72      0.00                  219         1.41
18  brewer loop                       10        10.54      0.00                  915         5.08
19  abbot loop                         0         3.70      0.00                 2747         3.63
20  abbot loop                         1         2.09      2.26                 2238         6.45
21  abbot loop                         2         0.00      3.88                 1401         6.02
22  abbot loop                         3         0.00      3.75                 1228         5.71
23  abbot loop                         4         7.63      2.01                 2197         8.73
24  abbot loop                         5         6.06      0.00                 2406         6.24
25  abbot loop                         6         0.00      0.97                  311         0.57
26  abbot loop                         7         1.91      0.00                  470         2.17
27  abbot loop                         8         6.46      0.00                  303         3.46
28  rock island lake loop              3         0.00      2.30                  961         5.20
29  rock island lake loop              4         0.72      1.53                  377         3.00
30  rock island lake loop              5         0.00      3.35                  412         6.20
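The helper functions such as getMilesTrail() are not shown in this document, so the following is a minimal sketch of how the fit could be repeated directly from the tabulated values. It uses NumPy's least squares solver instead of statsmodels and, like the model above, has no intercept term. If the table is the complete data set, the coefficients should come out close to those reported earlier, although rounding in the table may cause small differences.

```python
import numpy as np

# Columns: miles_trail, miles_CC, elevation_gain_feet, total_hours (from the table).
data = np.array([
    [3.26, 0.00, 1437, 2.28], [9.21, 0.00, 2776, 7.13], [0.00, 3.86, 1863, 7.65],
    [0.00, 2.74, 1611, 6.93], [0.00, 2.50, 1393, 6.82], [4.10, 5.71, 2619, 11.08],
    [2.49, 1.47, 2222, 8.26], [8.59, 1.39, 1969, 5.88], [4.41, 0.00, 1884, 2.64],
    [3.47, 0.00, 2722, 4.01], [0.00, 2.91, 2180, 5.40], [0.00, 1.00, 495, 1.01],
    [0.00, 2.21, 1225, 4.30], [0.00, 2.57, 1431, 4.38], [0.00, 2.29, 1049, 3.69],
    [0.00, 0.66, 104, 0.65], [1.35, 0.92, 471, 2.70], [2.72, 0.00, 219, 1.41],
    [10.54, 0.00, 915, 5.08], [3.70, 0.00, 2747, 3.63], [2.09, 2.26, 2238, 6.45],
    [0.00, 3.88, 1401, 6.02], [0.00, 3.75, 1228, 5.71], [7.63, 2.01, 2197, 8.73],
    [6.06, 0.00, 2406, 6.24], [0.00, 0.97, 311, 0.57], [1.91, 0.00, 470, 2.17],
    [6.46, 0.00, 303, 3.46], [0.00, 2.30, 961, 5.20], [0.72, 1.53, 377, 3.00],
    [0.00, 3.35, 412, 6.20],
])

X = data[:, :3]  # predictors S, R, Z
y = data[:, 3]   # measured travel time T

# Ordinary least squares without an intercept: minimize ||y - X beta||^2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual standard error with n - 3 degrees of freedom.
residuals = y - X @ beta
rmse = np.sqrt(residuals @ residuals / (len(y) - X.shape[1]))

for name, value in zip(["miles_trail", "miles_CC", "elevation_gain_feet"], beta):
    print(name, value)
print("residual standard error (hours):", rmse)
```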