Documentation of the FredsEmpirical formula for estimating travel times from route distance¶

Fred Ahrens¶

23 September 2018¶

The FredsEmpirical formula is a linear function of route distance, cumulative elevation gain and terrain type that estimates the backpacking travel time. This document provides the data sources and statistical regression model for the FredsEmpirical formula.

Nomenclature¶

$\boldsymbol{x}_{i},i=0,\ldots,n$ Successive positions listed in test route.

$d\left(\boldsymbol{x},\boldsymbol{y}\right)$ A function that returns the great circle distance between two positions.

$C_{i},i=1,\ldots,n$ A list of binary variables: $C_{i}=1$ if the corresponding segment is cross country, and $C_{i}=0$ if it is on trail.

$S=\sum_{i=1}^{n}d\left(\boldsymbol{x}_{i},\boldsymbol{x}_{i-1}\right)\left(1-C_{i}\right)$ Total estimated trail distance of test route.

$R=\sum_{i=1}^{n}d\left(\boldsymbol{x}_{i},\boldsymbol{x}_{i-1}\right)C_{i}$ Total estimated cross country distance of test route.

$z_{i},i=0,\ldots,n$ Elevation estimates for each position in test route.

$Z=\sum_{i=1}^{n}\max\left(0,z_{i}-z_{i-1}\right)$ Cumulative elevation gain for test route.

$\tau_{0},\tau_{f}$ Time stamps of the start and finish times for actual backpack of test route.

$T=\tau_{f}-\tau_{0}$ Measured travel time for test route.
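The definitions of $S$, $R$ and $Z$ above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the data set: the function names, the haversine formula as the great circle distance $d$, and the Earth radius constant are all assumptions made for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, for illustration only


def great_circle_miles(p, q):
    """Haversine distance in miles between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def route_summary(positions, elevations_feet, cross_country):
    """Return (S, R, Z) for a route.

    positions       -- n+1 (lat, lon) tuples, x_0 ... x_n
    elevations_feet -- n+1 elevations, z_0 ... z_n
    cross_country   -- n flags, C_1 ... C_n (truthy = cross country)
    """
    S = R = Z = 0.0
    for i in range(1, len(positions)):
        d = great_circle_miles(positions[i - 1], positions[i])
        if cross_country[i - 1]:
            R += d  # segment counts toward cross-country distance
        else:
            S += d  # segment counts toward trail distance
        # only climbs contribute to cumulative elevation gain
        Z += max(0.0, elevations_feet[i] - elevations_feet[i - 1])
    return S, R, Z
```

Each segment contributes to exactly one of $S$ or $R$, and descents are ignored in $Z$.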

Route Travel Time Data¶

I collected travel time, trail distance, cross country distance and elevation gain for 31 days of backpacking spread over four backpacking trips. In this paper, there is a lengthy discussion on what makes these variables good predictors of travel time.

Here, I just plot the data. A table of the data appears at the end of this article.

# plot the data as scatter diagrams, total hours vs miles on trail, miles on CC, elevation gain
import matplotlib.pyplot as plt
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "last"

miles_trail = getMilesTrail()
miles_CC = getMilesCC()
elevation_gain_feet = getElevGainFeet()
total_hours = getTotalHours()

fig = plt.figure(1, figsize=(8,8))
ax0 = fig.add_subplot(211)
ax0.plot(miles_trail, total_hours, linestyle='None', marker=u'D', color='dodgerblue', label='Trail');
ax0.plot(miles_CC, total_hours, linestyle='None', marker=u'D', color='coral', label='Cross country');
ax0.legend()
ax0.grid()
ax0.set_xlabel('Route miles');
ax0.set_ylabel('Total hours');
# second panel: total hours versus elevation gain
ax1 = fig.add_subplot(212)
ax1.plot(elevation_gain_feet, total_hours, linestyle='None', marker=u'D', color='dodgerblue');
ax1.grid()
ax1.set_xlabel('Elevation gain (feet)');
ax1.set_ylabel('Total hours');

Scatter plots of travel time data¶

Here are scatter plots of total travel hours ($T$) versus each of the three variables: route miles on trail ($S$), route miles cross country ($R$) and elevation gain ($Z$). We can see a relationship between each of the three variables and total travel time. We also see that cross country miles have a stronger effect on travel time than trail miles.

Statistical model of travel time¶

The statistical model for the route travel time is

$$T=\beta_{S}S+\beta_{R}R+\beta_{Z}Z+\varepsilon,$$

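where $T$ is the travel time in hours, $S$ is the trail distance in miles, $R$ is the cross country distance in miles, $Z$ is the cumulative elevation gain of the route in feet, $\varepsilon$ is the prediction error and sum of all uncertain effects, and $\beta_{S},\beta_{R},\beta_{Z}$ are the unknown regression coefficients. The model has no intercept term, so a route with zero distance and zero elevation gain is predicted to take zero time.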

$$S=\sum_{i=1}^{n}d\left(\boldsymbol{x}_{i},\boldsymbol{x}_{i-1}\right)\left(1-C_{i}\right)$$

$$R=\sum_{i=1}^{n}d\left(\boldsymbol{x}_{i},\boldsymbol{x}_{i-1}\right)C_{i}$$

$$Z=\sum_{i=1}^{n}\max\left(0,z_{i}-z_{i-1}\right).$$

A multiple linear least squares regression gives us estimates of the coefficients $\beta_{S},\beta_{R},\beta_{Z}$. They appear in the output of the next cell with the labels miles_trail, miles_CC and elevation_gain_feet.

# multiple linear least squares regression of the backpacking data
import pandas as pd
import statsmodels.api as sm
# predictors: trail miles, cross-country miles and elevation gain in feet
df_X = pd.DataFrame({'miles_trail': getMilesTrail(), 'miles_CC':getMilesCC(), 
                     'elevation_gain_feet': getElevGainFeet()})
# response: measured travel time in hours
df_Y = pd.DataFrame({'total_hours': getTotalHours()})
# no constant column is added, so the fit has no intercept, matching the model above
model = sm.OLS(df_Y, df_X).fit()
model.params

elevation_gain_feet    0.001156
miles_CC               1.334724
miles_trail            0.379191
dtype: float64

Estimated coefficients of the model¶

$\hat{\beta}_{S} = 0.379191$ hours per trail mile

$\hat{\beta}_{R} = 1.334724$ hours per cross-country mile

$\hat{\beta}_{Z} = 0.001156$ hours per foot of elevation gain

Taking reciprocals, $\hat{\beta}_{S}$ is equivalent to about 2.6 miles per hour on trail. Cross-country travel is much slower on average, equivalent to about 0.75 miles per hour. The slower speed reflects the time consumed in route finding and in negotiating rough terrain. $\hat{\beta}_{Z}$ is equivalent to 1 hour for every 865 feet of climbing.
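As a worked example, the fitted coefficients can be wrapped in a small function. The function and constant names below are hypothetical and chosen for this illustration; only the coefficient values come from the regression output above.

```python
# Fitted coefficients from the regression output above
BETA_S = 0.379191  # hours per trail mile
BETA_R = 1.334724  # hours per cross-country mile
BETA_Z = 0.001156  # hours per foot of elevation gain


def estimate_hours(miles_trail, miles_cc, elevation_gain_feet):
    """Point estimate of travel time in hours (no uncertainty margin applied)."""
    return (BETA_S * miles_trail
            + BETA_R * miles_cc
            + BETA_Z * elevation_gain_feet)


# Example: 3.26 trail miles, no cross country, 1437 feet of gain
print(round(estimate_hours(3.26, 0.0, 1437), 2))  # prints 2.9

# Equivalent rates, as quoted in the text
print(round(1 / BETA_S, 2))  # trail miles per hour
print(round(1 / BETA_R, 2))  # cross-country miles per hour
print(round(1 / BETA_Z))     # feet of climbing per hour
```

The example inputs match the first row of the tabulated data set, whose measured time was 2.28 hours, so the estimate for that segment is on the high side.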

Prediction Error¶

This model has a high level of unexplained variance due to other hidden factors. Cross-country travel, in particular, has a high degree of uncertainty. This section will estimate the prediction error of this model.

# root mean squared error of residuals
import numpy as np
np.sqrt(model.mse_resid)

1.0512823291522229
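The figure above is in hours. In statsmodels, mse_resid is the residual sum of squares divided by the residual degrees of freedom, which here would be 31 observations minus 3 coefficients. The sketch below reproduces that calculation with plain NumPy on a toy input; the function name residual_rmse is hypothetical, and the code is an illustration rather than the notebook's own method.

```python
import numpy as np


def residual_rmse(X, y):
    """Root mean squared residual of a no-intercept least squares fit.

    Divides the residual sum of squares by (observations - rank of X),
    i.e. the residual degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares coefficients
    resid = y - X @ beta
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    return np.sqrt(resid @ resid / dof)


# Toy data: one predictor, three observations
print(residual_rmse([[1.0], [1.0], [1.0]], [1.0, 2.0, 3.0]))  # prints 1.0
```

In the toy case the fitted coefficient is 2, the residuals are -1, 0 and 1, and the residual sum of squares of 2 is divided by 2 degrees of freedom.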

# plot the residuals versus total hours
fig = plt.figure(1, figsize=(8,8))
ax0 = fig.add_subplot(211)
ax0.plot(df_Y['total_hours'], np.abs(model.resid), linestyle='None', marker=u'D', color='dodgerblue');
ax0.grid()
ax0.set_xlabel('Total hours');
ax0.set_ylabel('Model prediction error (hours)');
# plot residual as percentage of total hours
ax1 = fig.add_subplot(212)
pctError = np.divide(np.abs(model.resid), df_Y['total_hours'])*100
ax1.plot(df_Y['total_hours'], pctError, linestyle='None', marker=u'D', color='dodgerblue');
ax1.grid()
ax1.set_xlabel('Total hours');
ax1.set_ylabel('Model prediction error (%)');
plt.show()
# calculate percentage error for just those observations greater than 2 hours
isGt2 = np.greater(df_Y['total_hours'], 2.0)
print('Average prediction error (%):', np.mean(pctError[isGt2]))

('Average prediction error (%):', 17.286621394127128)

Prediction error is heteroscedastic¶

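The model error grows with the length of the hike, so it is more useful to state the uncertainty as a percentage of the estimate than as a fixed number of hours. For observations longer than 2 hours, the average prediction error is about 17 percent. A reasonably good margin for uncertainty is therefore 17 to 34 percent of the total estimated travel time, that is, one to two times the average error.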
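One way to apply this margin when planning is to pad the point estimate. The sketch below assumes the margin is added on top of the estimate as a planning buffer, which is an interpretation rather than something stated above; the function name is hypothetical.

```python
def travel_time_range(estimated_hours, low_margin=0.17, high_margin=0.34):
    """Return (low, high) padded travel times in hours.

    Assumption: the 17 to 34 percent margin is added to the point estimate.
    """
    return (estimated_hours * (1 + low_margin),
            estimated_hours * (1 + high_margin))


# Example: a segment estimated at 6 hours
low, high = travel_time_range(6.0)
print(round(low, 2), round(high, 2))  # prints 7.02 8.04
```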

Tabulated data set¶

This is the backpacking data in tabulated form.

df = pd.DataFrame({'trip': getDataColumn(0), 'route_segment': getDataColumn(1), 'miles_trail': getMilesTrail(), 
              'miles_CC':getMilesCC(), 'elevation_gain_feet': getElevGainFeet(), 'total_hours': getTotalHours()})
df[['trip','route_segment','miles_trail','miles_CC','elevation_gain_feet','total_hours']]

	trip	route_segment	miles_trail	miles_CC	elevation_gain_feet	total_hours
0	mammoth crest	0	3.26	0.00	1437	2.28
1	mammoth crest	1	9.21	0.00	2776	7.13
2	mammoth crest	2	0.00	3.86	1863	7.65
3	mammoth crest	3	0.00	2.74	1611	6.93
4	mammoth crest	4	0.00	2.50	1393	6.82
5	mammoth crest	5	4.10	5.71	2619	11.08
6	mammoth crest	6	2.49	1.47	2222	8.26
7	mammoth crest	7	8.59	1.39	1969	5.88
8	brewer loop	0	4.41	0.00	1884	2.64
9	brewer loop	1	3.47	0.00	2722	4.01
10	brewer loop	2	0.00	2.91	2180	5.40
11	brewer loop	3	0.00	1.00	495	1.01
12	brewer loop	4	0.00	2.21	1225	4.30
13	brewer loop	5	0.00	2.57	1431	4.38
14	brewer loop	6	0.00	2.29	1049	3.69
15	brewer loop	7	0.00	0.66	104	0.65
16	brewer loop	8	1.35	0.92	471	2.70
17	brewer loop	9	2.72	0.00	219	1.41
18	brewer loop	10	10.54	0.00	915	5.08
19	abbot loop	0	3.70	0.00	2747	3.63
20	abbot loop	1	2.09	2.26	2238	6.45
21	abbot loop	2	0.00	3.88	1401	6.02
22	abbot loop	3	0.00	3.75	1228	5.71
23	abbot loop	4	7.63	2.01	2197	8.73
24	abbot loop	5	6.06	0.00	2406	6.24
25	abbot loop	6	0.00	0.97	311	0.57
26	abbot loop	7	1.91	0.00	470	2.17
27	abbot loop	8	6.46	0.00	303	3.46
28	rock island lake loop	3	0.00	2.30	961	5.20
29	rock island lake loop	4	0.72	1.53	377	3.00
30	rock island lake loop	5	0.00	3.35	412	6.20