Likelihood function for GPS movement data (normal inter-measurement time distributions)

Fred Ahrens

21 July 2018

This notebook defines the log likelihood function for the movement statistics of a device that is alternately moving and resting.

You supply a list of pairs (T, S), where T is the time interval between measurements and S is the measured distance between device locations. Your optimizing search function supplies a guess of the problem parameters (see below for definitions).

f_llmovement returns the value of the log likelihood function (see below for its definition).

It also returns an estimate of the motion state of the device in each interval, whether resting or moving. This implementation uses truncated normal distributions for the times between measurements.

class MovementData

This class holds the successive differences between track log points. It supplies a method for reading them from a space-delimited file and accessor methods that return the T and S² arrays.

In [2]:
import numpy as np

# Certain long-running cells are set by default not to run if the whole notebook is run.
# Set this flag to True to allow those cells to run.
autorun = False
In [3]:
class MovementData(object):
    def __init__(self):
        self.T = np.zeros((0,),float)
        self.S = np.zeros((0,),float)
        self.S2 = np.zeros((0,), float)
    
    # read T and S that are stored as columns in space-delimited data file
    def readTS(self,textfile):
        TS = np.loadtxt(textfile)
        self.T = TS[:,0]
        self.S = TS[:,1]
        self.S2 = np.square(self.S)
        
    # return the T and S2 arrays
    def getT(self):
        return self.T
    
    def getS2(self):
        return self.S2
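
As a quick illustration of the file layout that readTS expects (two whitespace-delimited columns, T then S), the following cell writes a few made-up rows to a temporary file and reads them back. The file name and values here are for demonstration only; the real data file is read in the next cell.

In [ ]:
import numpy as np

# hypothetical demo file: two space-delimited columns, time interval (s) and distance (m)
np.savetxt('T_S_demo.txt', np.array([[15., 12.3],
                                     [30.,  4.1],
                                     [16., 14.8]]))
demo = MovementData()
demo.readTS('T_S_demo.txt')
print(demo.getT())   # [15. 30. 16.]
print(demo.getS2())  # [151.29  16.81 219.04]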
In [4]:
import matplotlib.pyplot as plt

md = MovementData()
md.readTS('T_S.txt')

fig = plt.figure(1, figsize=(8,8))
ax1 = fig.add_subplot(311)
ax1.plot(md.T, md.S, linestyle='None', marker=u'D', color='dodgerblue')
ax1.grid()
ax1.set_xlabel('Time between measurements (s)')
ax1.set_ylabel('Distance between measurements (m)');

For this example, I am using data from a Garmin eTrex taken during a trip around Mount Brewer in the Sierra Nevada. The eTrex sample interval varied between 1 and 42 seconds, but was typically every 15 seconds. We can see two modes in the scatter diagram. One is centered at 18 seconds and 15 meters between measurements. The other is more widely spread, centered around 30 seconds and 5 meters. The latter mode appears to be associated with the device in the rest state, since the small position differences can be explained by measurement error alone. It is notable that the rest mode has longer time intervals than the movement mode, but this can be explained by the behavior of the device's track-recording algorithm (see http://www.gpsinformation.org/dale/tracklog.htm).

Maximum Likelihood optimization parameters

The following pair of functions converts between the parameter vector x and the named optimization parameters. The optimization parameters are defined as follows:

$\bar{v}$ (vbar) Mean speed of the device when moving.
$\sigma_{v}^{2}$ (sigmaV2) Variance of the device speed when moving.
$\sigma_{\varepsilon}^{2}$ (sigmaEp2) Device position measurement error variance, each dimension, m².

$\mu_{0},\sigma_{0}^{2}$ (mu0, sigma02) Truncated normal distribution mean and variance parameters for the time interval between measurements when the device is at rest.

$\mu_{1},\sigma_{1}^{2}$ (mu1, sigma12) Truncated normal distribution mean and variance parameters for the time interval between measurements when the device is moving.
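
Both inter-measurement time distributions are normal distributions truncated at zero, since time intervals cannot be negative. For reference, the rest-state density (the moving-state density is analogous, with $\mu_{1},\sigma_{1}$) is

$$f_{T}\left(t\right)=\frac{1}{1-\Phi\left(-\frac{\mu_{0}}{\sigma_{0}}\right)}\,\frac{1}{\sqrt{2\pi\sigma_{0}^{2}}}\exp\left(-\frac{\left(t-\mu_{0}\right)^{2}}{2\sigma_{0}^{2}}\right),\qquad t\geq0.$$

The truncation normalizer $1-\Phi\left(-\mu_{0}/\sigma_{0}\right)$ is the source of the $-\log\left(1-\Phi\left(-\mu_{0}/\sigma_{0}\right)\right)$ terms in the log likelihood function below.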

In [5]:
def LFx2params(x):
    vbar = x[0]
    sigmaV2 = x[1]
    sigmaEp2 = x[2]
    mu0 = x[3]
    sigma02 = x[4]
    mu1 = x[5]
    sigma12 = x[6]
    return (vbar, sigmaV2, sigmaEp2, mu0, sigma02, mu1, sigma12)
def LFparams2x(vbar, sigmaV2, sigmaEp2, mu0, sigma02, mu1, sigma12):
    x = np.array([vbar, sigmaV2, sigmaEp2, mu0, sigma02, mu1, sigma12])
    return x

Log likelihood function

This function calculates the log likelihood function given the observed intervals and the above seven model parameters.

$$L\left(\left\{ \left(T_{i},S_{i}^{2}\right)\right\} _{i=1}^{n}|\left\{ M_{i}\right\} _{i=1}^{n},\bar{v},\sigma_{v},\sigma_{\varepsilon}^{2}\right)$$

$$=\sum_{i=1}^{n}(1-M_{i})\left(-\log\left(1-\Phi\left(-\frac{\mu_{0}}{\sigma_{0}}\right)\right)-\frac{1}{2}\log\left(2\pi\sigma_{0}^{2}\right)-\frac{\left(T_{i}-\mu_{0}\right)^{2}}{2\sigma_{0}^{2}}-\log\left(2\sigma_{\varepsilon}^{2}\right)-\frac{S_{i}^{2}}{2\sigma_{\varepsilon}^{2}}\right)$$

$$+M_{i}\left(-\log\left(1-\Phi\left(-\frac{\mu_{1}}{\sigma_{1}}\right)\right)-\frac{1}{2}\log\left(2\pi\sigma_{1}^{2}\right)-\frac{\left(T_{i}-\mu_{1}\right)^{2}}{2\sigma_{1}^{2}}+\log\left(f_{M=1}\left(s^{2}|T_{i}\bar{v},T_{i}^{2}\sigma_{v}^{2},\sigma_{\varepsilon}^{2}\right)\right)\right)$$

A development of this model is in Ahrens, F. (2018) Likelihood Function for GPS Movement Data, Normal Inter-measurement Times.

The second argument, md, is a MovementData object which contains observations of time and distance between GPS measurements.

The function determines the movement state $M_{i}$ which maximizes each term of the likelihood function.

To get the values of $M_{i}$ as well as log likelihood function result, use the full return argument version:

llf, M = f_llmovement_full(x, md)

The function $f_{M=1}\left(s^{2}|T_{i}\bar{v},T_{i}^{2}\sigma_{v}^{2},\sigma_{\varepsilon}^{2}\right)$ is a convolution of a non-central and a central chi-square distribution. It is evaluated numerically by the function $\textsf{f_non_central_convolved}$ in the notebook $\textsf{noncentralX2convolution}$.
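
For orientation, the next cell sketches one way such a convolution could be evaluated numerically with scipy. The function name, signature, and integration scheme here are illustrative assumptions only; they are not the actual f_non_central_convolved, whose definition and argument conventions live in noncentralX2convolution.ipynb.

In [ ]:
# A minimal sketch (not the notebook's f_non_central_convolved): density of X + Y at s2,
# where X ~ scale1 * noncentral chi-square(df1, nc) and Y ~ scale2 * central chi-square(df2),
# obtained by numerically integrating the convolution of the two densities.
from scipy.stats import ncx2, chi2
from scipy.integrate import quad

def convolved_chi2_pdf(s2, df1, nc, scale1, df2, scale2):
    integrand = lambda u: ncx2.pdf(u, df1, nc, scale=scale1) * chi2.pdf(s2 - u, df2, scale=scale2)
    value, _ = quad(integrand, 0., s2)
    return value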

In [6]:
%%capture
%run noncentralX2convolution.ipynb
In [7]:
from scipy.stats import norm
from IPython.core.debugger import set_trace

def f_llmovement(x, md):
    llf, M = f_llmovement_full(x, md)
    return llf

def f_llmovement_full(x, md):
    vbar, sigmaV2, sigmaEp2, mu0, sigma02, mu1, sigma12 = LFx2params(x)
    sigma0 = np.sqrt(sigma02)
    sigma1 = np.sqrt(sigma12)
    T = md.getT()
    S2 = md.getS2()
    M = np.zeros(T.shape)
    llf = 0.
    for i in range(0, T.shape[0]):
        L0 = -np.log(1. - norm.cdf(-mu0/sigma0)) -1./2*np.log(2.*np.pi*sigma02) - 1./2*(T[i] - mu0)**2/sigma02 \
             - np.log(2*sigmaEp2) - S2[i]/2/sigmaEp2
        sigmaX2 = T[i]**2*sigmaV2 + sigmaEp2
        deltaX = (T[i]*vbar)**2/sigmaX2
        L1 = -np.log(1. - norm.cdf(-mu1/sigma1)) -1./2*np.log(2.*np.pi*sigma12) - 1./2*(T[i] - mu1)**2/sigma12 \
            + np.log(f_non_central_convolved(S2[i], 1., sigmaX2, deltaX, 1., sigmaEp2))
        if L1 > L0:
            M[i] = 1.
            llf += L1
        else:
            M[i] = 0.
            llf += L0
    return llf, M
In [8]:
# Test 0 of f_llmovement
# Updated to latest maximum likelihood solution.
vbar = 0.831017605
sigmaV2 = 0.121499414
sigmaEp2 = 27.5064636
mu0 = 30.9988448
sigma02 = 11.6745767
mu1 = 17.1095423
sigma12 =  11.6745767
llf, M = f_llmovement_full(LFparams2x(vbar, sigmaV2, sigmaEp2, mu0, sigma02, mu1, sigma12), md)
# separately plot the moving and non-moving points
t_move = md.getT()[np.equal(M,1.)]
s_move = md.S[np.equal(M,1.)]
t_rest = md.getT()[np.equal(M,0.)]
s_rest = md.S[np.equal(M,0.)]
fig = plt.figure(1, figsize=(8,8))
ax1 = fig.add_subplot(311)
ax1.plot(t_move, s_move, linestyle='None', marker=u'D', color='dodgerblue', label="Moving");
ax1.plot(t_rest, s_rest, linestyle='None', marker=u'D', color='coral', label="Stationary");
ax1.legend()
ax1.grid()
ax1.set_xlabel('Time between measurements (s)');
ax1.set_ylabel('Distance between measurements (m)');

Left-Tail Probability Constraint

This constraint on the inter-measurement time normal distribution parameters ensures that the stationary device inter-measurement time distribution lies to the right of the moving inter-measurement time distribution. $$\mu_{0}\geq\mu_{1}\quad\textrm{and}\quad\sigma_{0}^{2}\leq\sigma_{1}^{2}$$ This is implemented as two constraint functions in the next cell.

In [8]:
def f_const_time_mu(x):
    vbar, sigmaV2, sigmaEp2, mu0, sigma02, mu1, sigma12 = LFx2params(x)
    return mu0 - mu1

def f_const_time_sigma(x):
    vbar, sigmaV2, sigmaEp2, mu0, sigma02, mu1, sigma12 = LFx2params(x)
    return sigma12 - sigma02

Maximum Likelihood Estimation

This cell calls scipy.optimize.minimize on the negative of f_llmovement, subject to the parameter bounds and the two left-tail constraints. It is guarded by the autorun flag because the optimization takes a long time to run.

In [9]:
if autorun:
    from scipy.optimize import minimize
    from IPython.core.interactiveshell import InteractiveShell

    InteractiveShell.ast_node_interactivity = "all"

    # callback function for minimize
    def cb_llmovement(x):
        llf = f_llmovement(x,md)
        print(x)
        print(llf)

    # solution obtained from last run after a long time
    vbar = 0.831017605
    sigmaV2 = 0.121499414
    sigmaEp2 = 27.5064636
    mu0 = 30.9988448
    sigma02 = 11.6745767
    mu1 = 17.1095423
    sigma12 =  11.6745767
    x0 =  LFparams2x(vbar, sigmaV2, sigmaEp2, mu0, sigma02, mu1, sigma12)
    bnds = [(0., None)]*7
    constr = [{'type':'ineq', 'fun':f_const_time_mu}, {'type':'ineq', 'fun':f_const_time_sigma}]
    res = minimize(lambda x: -f_llmovement(x, md), x0, method='SLSQP', bounds=bnds, constraints=constr, \
               callback=cb_llmovement)

A local optimum test using Latin hypercube random sampling

We sample in a neighborhood of the last best solution to verify that it is a local maximum.

In [11]:
autorun = True
if autorun:
    from pyDOE import lhs
    eta = 0.2
    vbar_bnds = (vbar/(1.+eta), vbar*(1.+eta))
    sigmaV2_bnds = (0., sigmaV2*(1.+eta))
    sigmaEp2_bnds = (sigmaEp2/(1.+eta), sigmaEp2*(1.+eta))
    mu0_bnds = (mu0/(1.+eta), mu0*(1.+eta))
    sigma02_bnds = (sigma02/(1.+eta), sigma02*(1.+eta))
    mu1_bnds = (mu1/(1.+eta), mu1*(1.+eta))
    sigma12_bnds = (sigma12/(1.+eta), sigma12*(1.+eta))
    x_bnds = [vbar_bnds, sigmaV2_bnds, sigmaEp2_bnds, mu0_bnds, sigma02_bnds, mu1_bnds, sigma12_bnds]
    u = lhs(7, 70, 'center')
    X = np.zeros(u.shape)
    LLF = np.zeros((u.shape[0],))
    for i in range(0, u.shape[0]):
        for j in range(0, u.shape[1]):
            X[i,j] = x_bnds[j][0] + (x_bnds[j][1]-x_bnds[j][0])*u[i,j]
        LLF[i] = f_llmovement(X[i,:],md)
autorun = False
In [12]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "last"

# Some plots of the local optimum Latin hypercube data
fig = plt.figure(1, figsize=(8,8))
ax0 = fig.add_subplot(421)
ax0.plot(X[:,0], LLF, linestyle='None', marker=u'D', color='dodgerblue');
ax0.plot(vbar, -5007.4, linestyle='None', marker=u'D', color='coral');
ax0.grid()
ax0.set_xlabel('Average speed (m/sec)');
ax0.set_ylabel('Log likelihood');
ax1 = fig.add_subplot(422)
ax1.plot(X[:,1], LLF, linestyle='None', marker=u'D', color='dodgerblue');
ax1.plot(sigmaV2, -5007.4, linestyle='None', marker=u'D', color='coral');
ax1.grid()
ax1.set_xlabel('Variance of speed (m^2/sec^2)');
ax1.set_ylabel('Log likelihood');
ax2 = fig.add_subplot(423)
ax2.plot(X[:,2], LLF, linestyle='None', marker=u'D', color='dodgerblue');
ax2.plot(sigmaEp2, -5007.4, linestyle='None', marker=u'D', color='coral');
ax2.grid()
ax2.set_xlabel('Variance of position error (m^2)');
ax2.set_ylabel('Log likelihood');
ax3 = fig.add_subplot(424)
ax3.plot(X[:,3], LLF, linestyle='None', marker=u'D', color='dodgerblue');
ax3.plot(mu0, -5007.4, linestyle='None', marker=u'D', color='coral');
ax3.grid()
ax3.set_xlabel('Mean time between measurements, stationary (sec)');
ax3.set_ylabel('Log likelihood');
ax4 = fig.add_subplot(425)
ax4.plot(X[:,4], LLF, linestyle='None', marker=u'D', color='dodgerblue');
ax4.plot(sigma02, -5007.4, linestyle='None', marker=u'D', color='coral');
ax4.grid()
ax4.set_xlabel('Variance of time between measurements, stationary (sec^2)');
ax4.set_ylabel('Log likelihood');
ax5 = fig.add_subplot(426)
ax5.plot(X[:,5], LLF, linestyle='None', marker=u'D', color='dodgerblue');
ax5.plot(mu1, -5007.4, linestyle='None', marker=u'D', color='coral');
ax5.grid()
ax5.set_xlabel('Mean time between measurements, moving (sec)');
ax5.set_ylabel('Log likelihood');
ax6 = fig.add_subplot(427)
ax6.plot(X[:,6], LLF, linestyle='None', marker=u'D', color='dodgerblue');
ax6.plot(sigma12, -5007.4, linestyle='None', marker=u'D', color='coral');
ax6.grid()
ax6.set_xlabel('Variance of time between measurements, moving (sec^2)');
ax6.set_ylabel('Log likelihood');

fig.tight_layout()
plt.show()
In [14]:
# plots of the fitted inter-measurement time densities Normal(mu0, sigma02) and Normal(mu1, sigma12)
# (truncation at zero omitted for plotting)
t = np.arange(1., 42., 1.)
g0 = np.zeros(t.shape)
g1 = np.zeros(t.shape)
for i in range(0, t.shape[0]):
    g0[i] = 1./np.sqrt(2.*np.pi*sigma02)*np.exp(-0.5*(t[i] - mu0)**2/sigma02)
    g1[i] = 1./np.sqrt(2.*np.pi*sigma12)*np.exp(-0.5*(t[i] - mu1)**2/sigma12)
    
fig = plt.figure(1, figsize=(8,8))
ax1 = fig.add_subplot(211)
ax1.plot(t, g0, color='dodgerblue', label='Stationary');
ax1.set_xlabel('Time interval (sec)');
ax1.set_ylabel('pdf');
ax1.plot(t, g1, color='coral', label='Moving');
ax1.grid()
ax1.legend();

plt.tight_layout()
plt.show()
    

The estimated mean time between measurements while moving is about what I had guessed, and the mean time between measurements while stationary is as expected. The left-tail constraint is forcing the standard deviations of the two distributions to be about the same.

Plot an x-y track marking the rest points found by the algorithm

In [17]:
# reuse MovementData to load the track longitudes (first column) and latitudes (second column)
lons_lats = MovementData()
lons_lats.readTS('lons_lats.txt')
lons = lons_lats.getT()
lats = lons_lats.S
# Earth radius (m) at latitude 36.7964
Re = 6370506
# local flat-earth projection about the first track point
x = np.deg2rad(lons - lons[0])*Re*np.cos(np.deg2rad(lats[0]))
y = np.deg2rad(lats - lats[0])*Re
llf, M = f_llmovement_full(LFparams2x(vbar, sigmaV2, sigmaEp2, mu0, sigma02, mu1, sigma12), md)
# M applies to intervals between points; prepend a zero so the mask lines up with the track points,
# then select the rest (M = 0) points
M1 = np.equal(np.concatenate((np.array([0.]),M)),0.)

fig = plt.figure(1, figsize=(8,8))
ax1 = fig.add_subplot(211)
ax1.plot(x, y, color='dodgerblue', label='Moving');
ax1.plot(x[M1], y[M1], linestyle='None', marker=u'D', color='coral', label='Resting');
ax1.grid()
ax1.axis('equal')
ax1.set_xlabel('Easting (m)');
ax1.set_ylabel('Northing (m)');
#ax1.set_xlim(3000., 3300.)
#ax1.set_ylim(-1500., -1000.)

ax1.legend();

plt.tight_layout()
plt.show()

The classifications look about right according to my memory. The longest stops were at the trailhead (near (100, -200)), at the bridge crossing of the South Fork (near (2800, -800)), and close to the Sphinx Trail junction (near (3800, -1800)).

In [1]:
!jupyter nbconvert llfGPSmovementData2.ipynb --to html
[NbConvertApp] Converting notebook llfGPSmovementData2.ipynb to html
[NbConvertApp] Writing 455872 bytes to llfGPSmovementData2.html