The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
The gmwmx2
R
package allows to estimate
linear model with correlated residuals in presence of missing data.
More precisely, we assume the following model: \[\begin{equation} \boldsymbol{Y} = \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{F}\left\{\boldsymbol{\Sigma}\left(\boldsymbol{\gamma}\right)\right\}, \end{equation}\]
where \(\boldsymbol{X} \in \mathbb{R}^{n \times p}\) is a design matrix of observed predictors, \(\boldsymbol{\beta} \in \mathbb{R}^p\) is the regression parameter vector and \(\boldsymbol{\varepsilon} = \{\varepsilon_{i}\}_{i=1,\ldots,n}\) is a zero-mean process following an unspecified joint distribution \(\mathcal{F}\) with positive-definite covariance function \(\boldsymbol{\Sigma}(\boldsymbol{\gamma}) \in \mathbb{R}^{n\times n}\) characterizing the second-order dependence structure of the process and parameterized by the vector \(\boldsymbol{\gamma} \in \boldsymbol{\Gamma} \subset \mathbb{R}^q\).
We then define the a random variable \(\boldsymbol{Z} =\{Z_{i}\}_{i=1,\ldots,n}\) which describes the missing observation mechanism. More specifically, the vector \(\boldsymbol{Z} \in \{0, 1\}^n\) is a binary-valued stationary process independent of \(\boldsymbol{Y}\) with expectation \(\mu(\boldsymbol{\vartheta}) = \mathbb{E}[Z_i] \in [0, \, 1)\), \(\forall \, i\), and covariance matrix \(\boldsymbol{\Lambda}(\boldsymbol{\vartheta}) = \mathbb{V}[\boldsymbol{Z}] \in \mathbb{R}^{n\times n}\) whose structure is assumed known up to the parameter vector \(\boldsymbol{\vartheta} \in \boldsymbol{\Upsilon} \subset \mathbb{R}^k\)
We then assume that we only observe the stochastic process \(\tilde{\boldsymbol{Y}} = \boldsymbol{Z} \odot \boldsymbol{Y}\), where \(\odot\) denotes the Hadamard product. Hence, \(\tilde{\boldsymbol{Y}} \in \mathbb{R}^n = \boldsymbol{Y} \odot \boldsymbol{Z}\) represents the observed process vector with null elements in the positions where observations are missing.
Using \(\otimes\) to denote the Kronecker product, we then define \(\tilde{\boldsymbol{X}} = \boldsymbol{Z} \otimes \boldsymbol{1}^T \odot \boldsymbol{X} \in \mathbb{R}^{n \times p}\) as the design matrix \(\boldsymbol{X}\) with zero-valued vectors for the rows where observations are missing in \(\boldsymbol{Y}\) (where \(\boldsymbol{1}\) represents a vector of ones of dimension \(p\)).
We estimate the parameters \(\boldsymbol{\beta}\) with the least square estimator:
\[\begin{equation} \hat{\boldsymbol{\beta}} = \left(\tilde{\boldsymbol{X}}^T \tilde{\boldsymbol{X}}\right)^{-1} \tilde{\boldsymbol{X}}^T \tilde{\boldsymbol{Y}}. \end{equation}\]
We compute the estimated residuals as \(\hat{\boldsymbol{\varepsilon}} = {\tilde{\boldsymbol{Y}}} - \tilde{{\boldsymbol{X}}} \hat{\boldsymbol{\beta}}\).
We then estimate with the Maximum Likelihood Estimator the parameters \(\boldsymbol{\vartheta}\) of the missingness process \(\boldsymbol{Z}\) assuming that \(\boldsymbol{Z}\) is generated from a Markov model with the following transition probabilities:
\[\begin{equation} \label{eq:markov_model_def} \begin{aligned} & P\left( Z_2=1 \mid Z_1=1\right)=1 - p_1 \\ & P\left(Z_2=1 \mid Z_1=0\right) = p_2 \\ & P\left(Z_2=0 \mid Z_1=1\right)=p_1 \\ & P\left(Z_2=0 \mid Z_1=0\right)=1-p_2. \end{aligned} \end{equation}\]
We then estimate the parameters \(\boldsymbol{\gamma}\) using a Generalized method of Wavelet Moments approach (Guerrier et al. 2013) and using the fact that the variance-covariance matrix of \(\hat{\boldsymbol{\varepsilon}}\) is given by:
\[\boldsymbol{\Sigma}_{\hat{\boldsymbol{\varepsilon}}}(\boldsymbol{\gamma}) = [\boldsymbol{\Lambda}(\hat{\boldsymbol{\vartheta}}) + \mu(\hat{\boldsymbol{\vartheta}})^2 \mathbf{1} \mathbf{1}^T ] \odot ( \boldsymbol{I} - \boldsymbol{P})\boldsymbol{\Sigma}(\boldsymbol{\gamma}) (\boldsymbol{I} - \boldsymbol{P})\]
where \(\boldsymbol{P} = \boldsymbol{X}(\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\) and \(\boldsymbol{I}\) is the identity matrix of dimension \(n\times n\).
More precisely, we rely on the result of Xu et al. (2017) that provide a computationally efficient form of the theoretical Allan variance (equivalent to the Haar wavelet variance up to a constant) for zero-mean stochastic processes such as \(\boldsymbol{\varepsilon}\) to avoid computing these large matrices multiplication in the objective function. Indeed in Xu et al. (2017) they generalize the results in Zhang (2008) to zero-mean non-stationary processes by using averages of the diagonals and super-diagonals of the covariance matrix of \(\boldsymbol{\varepsilon}\). What this implies is that the GMWM, which uses this form, does not require the storage of the \(n \times n\) covariance matrix of \(\boldsymbol{\varepsilon}\), but only of a vector of dimension \(n\) which is then plugged into an explicit formula consisting in a linear combination of the elements of this vector (these elements being averages of the diagonal and super-diagonals of the covariance matrix).
While the GMWMX as described above and in more details in Voirol et al. (2024), is a general method for
estimating large linear model with complex dependence structure in
presence of missing data, the gmwmx2
R
package
is currently developed specifically to estimate tectonic velocities from
position times series in graticule distance coordinates system (GD)
provided by the Nevada geodetic Laboratory (Blewitt 2024; Blewitt, Hammond, and Kreemer
2018).
To estimate the trajectory model (see e.g., Bevis and Brown (2014) for more details), we construct the design matrix \(\boldsymbol{X}\) such that \(i\)-th component of the vector \(\mathbf{X} {{\boldsymbol{\beta}}}\) can be described as follows with \(t_i\) representing the \(i^{th}\) ordered time point (epoch) indicated in Modified Julian Date and \(t_0\) representing the reference epoch located exactly in the middle of start and end point of the time series:
\[\begin{split} \mathbb{E}[\mathbf{Y}_i] &= \mathbf{X}_i^T {{\boldsymbol{\beta}}} \\ &= a + b \left(t_{i} - t_0\right) + \sum_{h=1}^{m}\left[c_{h} \sin \left(2 \pi f_{h} t_{i}\right) + d_h \cos \left(2 \pi f_h t_i\right)\right] + \\& \sum_{j=1}^{r}e_j H\left(t_i - t_j\right) + \sum_{k = 1 }^{s} l_k \left[1- \exp\left(\frac{-(t_i-t_k)}{\tau_k}\right)\right]H\left(t_{i}-\tau_k\right) \end{split}\]where \(a\) is the initial position at the reference epoch \(t_0\), \(b\) is the velocity parameter, \(c_h\) and \(d_h\) are the periodic motion parameters (\(h = 1\) and \(h = 2\) represent the annual and semi-annual seasonal terms, respectively with \(f_1 = \frac{1}{365.25}\) and \(f_2 = \frac{2}{365.25}\)). The offset terms models earthquakes, equipment changes or human intervention in which \(e_j\) is the magnitude of the step at epochs \(t_j\), \(r\) is the total number of offsets, \(H\) is the Heaviside step function defined as \(H(x):= \begin{cases}1, & x \geq 0 \\ 0, & x<0\end{cases}\) and the last term allow to model post-seismic deformation (see e.g., Sobrero et al. (2020)) where \(s\) is the number of post seismic relaxation time specified, \(t_k\) is the time when the relaxation \(k\) starts in Modified Julian Date (MJD), \(\tau_k\) is the relaxation period in days for the post-seismic relaxation \(k\) and \(l_k\) is the amplitude of the transient. Note that by default the estimates of the functional parameters are provided in unit/day.
When loading data from a specific station using the function
gmwmx2::download_station_ngl()
, we extract from the Nevada
Geodetic Laboratory the position time series in GD coordinates, the time
steps associated with a equipment or software change and the time steps
associated with an earthquake near the station. All these objects are
stored in a object of class gnss_ts_ngl
.
When applying the function gmwmx2::gmwmx2()
to an object
of class gnss_ts_ngl
, we construct the design matrix \(\boldsymbol{X}\) by considering an offset
term for all equipment or software changes steps and all earthquakes
indicated by the NGL. We also specify a post-seismic relaxation term for
all earthquakes indicated by the NGL. If no relaxation time is specified
in the argument vec_earthquakes_relaxation_time
, we
consider a default relaxation time of \(365.25\) days.
It is generally recognized that noise in GNSS time series is best described by a combination of colored noise plus white noise (He et al. 2017; Langbein 2008; Williams et al. 2004; Bos et al. 2013) where the white noise generally model noise at high frequencies and the colored noise model the lower frequencies of the spectrum. In a large study on the noise properties of GNSS signals, concluded that the optimal noise models for 80–90% of GNSS time series signals are the power law and white noise model or white noise and flicker/pink noise with model. The package currently support both stochastic model specification.
More precisely, the power spectrum of a power-law noise has the following form: \[ P(f)=P_0\left(f / f_s\right)^\kappa \] where \(f\) is the frequency, \(P_0\) is a constant, \(f_s\) the sampling frequency and the exponent \(\kappa\) is called the spectral index.
Many stochastic noise can be expressed as such, for example, a spectral index \(\kappa=0\) produces a white noise, a spectral index \(\kappa=-2\) produces a red noise or random walk and a spectral index \(\kappa=-1\) produce a flicker noise, also called pink noise.
Granger (1980) and Hosking (1981) showed that power-law noise with a spectral index between \(-1\) and \(1\) can be obtained by using fractional differencing of Gaussian noise:
\[ (1-B)^{-\kappa / 2} \mathbf{v} \]
where \(B\) is the backward-shift operator \(\left(B v_i=v_{i-1}\right)\) and \(\mathbf{v}\) a vector with independent and identically distributed (IID) Gaussian noise.
Following from Hosking’s definition of the fractional differencing, we obtain
\[ \begin{aligned} (1-B)^{-\kappa / 2} & =\sum_{i=0}^{\infty}\binom{-\kappa / 2}{i}(-B)^i \\ & =1-\frac{\kappa}{2} B-\frac{1}{2} \frac{\kappa}{2}\left(1-\frac{\kappa}{2}\right) B^2+\ldots \\ & =\sum_{i=0}^{\infty} h_i \end{aligned} \] with the coefficient \(h_i\) that can be computed using the following recurrence relation (Kasdin and Walter 1992):
\[ \begin{aligned} h_0 & =1 \\ h_i & =\left(i-\frac{\kappa}{2}-1\right) \frac{h_{i-1}}{i} \quad \text { for } i>0 \end{aligned} \]
Assuming an infinite sequence of zero-mean white noise \(\mathbf{v}\), with variance \(\sigma_{P L}^2\), and a spectral index \(kappa > -1\) then the autocovariance \(\gamma(\tau)=\operatorname{Cov}\left(X_t, X_{t+\tau}\right)=\mathbb{E}\left[\left(X_t-\mu\right)\left(X_{t+\tau}-\mu\right)\right]\) is (Bos et al. 2008):
\[
\begin{aligned}
& \gamma(0)=\sigma_{p l}^2
\frac{\Gamma(1-\alpha)}{\left(\Gamma\left(1-\frac{\alpha}{2}\right)\right)^2}
\\
&
\gamma(\tau)=\frac{\frac{\alpha}{2}+\tau-1}{-\frac{\alpha}{\alpha}+\tau}
\gamma(\tau-1) \text { for } \tau>0
\end{aligned}
\] When the argument stochastic_model
is set to
"wn + pl"
, the stochastic model considered includes both
white noise and power-law with the specified above stationary
autocovariance structure. The parameters estimated are: \(\sigma^2_{W N}\), \(\kappa\) (constrained to be greater than
\(-1\)) and \(\sigma^2_{P L}\).
When the argument stochastic_model
is set to
"wn + fl"
, the stochastic model considered includes both
white noise and flicker noise (not stationary power-law noise with a
spectral index \(\kappa=-1\)) where the
variance covariance of the flicker noise \(\omega\) is obtained as follows (see e.g.,
(Bos et al. 2008)):
\[ \operatorname{Cov}(\omega) = \sigma^2_{F L}\mathbf{U}^T \mathbf{U} \]
where \[ \mathbf{U}=\left(\begin{array}{cccc} h_0 & h_1 & \ldots & h_n \\ 0 & h_0 & & h_{n-1} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \ldots & h_0 \end{array}\right) \] with the coefficients \(h_i\) computed considering a spectral index \(\kappa=-1\).
The stochastic parameters estimated are: \(\sigma^2_{W N}\), \(\sigma^2_{F L}\) and \(\kappa\) is fixed to \(-1\).
Let us showcase how to estimate the tectonic velocity in for one specific component (North, East or Vertical) of one station.
Let us first load the gmwmx2
package.
## Summary of Estimated Model
## -------------------------------------------------------------
## Functional parameters
## -------------------------------------------------------------
## Parameter Estimate Std_Deviation 95% CI Lower 95% CI Upper
## -------------------------------------------------------------
## Intercept -0.70924596 0.00880076 -0.72649514 -0.69199677
## Trend 0.00007198 0.00000836 0.00005559 0.00008837
## Sin (Annual) 0.00135377 0.00034295 0.00068160 0.00202593
## Cos (Annual) -0.00062838 0.00036296 -0.00133976 0.00008300
## Sin (Semi-Annual) 0.00017180 0.00023908 -0.00029679 0.00064038
## Cos (Semi-Annual) 0.00030877 0.00024661 -0.00017457 0.00079211
## Jump: MJD 56248 0.01588617 0.00157098 0.01280710 0.01896524
## Jump: MJD 55391 0.00780478 0.00153943 0.00478754 0.01082201
## Jump: MJD 55563 0.00230763 0.00175248 -0.00112717 0.00574242
## Jump: MJD 55603 0.00286880 0.00237840 -0.00179277 0.00753037
## Jump: MJD 55606 -0.00044386 0.00351707 -0.00733719 0.00644947
## Jump: MJD 56011 -0.00008945 0.00166129 -0.00334552 0.00316662
## Jump: MJD 56085 0.00073436 0.00170679 -0.00261088 0.00407960
## Jump: MJD 56748 -0.00038578 0.00140095 -0.00313159 0.00236004
## Jump: MJD 57281 -0.00145742 0.00154361 -0.00448285 0.00156800
## Earthquake: MJD 55391 0.01576163 0.00744415 0.00117137 0.03035190
## Earthquake: MJD 55563 0.00174787 0.02577951 -0.04877903 0.05227478
## Earthquake: MJD 55603 -0.28371260 0.53155129 -1.32553398 0.75810879
## Earthquake: MJD 55606 0.27719703 0.52564189 -0.75304214 1.30743621
## Earthquake: MJD 56011 -0.00480858 0.01328379 -0.03084432 0.02122717
## Earthquake: MJD 56085 -0.00772227 0.01248180 -0.03218615 0.01674162
## Earthquake: MJD 56748 -0.00910404 0.00544289 -0.01977191 0.00156382
## Earthquake: MJD 57281 -0.02236505 0.00647710 -0.03505994 -0.00967016
## -------------------------------------------------------------
## Stochastic parameters
## -------------------------------------------------------------
## White Noise Variance : 0.00000122
## Stationary powerlaw Spectral index: -0.83882877
## Stationary powerlaw Variance: 0.00000337
## -------------------------------------------------------------
## Missingness parameters
## -------------------------------------------------------------
## P(Z_{i+1} = 0 | Z_{i} = 1): 0.00279460
## P(Z_{i+1} = 1 | Z_{i} = 0): 0.31578947
## \hat{E[Z]}: 0.99122807
## -------------------------------------------------------------
## Running time: 1.1 seconds
## -------------------------------------------------------------
By default, the estimated parameters are provided in m/day, we can
optionally scale the estimated functional parameters so that they are
returned in m/year with the argument scale_parameters
.
## Summary of Estimated Model
## -------------------------------------------------------------
## Functional parameters
## -------------------------------------------------------------
## Parameter Estimate Std_Deviation 95% CI Lower 95% CI Upper
## -------------------------------------------------------------
## Intercept -259.05208570 3.21447935 -265.35234945 -252.75182195
## Trend 0.02629027 0.00305374 0.02030506 0.03227549
## Sin (Annual) 0.49446304 0.12526192 0.24895419 0.73997189
## Cos (Annual) -0.22951487 0.13256987 -0.48934705 0.03031731
## Sin (Semi-Annual) 0.06274864 0.08732351 -0.10840228 0.23389957
## Cos (Semi-Annual) 0.11277837 0.09007298 -0.06376143 0.28931817
## Jump: MJD 56248 5.80242306 0.57380175 4.67779229 6.92705382
## Jump: MJD 55391 2.85069445 0.56227851 1.74864881 3.95274009
## Jump: MJD 55563 0.84286159 0.64009281 -0.41169727 2.09742045
## Jump: MJD 55603 1.04782942 0.86870919 -0.65480931 2.75046815
## Jump: MJD 55606 -0.16211875 1.28460983 -2.67990775 2.35567026
## Jump: MJD 56011 -0.03267118 0.60678682 -1.22195150 1.15660914
## Jump: MJD 56085 0.26822620 0.62340406 -0.95362331 1.49007571
## Jump: MJD 56748 -0.14090556 0.51169791 -1.14381504 0.86200392
## Jump: MJD 57281 -0.53232443 0.56380463 -1.63736120 0.57271233
## Earthquake: MJD 55391 5.75693615 2.71897531 0.42784247 11.08602982
## Earthquake: MJD 55563 0.63841028 9.41596471 -17.81654143 19.09336199
## Earthquake: MJD 55603 -103.62602673 194.14910865 -484.15128733 276.89923386
## Earthquake: MJD 55606 101.24621682 191.99070042 -275.04864137 477.54107500
## Earthquake: MJD 56011 -1.75633307 4.85190369 -11.26588956 7.75322342
## Earthquake: MJD 56085 -2.82055781 4.55897918 -11.75599281 6.11487720
## Earthquake: MJD 56748 -3.32525134 1.98801468 -7.22168852 0.57118584
## Earthquake: MJD 57281 -8.16883558 2.36576219 -12.80564427 -3.53202690
## -------------------------------------------------------------
## Stochastic parameters
## -------------------------------------------------------------
## White Noise Variance : 0.00000122
## Stationary powerlaw Spectral index: -0.83882877
## Stationary powerlaw Variance: 0.00000337
## -------------------------------------------------------------
## Missingness parameters
## -------------------------------------------------------------
## P(Z_{i+1} = 0 | Z_{i} = 1): 0.00279460
## P(Z_{i+1} = 1 | Z_{i} = 0): 0.31578947
## \hat{E[Z]}: 0.99122807
## -------------------------------------------------------------
## Running time: 1.1 seconds
## -------------------------------------------------------------
## Summary of Estimated Model
## -------------------------------------------------------------
## Functional parameters
## -------------------------------------------------------------
## Parameter Estimate Std_Deviation 95% CI Lower 95% CI Upper
## -------------------------------------------------------------
## Intercept -0.70924596 0.01136945 -0.73152967 -0.68696225
## Trend 0.00007198 0.00001083 0.00005074 0.00009321
## Sin (Annual) 0.00135377 0.00043414 0.00050287 0.00220467
## Cos (Annual) -0.00062838 0.00045622 -0.00152254 0.00026579
## Sin (Semi-Annual) 0.00017180 0.00027821 -0.00037348 0.00071707
## Cos (Semi-Annual) 0.00030877 0.00028861 -0.00025689 0.00087443
## Jump: MJD 56248 0.01588617 0.00191916 0.01212469 0.01964764
## Jump: MJD 55391 0.00780478 0.00182956 0.00421891 0.01139065
## Jump: MJD 55563 0.00230763 0.00190189 -0.00142001 0.00603527
## Jump: MJD 55603 0.00286880 0.00235741 -0.00175163 0.00748923
## Jump: MJD 55606 -0.00044386 0.00358112 -0.00746272 0.00657500
## Jump: MJD 56011 -0.00008945 0.00192695 -0.00386620 0.00368730
## Jump: MJD 56085 0.00073436 0.00199984 -0.00318524 0.00465397
## Jump: MJD 56748 -0.00038578 0.00187908 -0.00406870 0.00329714
## Jump: MJD 57281 -0.00145742 0.00187987 -0.00514189 0.00222704
## Earthquake: MJD 55391 0.01576163 0.00930876 -0.00248320 0.03400647
## Earthquake: MJD 55563 0.00174787 0.02766816 -0.05248072 0.05597647
## Earthquake: MJD 55603 -0.28371260 0.52755951 -1.31771024 0.75028504
## Earthquake: MJD 55606 0.27719703 0.52174625 -0.74540682 1.29980089
## Earthquake: MJD 56011 -0.00480858 0.01509158 -0.03438754 0.02477039
## Earthquake: MJD 56085 -0.00772227 0.01454627 -0.03623244 0.02078791
## Earthquake: MJD 56748 -0.00910404 0.00725836 -0.02333016 0.00512207
## Earthquake: MJD 57281 -0.02236505 0.00811536 -0.03827086 -0.00645925
## -------------------------------------------------------------
## Stochastic parameters
## -------------------------------------------------------------
## White Noise Variance : 0.00000193
## Flicker Noise Variance: 0.00000251
## -------------------------------------------------------------
## Missingness parameters
## -------------------------------------------------------------
## P(Z_{i+1} = 0 | Z_{i} = 1): 0.00279460
## P(Z_{i+1} = 1 | Z_{i} = 0): 0.31578947
## \hat{E[Z]}: 0.99122807
## -------------------------------------------------------------
## Running time: 2.8 seconds
## -------------------------------------------------------------
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.