| Version: | 0.0-8 |
| Date: | 2025-08-21 |
| Title: | Additions and Extensions of the 'VGAM' Package |
| Author: | Victor Miranda [aut, cre, cph], Thomas Yee [ctb, ths, cph] |
| Maintainer: | Victor Miranda <victor.miranda@aut.ac.nz> |
| Depends: | R (≥ 4.2.0), methods, stats, stats4, VGAM(≥ 1.1-1) |
| Suggests: | VGAMdata |
| Description: | Extending the functionalities of the 'VGAM' package with additional functions and datasets. At present, 'VGAMextra' comprises new family functions (ffs) to estimate several time series models by maximum likelihood using Fisher scoring, unlike popular packages in CRAN relying on optim(), including ARMA-GARCH-like models, the Order-(p, d, q) ARIMAX model (non- seasonal), the Order-(p) VAR model, error correction models for cointegrated time series, and ARMA-structures with Student-t errors. For independent data, new ffs to estimate the inverse- Weibull, the inverse-gamma, the generalized beta of the second kind and the general multivariate normal distributions are available. In addition, 'VGAMextra' incorporates new VGLM-links for the mean-function, and the quantile-function (as an alternative to ordinary quantile modelling) of several 1-parameter distributions, that are compatible with the class of VGLM/VGAM family functions. Currently, only fixed-effects models are implemented. All functions are subject to change; see the NEWS for further details on the latest changes. |
| License: | GPL-2 |
| NeedsCompilation: | yes |
| BuildVignettes: | yes |
| Packaged: | 2025-08-21 03:18:06 UTC; vico |
| Repository: | CRAN |
| Date/Publication: | 2025-08-21 05:30:02 UTC |
Additions and extensions of the VGAM package.
Description
VGAMextra supplies additional functions and methods to the package VGAM, under three main topics:
** Time series modelling. A novel class of VGLMs to model univariate time series, called vector generalized linear time series models (VGLTSMs). It is characterized by incorporating past information in the VGLM/VGAM loglikelihood.
** 1–parameter distribution mean modelling. We return full circle by developing new link functions for the mean of 1–parameter distributions. VGAMs, VGLMs and GAMLSSs are restricted to location, scale and shape, however, the VGLM/VGAM framework has infrastructure to accommodate new links as a function of the parameters.
** Quantile modelling of 1–parameter distributions. Similarly, we have implemented link functions to model the quantiles of several 1–parameter distributions, for quantile regression.
Details
The inference infrastructure of VGAMextra relies on the VGLM/VGAM framework. Particularly, estimation is carried out via IRLS using Fisher scoring, whilst additive models and reduced rank regression are also accommodated by all VGAMextra family functions.
At present, this package
allows the extent of VGLMs/VGAMs to operate
popular time series models as special cases, as well as
cointegrated time series (bivariate case), and
modelling choices for volatility models incorporating
explanatories in the variance equation.
The central family functions in this respect are
ARXff,
MAXff,
ARMAX.GARCHff, and
VGLM.INGARCHff.
Regarding modelling the mean/quantile-functions, VGAMextra
affords links for several 1-parameter distributions,
e.g., expMlink,
benini1Qlink, or
inv.chisqMlink.
Collectively, the quantile-links represent an alternative
to quantile regression by directly modelling the quantile
function for distributions
beyond the exponential family (See Example 3 below).
The VGLM/VGAM framework is very large and encompasses a wide range of
multivariate response types and models, e.g., it includes
univariate and multivariate distributions,
categorical data analysis, and extreme values.
See VGAM-package for a broad
description and further information on VGAM.
Future work
* Implement VGLM time series family functions to handle error correction models (ECMs) for cointegrated time series. Upgrade this framework beyond the bivariate case, e.g., the the Vector ECM (VECMs).
* Upgrade VGLMs time series family functions to handle multivariate time series, e.g., the VAR model (Coming shortly).
* Incorporate VGLM/VGAM-links to model the mean and quantile functions of distributions with > 1 parameters.
* Develop the class of multiple reduced–rank VGLMs towards time series, to handle, e.g., vector error correction models (VECMs), for multiple cointegrated time series.
Warning
VGAMextra is revised, altered, and/or upgraded on a regular basis.
Hence, be aware that any feature, e.g., function names, arguments, or
methods, may be modified without prior notice.
Check the NEWS for the latest changes and additions across
the different versions.
Author(s)
Victor Miranda, victor.miranda@aut.ac.nz.
Maintainer: Victor Miranda, victor.miranda@aut.ac.nz.
Contributor: Thomas Yee, t.yee@auckland.ac.nz.
References
Pfaff, B. (2011) Analysis of Integrated and Cointegrated Time Series with R. Seattle, Washington, USA: Springer.
Chan, N., Li, D., Peng, L. and Zhang, R. (2013) Tail index of an AR(1) model with ARCH(1) errors. Econometric Theory, 29(5):920-940.
Yee, T. W. (2015) Vector Generalized Linear and Additive Models: With an Implementation in R. New York, USA: Springer.
Miranda, V. and Yee, T. W. Vector generalized linear time series models. In preparation.
Miranda, V. and Yee, T. W. On mean modelling of 1-parameter distributions using vector generalized linear models. In preparation
Miranda, V. and Yee, T. W. Two–Parameter Link Functions with Applications to Negative Binomail, Weibull, and Quantile Regression. To be submitted to the Scandinavian Journal of Statistics.
Miranda, V. and Yee, T.W. New Link Functions for Distribution-Specific Quantile Regression Based on Vector Generalized Linear and Additive Models. Journal of Probability and Statistics, Volume 2019, Article ID 3493628.
Yee, T. W. (2008)
The VGAM Package.
R News, 8, 28–39.
See Also
vglm,
vgam,
rrvglm,
Links,
CommonVGAMffArguments,
https://www.stat.auckland.ac.nz/~vmir178/.
Examples
##### EXAMPLE 1. An AR(1) model with ARCH(1) errors.
# Chan et.al. (2013) proposed a long and technical methodology to
# estimate the tail index of an AR(1) with ARCH(1) errors involving
# its estimation by QMLE. I fit this model straightforwardly by MLE
# using the family function ARXff() for time series, and constraining
# the effect of Y^2_{t - 1} to the conditional variance using
# constraint matrices.
# Generate some data
set.seed(1)
nn <- ceiling(runif(1, 150, 160))
my.rho <- rhobitlink(-1.0, inverse = TRUE) # -0.46212
my.mu <- 0.0
my.omega <- 1
my.b <- 0.5
tsdata <- data.frame(x2 = sort(runif(n = nn)))
tsdata <- transform(tsdata, index = 1:nn, TS1 = runif(nn))
for (ii in 2:nn)
tsdata$TS1[ii] <- my.mu + my.rho * tsdata$TS1[ii-1] +
sqrt(my.omega + my.b * (tsdata$TS1[ii-1])^2) * rnorm(1)
# Remove the burn-in data:
nnr <- ceiling(nn/5)
tsdata <- tsdata[-(1:nnr), ]
tsdata["index"] <- 1:(nn - nnr)
# The timeplot.
with(tsdata, plot(ts(TS1), lty = "solid", col = "blue", xlab ="Time", ylab = "Series"))
abline(h = mean(tsdata[, "TS1"]), lty = "dotted")
# The constraint matrices, to inhibit the effect of Y^2_{t - 1}
# over sigma^2 only.
const.mat <- list('(Intercept)' = diag(3), 'TS1l1sq' = cbind(c(0, 1, 0)))
# Set up the data using function WN.lags() from VGAMextra to generate
# our 'explanatory'
tsdata <- transform(tsdata, TS1l1sq = WN.lags(y = cbind(tsdata[, "TS1"])^2, lags = 1))
# Fitting the model
fit.Chan.etal <- vglm(TS1 ~ TS1l1sq, ARXff(order = 1, # AR order
zero = NULL, noChecks = FALSE,
var.arg = TRUE, lvar = "identitylink"),
crit = "loglikelihood", trace = TRUE,
constraints = const.mat, data = tsdata) ## Constraints...
summary(fit.Chan.etal, lrt0 = TRUE, score0 = TRUE, wald0 = TRUE)
constraints(fit.Chan.etal)
###### EXAMPLE 2. VGLMs handling cointegrated (bivariate) time series.
# In this example, vglm() accommodates an error correction model
# of order (2, 2) to fit two (non-stationary) cointegrated time series.
# Simulating some data.
set.seed(2017081901)
nn <- 280
rho <- 0.75
s2u <- exp(log(1.5)) # Gaussian noise1
s2w <- exp(0) # Gaussian noise2
my.errors <- rbinorm(nn, mean1 = 0, mean2 = 0, var1 = s2u, var2 = s2w, cov12 = rho)
ut <- my.errors[, 1]
wt <- my.errors[, 2]
yt <- xt <- numeric(0)
xt[1] <- ut[1] # Initial value: error.u[0]
yt[1] <- wt[1] # Initial value: error.w[0]
beta <- c(0.0, 2.5, -0.32) # Coefficients true values.
for (ii in 2:nn) {
xt[ii] <- xt[ii - 1] + ut[ii]
yt[ii] <- beta[1] + beta[2] * xt[ii] + beta[3] * yt[ii - 1] + wt[ii]
}
# Regression of yt on xt, save residuals. Compute Order--1 differences.
errors.coint <- residuals(lm(yt ~ xt)) # Residuals from the static regression yt ~ xt
difx1 <- diff(ts(xt), lag = 1, differences = 1) # First difference for xt
dify1 <- diff(ts(yt), lag = 1, differences = 1) # First difference for yt
# Set up the dataset (coint.data), including Order-2 lagged differences.
coint.data <- data.frame(embed(difx1, 3), embed(dify1, 3))
colnames(coint.data) <- c("difx1", "difxLag1", "difxLag2",
"dify1", "difyLag1", "difyLag2")
# Remove unutilized lagged errors accordingly. Here, use from t = 3.
errors.cointLag1 <- errors.coint[-c(1:2, nn)]
coint.data <- transform(coint.data, errors.cointLag1 = errors.cointLag1)
# Plotting the data
plot(ts(yt[-c(1:3, NULL)]), lty = 4, type = "l", col = "red",
main = "", xlab = "Time", las = 1, ylim = c(-32, 20),
ylab = expression("Series"~x[t]~"and"~y[t]))
lines(ts(xt[-c(1:3, NULL)]), lty = 4, type = "l", col = "blue")
legend("bottomleft", c(expression("Series"~x[t]),
expression("Series"~y[t])),
col = c("red", "blue"), lty = c(4, 4))
# Fitting an error correction model (2, 2), aka ECM(2, 2)
fit.coint <- vglm(cbind(dify1, difx1) ~ errors.cointLag1 + difxLag1 + difyLag1 +
difxLag2 + difyLag2,
binormal(zero = c("sd", "rho")), # 'sigma', 'rho' are intercept--only.
trace = FALSE, data = coint.data)
summary(fit.coint)
coef(fit.coint, matrix = TRUE)
##### EXAMPLE 3. Quantile Modelling (QM).
# Here, the quantile function of the Maxwell distribution is modelled
# for percentiles 25%, 50% and 75%. The resulting quantile-curves
# are plotted. The rate parameter is determined by an artificial covariate.
set.seed(123)
# An artificial covariate.
maxdata <- data.frame(x2 = sort(runif(n <- nn <- 120)))
# The 'rate' function.
mymu <- function(x) exp(2 - 6 * sin(2 * x - 0.2) / (x + 0.5)^2)
# Set up the data.
maxdata <- transform(maxdata, y = rmaxwell(n, rate = mymu(x2)))
# 25%, 50% and 75% quantiles are to be modelled.
mytau <- c(0.25, 0.50, 0.75)
mydof <- 4
### Using B-splines with 'mydof'-degrees of freedom on the predictors
fit.QM <- vglm(Q.reg(y, pvector = mytau) ~ bs(x2, df = mydof),
family = maxwell(link = "maxwellQlink(p = mytau)",
zero = NULL),
data = maxdata, trace = TRUE)
summary(fit.QM, lscore0 = TRUE)
head(predictors(fit.QM)) # The 'fitted values'
## The 25%, 50%, and 75% quantile curves.
mylwd <- 1.5
with(maxdata, plot(x2, y, col = "orange",
main = "Example 1; Quantile Modelling",
ylab = "y", pch = "o", cex = 0.75))
with(maxdata, matlines(x2, predict(fit.QM)[, 1], col = "blue",
lty = "dotted", lwd = mylwd))
with(maxdata, matlines(x2, predict(fit.QM)[, 2], col = "chocolate",
lty = "dotted", lwd = mylwd))
with(maxdata, matlines(x2, predict(fit.QM)[, 3], col = "brown",
lty = "dotted", lwd = mylwd))
legend("topleft", c("percentile25", "percentile50", "percentile75"),
lty = rep("dotted", 3), lwd = rep(mylwd, 3))
### Double check: The data (in percentage) below the 25%, 50%, and 75% curves
round(length(predict(fit.QM)[, 1][(maxdata$y
<= predict(fit.QM)[, 1] )]) /nn, 5) * 100 ## Should be 25% approx
round(length(predict(fit.QM)[, 2][(maxdata$y
<= predict(fit.QM)[, 2] )]) /nn, 5) * 100 ## Should be 50% approx
round(length(predict(fit.QM)[, 3][(maxdata$y
<= predict(fit.QM)[, 3] )]) /nn, 5) * 100 ## Should be 75% approx
VGLTSMs Family Functions:
Generalized integrated regression
with order–(p, q) ARMA errors
Description
A VLTSMff for dynamic regression.
Estimates regression models with order–(p, d, q) ARIMA
errors by maximum likelihood.
Usage
ARIMAX.errors.ff(order = c(1, 1, 1),
zero = "var", # optionally, "mean".
order.trend = 0,
include.int = TRUE,
diffCovs = TRUE,
xLag = 0,
include.currentX = TRUE,
lvar = "loglink",
lmean = "identitylink")
Arguments
order |
The usual |
zero |
What linear predictor is modelled as intercept–only?
See |
order.trend |
Non–negative integer. Allows to incorporate a polynomial
trend of order |
include.int |
Logical. Should an intercept (int) be included in
the model for |
diffCovs |
Logical. If |
xLag |
Integer. If entered, the covariates, say
|
include.currentX |
Logical. If |
lvar, lmean |
Link functions applied to conditional mean and the variance.
Same as |
Details
The generalized linear regression model with ARIMA errors is another subclass of VGLTSMs (Miranda and Yee, 2018).
For a univariate time series, say y_t,
and a p–dimensional vector of covariates
\boldsymbol{x}_t covariates,
the model described by this VGLTSM family function is
y_t = \boldsymbol{\beta}^T \boldsymbol{x}_t + u_t,
u_t = \theta_1 u_{t - 1} + \cdots + \theta_p u_{t - p} + z_t +
\phi_1 z_{t - 1} + \cdots + \phi_1 z_{t - q}.
The first entry in x_t equals 1, allowing
an intercept, for every $t$. Set
include.int = FALSE to set this to zero,
dimissing the intercept.
Also, if diffCovs = TRUE, then the differences up to order
d of the set
\boldsymbol{x}_t are embedded in the model
for y_t.
If xLag> 0, the lagged values up to order
xLag of the covariates are also included.
The random disturbances z_t are by default
handled as N(0, \sigma^2_z). Then,
denoting \Phi_{t} as the history of the
process (x_{t + 1}, u_t) up to time t, yields
E(y_t | \Phi_{t - 1}) = \boldsymbol{\beta}^T \boldsymbol{x}_t +
\theta_1 u_{t - 1} + \cdots + \theta_p u_{t - p} +
\phi_1 z_{t - 1} + \cdots + \phi_1 z_{t - q}.
Denoting \mu_t = E(y_t | \Phi_{t - 1}),
the default linear predictor for this VGLTSM family function is
\boldsymbol{\eta} = ( \mu_t, \log \sigma^2_{z})^T.
Value
An object of class "vglmff"
(see vglmff-class)
to be used by VGLM/VGAM modelling functions, e.g.,
vglm or vgam.
Note
If d = 0 in order, then ARIMAX.errors.ff
will perform as ARIMAXff.
Author(s)
Victor Miranda
See Also
ARIMAXff,
CommonVGAMffArguments,
uninormal,
vglm.
Examples
### Estimate a regression model with ARMA(1, 1) errors.
## Covariates are included up to lag 1.
set.seed(20171123)
nn <- 250
x2 <- rnorm(nn) # One covariate
sigma2 <- exp(1.15); theta1 <- 0.5; phi1 <- 0.27 # True coefficients
beta0 <- 1.25; beta1 <- 0.25; beta2 <- 0.5
y <- numeric(nn)
u <- numeric(nn)
z <- numeric(nn)
u[1] <- rnorm(1)
z[1] <- rnorm(1, 0, sqrt(sigma2))
for(ii in 2:nn) {
z[ii] <- rnorm(1, 0, sqrt(sigma2))
u[ii] <- theta1 * u[ii - 1] + phi1 * z[ii - 1] + z[ii]
y[ii] <- beta0 + beta1 * x2[ii] + beta2 * x2[ii - 1] + u[ii]
}
# Remove warm-up values.
x2 <- x2[-c(1:100)]
y <- y[-c(1:100)]
plot(ts(y), lty = 2, col = "blue", type = "b")
abline(h = 0, lty = 2)
## Fit the model.
ARIMAX.reg.fit <- vglm(y ~ x2, ARIMAX.errors.ff(order = c(1, 0, 1), xLag = 1),
data = data.frame(y = y, x2 = x2), trace = TRUE)
coef(ARIMAX.reg.fit, matrix = TRUE)
summary(ARIMAX.reg.fit, HD = FALSE)
# Compare to arima()
# arima() can't handle lagged values of 'x2' by default, but these
# may entered at argument 'xreg'.
arima(y, order = c(1, 0, 1), xreg = cbind(x2, c(0, x2[-150])))
VGLTSMs Family functions:
The Order–(p, d, q) Autoregressive Integrated Moving
Average Model (ARIMA(p, d, q)) with covariates
Description
Maximum likelihood estimation of the drift,
standard deviation or variance of the random noise,
and coefficients of an autoregressive integrated
moving average process of order-(p, d, q)
with covariates by MLE using Fisher scoring.
No seasonal terms handled yet. No seasonal components
handled yet.
Usage
ARIMAXff(order = c(1, 1, 0),
zero = c("ARcoeff", "MAcoeff"),
diffCovs = TRUE,
xLag = 0,
include.current = FALSE,
type.EIM = c("exact", "approximate")[1],
var.arg = TRUE,
nodrift = FALSE,
noChecks = FALSE,
ldrift = "identitylink",
lsd = "loglink",
lvar = "loglink",
lARcoeff = "identitylink",
lMAcoeff = "identitylink",
idrift = NULL,
isd = NULL,
ivar = NULL,
iARcoeff = NULL,
iMAcoeff = NULL)
Arguments
order |
Integer vector with three components,
( |
zero |
Integer or character–strings vector.
Name(s) or position(s) of the parameters/linear predictors
to be modeled as intercept-only. Details at
|
diffCovs |
Logical. The default is |
xLag |
Integer, non–negative. If |
include.current |
Logical. Same as
|
type.EIM |
The type of expected information matrix (EIM) of the ARMA process
to be utilized in Fisher scoring.
|
var.arg |
Logical. If |
nodrift |
Logical. |
noChecks |
Logical. If |
ldrift, lsd, lvar, lARcoeff, lMAcoeff |
Link functions applied to the intercept, the random noise standard deviation (or optionally, the variance), and the coefficients in the ARMA–type conditional–mean model. |
idrift, isd, ivar, iARcoeff, iMAcoeff |
Optional initial values for the intercept (drift), noise SD
(or variance), and ARMA coeffcients (a vector of length |
Details
Let \boldsymbol{x}_t be a (probably time–varying) vector of
suitable covariates. The ARIMAX model handled by ARIMAXff is
\nabla^d Y_t = \mu^{\star} + \boldsymbol{\beta}^T
\nabla^d \boldsymbol{x}_t +
\theta_1 \nabla^d Y_{t - 1} + \cdots +
\theta_p \nabla^d Y_{t - p} +
\phi_1 \varepsilon_{t - 1} + \cdots +
\phi_q \varepsilon_{t - q} + \varepsilon,
with \nabla^d (\cdot) the operator differencing a
series d times. If diffCovs = TRUE, this function
differencing the covariates d times too.
Similarly, ARMAXff manages
\nabla^d Y_t = \mu^{\star} + \boldsymbol{\beta}^T \boldsymbol{x}_t +
\theta_1 Y_{t - 1} + \cdots + \theta_p Y_{t - p} +
\phi_1 \varepsilon_{t - 1} + \cdots +
\phi_q \varepsilon_{t - q} + \varepsilon,
where
\varepsilon_{t | \Phi_{t - 1}}
\sim N(0, \sigma_{\varepsilon_t | \Phi_{t - 1}}^2).
Note, \sigma_{\varepsilon | \Phi_{t - 1}}^2 is conditional on
\Phi_{t - 1},
the information
of the joint process
\left(Y_{t - 1}, \boldsymbol{x}_t \right),
at time t, and hence may be
modelled in terms of \boldsymbol{x}_t,
if required.
ARIMAXff() and ARMAXff()
handle multiple responses, thus a
matrix can be used as the response.
Note, no seasonal terms handled. This feature is to be
incorporated shortly.
The default linear predictor is
\boldsymbol{\eta} = \left(
\mu, \log \sigma^{2}_{\varepsilon_{t | \Phi_{t - 1}}},
\theta_1, \ldots, \theta_p,
\phi_1, \ldots, \phi_q
\right)^T.
Other links are also handled. See Links.
Further choices for the random noise, besides Gaussian, will be implemented over time.
As with ARXff
and MAXff,
choices for the EIMs are "exact" and "approximate".
Covariates may be incorporated in the fit for any linear
predictor above. Hence, ARIMAXff supports non–stationary
processes (\sigma_{\varepsilon_t | \Phi_{t - 1}}^2) may depend on
\boldsymbol{X}_t.
Also, constraint matrices
on the linear predictors may be
entered through cm.ARMA or
using the argument constraints, from
vglm.
Checks on stationarity and
invertibility on the esitmated process are performed by default.
Set noChecks = TRUE to dismiss this step.
Value
An object of class "vglmff"
(see vglmff-class)
to be used by VGLM/VGAM modelling functions, e.g.,
vglm or vgam.
Warning
zero can be a numeric or a character–strings
vector specifying the position(s) or the name(s) of
the parameter(s) modeled as intercept–only.
Numeric values can be set as
usual (See CommonVGAMffArguments).
If names are entered, the parameter names in this
family function are:
c("drift.mean", "noiseVar" || "noiseSD", "ARcoeff", "MAcoeff").
Manually modify this if required. For simplicity, the second choice is recommended.
Note
No seasonal components handled yet.
If no covariates, \boldsymbol{x}_t,
are incorporated in the analysis,
then ARIMAXff fits an ordinary ARIMA model.
Ditto with ARMAXff.
If nodrift = TRUE, then the 'drift' is removed from the
vector of parameters and is not estimated.
By default, an ARMA model of order–c(1, 0) with
order–1 differences is fitted. When initial
values are entered (isd, iARcoeff, etc.),
they are recycled
according to the number of responses.
Also, the ARMA coefficients
are intercept–only (note, zero = c("ARcoeff",
"MAcoeff"))
This may altered via zero, or by
constraint matrices (See constraints)
using cm.ARMA.
Checks on stationarity and/or
invertibility can be manually via
checkTS.VGAMextra.
Author(s)
Victor Miranda and T. W. Yee
References
Miranda, V. and Yee, T.W. (2018) Vector Generalized Linear Time Series Models. In preparation.
Porat, B., and Friedlander, B. (1986) Computation of the Exact Information Matrix of Gaussian Time Series with Stationary Random Components. IEEE Transactions on Acoustics, Speech and Signal Processing. ASSp-34(1), 118–130.
See Also
ARXff,
MAXff,
checkTS.VGAMextra,
cm.ARMA,
CommonVGAMffArguments,
constraints,
vglm.
Examples
set.seed(3)
nn <- 90
theta <- c(0.12, 0.17) # AR coefficients
phi <- c(-0.15, 0.20) # MA coefficients.
sdWNN <- exp(1.0) # SDs
mu <- c(1.25, 0.85) # Mean (not drift) of the process.
covX <- runif(nn + 1) # A single covariate.
mux3 <- mu[1] + covX
##
## Simulate ARMA processes. Here, the drift for 'tsd3' depends on covX.
##
tsdata <- data.frame(TS1 = mu[1] + arima.sim(model = list(ar = theta, ma = phi,
order = c(2, 1, 2)), n = nn, sd = sdWNN ),
TS2 = mu[2] + arima.sim(model = list(ar = theta, ma = phi,
order = c(2, 1, 2)), n = nn, sd = exp(2 + covX)),
TS3 = mux3 + arima.sim(model = list(ar = theta, ma = phi,
order = c(2, 1, 2)), n = nn, sd = exp(2 + covX) ),
x2 = covX)
### EXAMPLE 1. Fitting a simple ARIMA(2, 1, 2) using vglm().
# Note that no covariates involved.
## Not run:
fit.ARIMA1 <- vglm(TS1 ~ 1, ARIMAXff(order = c(2, 1, 2), var.arg = FALSE,
# OPTIONAL INITIAL VALUES
# idrift = c(1.5)*(1 - sum(theta)),
# ivar = exp(4), isd = exp(2),
# iARcoeff = c(0.20, -0.3, 0.1),
# iMAcoeff = c(0.25, 0.35, 0.1),
type.EIM = "exact"),
data = tsdata, trace = TRUE, crit = "log")
Coef(fit.ARIMA1)
summary(fit.ARIMA1)
vcov(fit.ARIMA1, untransform = TRUE)
#------------------------------------------------------------------------#
# Fitting same model using arima().
#------------------------------------------------------------------------#
# COMPARE to EXAMPLE1
( fitArima <- arima(tsdata$TS1, order = c(2, 1, 2)) )
### EXAMPLE 2. Here only the ARMA coefficients and drift are intercept-only.
# The random noise variance is not constant.
fit.ARIMA2 <- vglm(TS2 ~ x2, ARIMAXff(order = c(2, 1, 2), var.arg = TRUE,
lARcoeff = "rhobitlink", lMAcoeff = "identitylink",
type.EIM = c("exact", "approximate")[1],
# NOTE THE ZERO ARGUMENT.
zero = c("drift.mean", "ARcoeff", "MAcoeff")),
data = tsdata, trace = TRUE)
coef(fit.ARIMA2, matrix = TRUE)
summary(fit.ARIMA2)
constraints(fit.ARIMA2)
### EXAMPLE 3. Here only ARMA coefficients are intercept-only.
# The random noise variance is not constant.
# Note that the "drift" and the "variance" are "generated" in
# terms of 'x2' above for TS3.
fit.ARIMA3 <- vglm(TS3 ~ x2, ARIMAXff(order = c(1, 1, 2), var.arg = TRUE,
lARcoeff = "identitylink", lMAcoeff = "identitylink",
type.EIM = c("exact", "approximate")[1], nodrift = FALSE,
zero = c( "ARcoeff", "MAcoeff")), # NOTE THE ZERO ARGUMENT.
data = tsdata, trace = TRUE)
coef(fit.ARIMA3, matrix = TRUE)
summary(fit.ARIMA3)
constraints(fit.ARIMA3)
## End(Not run)
VGLTSMs Family Functions: Generalized autoregressive moving average model with Student-t errors
Description
For an ARMA model, estimates a 3–parameter
Student-t distribution characterizing the errors
plus the ARMA coefficients by MLE usign Fisher
scoring. Central Student–t handled currently.
Usage
ARMA.studentt.ff(order = c(1, 0),
zero = c("scale", "df"),
cov.Reg = FALSE,
llocation = "identitylink",
lscale = "loglink",
ldf = "logloglink",
ilocation = NULL,
iscale = NULL,
idf = NULL)
Arguments
order |
Two–entries vector, non–negative. The order $u$ and $v$ of the ARMA model. |
zero |
Same as |
cov.Reg |
Logical. If covariates are entered, Should these be
included in the ARMA model as a |
llocation, lscale, ldf, ilocation, iscale, idf |
Same as |
Details
The normality assumption for time series analysis is relaxed to handle
heavy–tailed data, giving place to the ARMA model with shift-scaled
Student-t errors, another subclass of VGLTSMs.
For a univariate time series, say y_t,
the model described by this VGLTSM family function is
\theta(B)y_t = \phi(B) \varepsilon_t,
where \varepsilon_t are distributed as
a shift-scaled Student–t
with \nu degrees of freedom, i.e.,
\varepsilon_t \sim t(\nu_t, \mu_t, \sigma_t).
This family functions estimates the location (\mu_t),
scale (\sigma_t) and degrees of freedom (\nu_t)
parameters, plus the ARMA coefficients by MLE.
Currently only centered Student–t distributions are handled. Hence, the non–centrality parameter is set to zero.
The linear/additive predictors are
\boldsymbol{\eta} = (\mu, \log \sigma, \log \log \nu)^T,
where \log \sigma and \nu are intercept–only
by default.
Value
An object of class "vglmff"
(see vglmff-class)
to be used by VGLM/VGAM modelling functions, e.g.,
vglm or vgam.
Note
If order = 0, then AR.studentt.ff
fits a usual 3–parameter Student–t, as with
studentt3.
If covariates are incorporated in the analysis,
these are embedded in the location–parameter model.
Modify this through zero.
See CommonVGAMffArguments
for details on zero.
Author(s)
Victor Miranda
See Also
Examples
### Estimate the parameters of the errors distribution for an
## AR(1) model. Sample size = 50
set.seed(20180218)
nn <- 250
y <- numeric(nn)
ncp <- 0 # Non--centrality parameter
nu <- 3.5 # Degrees of freedom.
theta <- 0.45 # AR coefficient
res <- numeric(250) # Vector of residuals.
y[1] <- rt(1, df = nu, ncp = ncp)
for (ii in 2:nn) {
res[ii] <- rt(1, df = nu, ncp = ncp)
y[ii] <- theta * y[ii - 1] + res[ii]
}
# Remove warm up values.
y <- y[-c(1:200)]
res <- res[-c(1:200)]
### Fitting an ARMA(1, 0) with Student-t errors.
AR.stut.er.fit <- vglm(y ~ 1, ARMA.studentt.ff(order = c(1, 0)),
data = data.frame(y = y), trace = TRUE)
summary(AR.stut.er.fit)
Coef(AR.stut.er.fit)
plot(ts(y), col = "red", lty = 1, ylim = c(-6, 6), main = "Plot of series Y with Student-t errors")
lines(ts(fitted.values(AR.stut.er.fit)), col = "blue", lty = 2)
abline( h = 0, lty = 2)
VGLTSMs family functions: Order–p Autoregressive Model with covariates
Description
Maximum likelihood estimation of the order–p autoregressive model (AR(p)) with covariates. Estimates the drift, standard deviation (or variance) of the random noise (not necessarily constant), and coefficients of the conditional–mean model.
Usage
ARXff(order = 1,
zero = c(if (nodrift) NULL else "ARdrift", "ARcoeff"),
xLag = 0,
type.EIM = c("exact", "approximate")[1],
var.arg = TRUE,
nodrift = FALSE,
noChecks = FALSE,
ldrift = "identitylink",
lsd = "loglink",
lvar = "loglink",
lARcoeff = "identitylink",
idrift = NULL,
isd = NULL,
ivar = NULL,
iARcoeff = NULL)
Arguments
order |
The order (i.e., 'p') of the AR model, which is recycled if needed.
See below for further details.
By default, an autoregressive model of order- |
zero |
Integer or character–strings vector.
Name(s) or position(s) of the parameters/linear predictors
to be modeled as intercept-only. Details at
|
xLag |
Same as |
type.EIM, var.arg, nodrift, noChecks |
Same as |
ldrift, lsd, lvar, lARcoeff |
Link functions applied to the drift,
the standar deviation (or variance) of the noise, and the
AR coefficients.
Same as Further details on
|
idrift, isd, ivar, iARcoeff |
Same as |
Details
This family function describes an autoregressive model of order-p
with covariates (ARX(p)). It is a special case of the subclass VGLM–ARIMA
(Miranda and Yee, 2018):
Y_t | \Phi_{t - 1} = \mu_t + \theta_{1} Y_{t - 1} + \ldots +
\theta_p Y_{t - p} + \varepsilon_t,
where \boldsymbol{x}_t
a (possibly time–varying) covariate vector and
\mu_t = \mu^{\star} + \boldsymbol{\beta}^T \boldsymbol{x}_t is a (time–dependent) scaled–mean, known as drift.
At this stage, conditional Gaussian white noise,
\varepsilon_t| \Phi_{t - 1}
is handled, in the form
\varepsilon_t | \Phi_{t - 1} \sim
N(0, \sigma^2_{\varepsilon_t | \Phi_{t - 1}}).
The distributional assumptions on the observations are then
Y_t | \Phi_{t - 1} \sim
N(\mu_{t | \Phi_{t - 1}},
\sigma^2_{\varepsilon_t | \Phi_{t - 1}}),
involving the conditional mean equation for the ARX(p) model:
\mu_{t | \Phi_{t - 1}} = \mu_t +
\boldsymbol{\beta}^T * \boldsymbol{x}_t
\theta_{1} Y_{t - 1} + \ldots +
\theta_p Y_{t - p}.
\Phi_{t} denotes the information
of the joint process
\left(Y_{t}, \boldsymbol{x}_{t + 1}^T \right),
at time t.
The loglikelihood is computed by dARp,
at each Fisher scoring iteration.
The linear predictor is
\boldsymbol{\eta} = \left(
\mu_t, \log \sigma^{2}_{\varepsilon_{t | \Phi_{t - 1}}},
\theta_1, \ldots, \theta_p
\right)^T.
Note, the covariates may also intervene in the conditional
variance model
\log \sigma^{2}_{\varepsilon_{t | \Phi_{t - 1}}}. Hence, this family function
does not restrict the noise to be strictly white noise
(in the sense of constant variance).
The unconditional mean,
E(Y_{t}) = \mu, satisfies
\mu \rightarrow \frac{\mu^{\star}}{1 - (\theta_1 +
\ldots + \theta_p)}
when the process is stationary, and no covariates are involved.
This family function currently handles
multiple responses so that a matrix can be used as the response.
Also, for further details on VGLM/VGAM–link functions refer to
Links.
Further choices for the random noise, besides Gaussian, will be implemented over time.
Value
An object of class "vglmff"
(see vglmff-class). The
object is used by VGLM/VGAM modelling functions, such as
vglm or vgam.
Note
zero can be either an integer vector
or a vector of character strings
specifying either the position(s) or name(s) (partially or not) of the
parameter(s) modeled as intercept-only. Numeric values can be set as usual
(See CommonVGAMffArguments).
Character strings can be entered as per
parameter names in this family function, given by:
c("drift", "noiseVar" or "noiseSD", "ARcoeff").
Users can modify the zero argument according to their needs.
By default, \mu_t and the coefficients
\theta_1, \ldots, \theta_p
are intercept–only. That is,
\log \sigma^{2}_{\varepsilon_{t | \Phi_{t - 1}}} is modelled in terms of any explanatories
entered in the formula.
Users, however, can modify this
according to their needs via
zero. For example, set the covariates in the
drift model, \mu_t.
In addition, specific constraints
for parameters are handled through the function
cm.ARMA.
If var.arg = TRUE, this family function estimates
\sigma_{\varepsilon_t | \Phi_{t - 1}}^2.
Else, the
\sigma_{\varepsilon_t | \Phi_{t - 1}}
estimate is returned.
For this family function the order is recycled. That is,
order will be replicated up to the number of responses
given in the vglm call is matched.
Warning
Values of the estimates may not correspond
to stationary ARs, leading to low accuracy in the MLE estimates,
e.g., values very close to 1.0.
Stationarity is then examined, via
checkTS.VGAMextra,
if
noChecks = FALSE (default)
and no constraint matrices are set
(See constraints for further
details on this).
If the estimated model very close to be non-stationary, then
a warning will be outlined.
Set noChecks = TRUE to completely ignore this.
NOTE: Full details on these 'checks' are shown within the
summary() output.
Author(s)
Victor Miranda and T. W. Yee
References
Madsen, H. (2008) Time Series Analysis. Florida, USA: Chapman & Hall(Sections 5.3 and 5.5).
Porat, B., and Friedlander, B. (1986) Computation of the Exact Information Matrix of Gaussian Time Series with Stationary Random Components. IEEE Transactions on Acoustics, Speech and Signal Processing. ASSp-34(1), 118–130.
See Also
ARIMAXff,
ARMAXff,
MAXff,
checkTS.VGAMextra,
CommonVGAMffArguments,
Links,
vglm,
Examples
set.seed(1)
nn <- 150
tsdata <- data.frame(x2 = runif(nn)) # A single covariate.
theta1 <- 0.45; theta2 <- 0.31; theta3 <- 0.10 # Coefficients
drift <- c(1.3, -1.1) # Two responses.
sdAR <- c(sqrt(4.5), sqrt(6.0)) # Two responses.
# Generate AR sequences of order 2 and 3, under Gaussian noise.
# Note, the drift for 'TS2' depends on x2 !
tsdata <- data.frame(tsdata, TS1 = arima.sim(nn,
model = list(ar = c(theta1, theta1^2)), rand.gen = rnorm,
mean = drift[1], sd = sdAR[1]),
TS2 = arima.sim(nn,
model = list(ar = c(theta1, theta2, theta3)), rand.gen = rnorm,
mean = drift[2] + tsdata$x2 , sd = sdAR[2]))
# EXAMPLE 1. A simple AR(2), maximizing the exact log-likelihood
# Note that parameter constraints are involved for TS1, but not
# considered in this fit. "rhobitlink" is used as link for AR coeffs.
fit.Ex1 <- vglm(TS1 ~ 1, ARXff(order = 2, type.EIM = "exact",
#iARcoeff = c(0.3, 0.3, 0.3), # OPTIONAL INITIAL VALUES
# idrift = 1, ivar = 1.5, isd = sqrt(1.5),
lARcoeff = "rhobitlink"),
data = tsdata, trace = TRUE, crit = "loglikelihood")
Coef(fit.Ex1)
summary(fit.Ex1)
vcov(fit.Ex1, untransform = TRUE) # Conformable with this fit.
AIC(fit.Ex1)
#------------------------------------------------------------------------#
# Fitting same model using arima().
#------------------------------------------------------------------------#
(fitArima <- arima(tsdata$TS1, order = c(2, 0, 0)))
# Compare with 'fit.AR'. True are theta1 = 0.45; theta1^2 = 0.2025
Coef(fit.Ex1)[c(3, 4, 2)] # Coefficients estimated in 'fit.AR'
# EXAMPLE 2. An AR(3) over TS2, with one covariate affecting the drift only.
# This analysis makes sense as the TS2's drift is a function ox 'x2',
# i.e., 'x2' affects the 'drift' parameter only. The noise variance
# (var.arg = TRUE) is estimated, as intercept-only. See the 'zero' argument.
#------------------------------------------------------------------------#
# This model CANNOT be fitted using arima()
#------------------------------------------------------------------------#
fit.Ex2 <- vglm(TS2 ~ x2, ARXff(order = 3, zero = c("noiseVar", "ARcoeff"),
var.arg = TRUE),
## constraints = cm.ARMA(Model = ~ 1, lags.cm = 3, Resp = 1),
data = tsdata, trace = TRUE, crit = "log")
# True are theta1 <- 0.45; theta2 <- 0.31; theta3 <- 0.10
coef(fit.Ex2, matrix = TRUE)
summary(fit.Ex2)
vcov(fit.Ex2)
BIC(fit.Ex2)
constraints(fit.Ex2)
# EXAMPLE 3. Fitting an ARX(3) on two responses TS1, TS2; intercept-only model with
# constraints over the drifts. Here,
# a) No checks on invertibility performed given the use of cm.ARMA().
# b) Only the drifts are modeled in terms of 'x2'. Then, 'zero' is
# set correspondingly.
#------------------------------------------------------------------------#
# arima() does not handle this model.
#------------------------------------------------------------------------#
fit.Ex3 <- vglm(cbind(TS1, TS2) ~ x2, ARXff(order = c(3, 3),
zero = c("noiseVar", "ARcoeff"), var.arg = TRUE),
constraints = cm.ARMA(Model = ~ 1 + x2, lags.cm = c(3, 3), Resp = 2),
trace = TRUE, data = tsdata, crit = "log")
coef(fit.Ex3, matrix = TRUE)
summary(fit.Ex3)
vcov(fit.Ex3)
constraints(fit.Ex3)
VGLTSM family function for the
Two–dimensional Error–Correction Model
(Engle and Granger, 1987) for I(1)–variables
Description
Estimates a bidimensional error-correction model of
order–(K, L), as proposed by Engle–Granger
(Two step–approach; 1987), with bivariate normal errors
by maximum likelihood estimation using Fisher scoring.
Usage
ECM.EngleGran(ecm.order = c(1, 1),
zero = c("var", "cov"),
resids.pattern = c("intercept", "trend",
"neither", "both")[1],
lag.res = 1,
lmean = "identitylink",
lvar = "loglink",
lcov = "identitylink",
ordtsDyn = 0)
Arguments
ecm.order |
Length–2 (positive) integer vector. The order of the ECM model. |
zero |
Integer or character–string vector.
Details at |
resids.pattern |
Character. How the static linear regression
|
lag.res |
Numeric, single positive integer. The error
term for the long–run equilibrium path is
lagged up to order |
lmean, lvar, lcov |
Same as |
ordtsDyn |
Positive integer. Allows to compare the estimated coefficients with those provided by the package 'tsDyn'. See below for further details. |
Details
This is an implementation of the two–step approach as
proposed by Engle–Granger [1987] to estimate an
order–(K, L) bidimensional error correction
model (ECM) with bivariate normal errors.
This ECM class models the dynamic behaviour of two
cointegrated I(1)-variables,
say y_{1, t} and y_{2, t} with,
probably, y_{2, t} a function of
y_{1, t}.
Note, the response must be a two–column matrix,
where the first entry is the regressor,
i.e, y_{1, t} above, and the regressand in the
second colum. See Example 2 below.
The general specification of the ECM class described by this family function is
\Delta y_{1, t} |\Phi_{t - 1} =
~\phi_{0, 1} + \gamma_1 \widehat{z}_{t - k} +
\sum_{i = 1}^K \phi_{1, i} \Delta y_{2, t - i} +
\sum_{j = 1}^L \phi_{2, j} \Delta y_{1, t - j} +
\varepsilon_{1, t},
\Delta y_{2, t} |\Phi_{t - 1}=
~\psi_{0, 1} + \gamma_2 \widehat{z}_{t - k} +
\sum_{i = 1}^K \psi_{1, i} \Delta y_{1, t - i} +
\sum_{j = 1}^L \psi_{2, j} \Delta y_{2, t - j} +
\varepsilon_{2, t}.
Under the binormality assumption on the errors
(\varepsilon_{1, t}, \varepsilon_{2, t})^T with covariance matrix \boldsymbol{\textrm{V}},
model above can be seen as a
VGLM fitting linear models over the conditional means,
\mu_{\Delta y_{1, t} } = E(\Delta y_{1, t} | \Phi_{t - 1}) and
\mu_{\Delta y_{2, t} } = E(\Delta y_{2, t} | \Phi_{t - 1} ),
producing
(\Delta y_{1, t} |\Phi_{t - 1} ,
\Delta y_{2, t} |\Phi_{t - 1} )^T \sim N_{2}(\mu_{\Delta y_{1, t}} ,
\mu_{\Delta y_{2, t}}, \boldsymbol{\textrm{V}})
The covariance matrix is assumed
to have elements
\sigma_1^2, \sigma_2^2, and
\textrm{Cov}_{12}.
Hence, the parameter vector is
\boldsymbol{\theta} =
(\phi_{0, 1}, \gamma_1, \phi_{1, i}, \phi_{2, j},
\psi_{0, 1}, \gamma_2, \psi_{1, i}, \psi_{2, j},
\sigma_1^2, \sigma_2^2, \textrm{Cov}_{12})^T,
for i = 1, \ldots, K and j = 1, \ldots, L.
The linear predictor is
\boldsymbol{\eta} = (\mu_{\Delta y_{1, t} },
\mu_{\Delta y_{2, t} },
{\color{blue}\texttt{loglink}}~\sigma_1^2,
{\color{blue}\texttt{loglink}}~\sigma_2^2,
\textrm{Cov}_{12})^T.
The estimated cointegrated vector,
\boldsymbol{\widehat{\beta^{\star}}} = (1, -\boldsymbol{\widehat{\beta})}^T is obtained by linear regression
depending upon resids.pattern,
as follows:
1) y_{2, t} = \beta_0 + \beta_1 y_{1, t} + z_t,
if resids.pattern = "intercept",
2) y_{2, t} = \beta_1 y_{1, t} + \beta_2 t + z_t,
if resids.pattern = "trend",
3) y_{2, t} = \beta_1 y_{1, t} + z_t,
if resids.pattern = "neither", or else,
4) y_{2, t} = \beta_0 + \beta_1 y_{1, t} + \beta_2 t + z_t,
if resids.pattern = "both",
where \boldsymbol{\widehat{\beta}} =
(\widehat{\beta_0}, \widehat{\beta_1}, \widehat{\beta_2})^T, and z_t assigns the error term.
Note, the estimated residuals,
\widehat{z_t} are (internally) computed
from any of the linear models 1) – 4) selected, and then lagged
up to order alg.res,
and embedded as explanatories in models
\Delta y_{1, t} |\Phi_{t - 1} and
\Delta y_{3, t} |\Phi_{t - 1} above.
By default, \widehat{z}_{t - 1}
are considered (as lag.res = 1), although it may be any lag
\widehat{z}_{t - k}, for k > 0.
Change this through argument lag.res.
Value
An object of class "vglmff"
(see vglmff-class) to be
used by VGLM/VGAM modelling functions, e.g.,
vglm or vgam.
Note
Reduced–Rank VGLMs (RR-VGLMs) can be utilized to aid the increasing
number of parameters as K and L grows.
See rrvglm.
By default, \sigma_1^2, \sigma_2^2 and \textrm{Cov}_{12}
are intercept–only. Set argument zero accordingly to change this.
Package tsDyn also has routines to fit ECMs.
However, the bivariate–ECM handled (similar to that one
above) differs in their parametrization:
tsDyn considers the current estimated residual,
\widehat{z}_t instead of
\widehat{z}_{t - 1} in models
\Delta y_{1, t} |\Phi_{t - 1} and
\Delta y_{2, t} |\Phi_{t - 1}.
See Example 3 below which compares ECMs fitted with VGAMextra and tsDyn.
Author(s)
Victor Miranda
References
Engle, R.F. and Granger C.W.J. (1987) Co-integration and error correction: Representation, estimation and testing. Econometrica, 55(2), 251–276.
Pfaff, B. (2011)
Analysis of Integrated and Cointegrated Time Series with R.
Seattle, Washington, USA: Springer.
See Also
MVNcov,
rrvglm,
CommonVGAMffArguments,
Links,
vglm.
Examples
## Example 1. Comparing the Engle -- Granger procedure carried oud by two procedures.
## ECM.EngleGran() makes easier the fitting process.
## Here, we will use:
## A) The R code 4.2, in Chapter 4, Pfaff (2011).
## This code 1) generates artificial data and 2) fits an ECM, following
## the Engle --Granger procedure.
## B) The ECM.EngleGran() family function to fit the same model assuming
## bivariate normal innovations.
## The downside in the R code 4.2 is the assumption of no--correlation among
## the errors. These are generated indenpendently.
## A)
## STEP 1. Set up the data (R code as in Pfaff (2011)).
nn <- 100
set.seed(123456)
e1 <- rnorm(nn) # Independent of e2
e2 <- rnorm(nn)
y1 <- cumsum(e1)
y2 <- 0.6 * y1 + e2
lr.reg <- lm(y2 ~ y1)
error <- residuals(lr.reg)
error.lagged <- error[-c(nn - 1, nn)]
dy1 <- diff(y1)
dy2 <- diff(y2)
diff.dat <- data.frame(embed(cbind(dy1, dy2), 2))
colnames(diff.dat) <- c('dy1', 'dy2', 'dy1.1', 'dy2.1')
## STEP 2. Fit the ECM model, using lm(), R code as in Pfaff (2011).
ecm.reg <- lm(dy2 ~ error.lagged + dy1.1 + dy2.1, data = diff.dat)
summary(ecm.reg)
## B) Now, using ECM.EngleGran() and VGLMs, the steps at A) can be skipped.
## Enter the I(1)--variables in the response vector only, putting down the
## the dependent variable from the I(1) set, i.e. y2, in the second column.
coint.data <- data.frame(y1 = y1, y2 = y2)
fit.ECM <- vglm(cbind(y1, y2) ~ 1, ECM.EngleGran, data = coint.data, trace = TRUE)
## Check coefficients ##
coef(fit.ECM, matrix = TRUE) ## Compare 'Diff2' with summary(ecm.reg)
coef(summary(ecm.reg))
head(depvar(fit.ECM)) # The estimated differences (first order)
vcov(fit.ECM)
constraints(fit.ECM, matrix = TRUE)
## Not run:
### Example 2. Here, we compare ECM.EngleGran() from VGAMextra with VECM() from
## package "tsDyn" when fitting an ECM(1, 1). We will make use of
## the argument 'ordtsDyn' so that the outcomes can be compared.
library("tsDyn") # Need to be installed first.
fit.tsDyn1 <- with(coint.data, VECM(cbind(y2, y1), lag = 1, estim = "2OLS")) # MODEL 1
summary(fit.tsDyn1)
### Fit same model using ECM.EngleGran(). NOTE: Set ordtsDyn = 1 !! # MODEL 2
fit.ECM.2 <- vglm(cbind(y1, y2) ~ 1, ECM.EngleGran(ecm.order = c(1, 1),
resids.pattern = "neither", ordtsDyn = 1),
data = coint.data, trace = TRUE)
coef.ECM.2 <- coef(fit.ECM.2, matrix = TRUE)
fit.tsDyn1$coefficients ## From pakage 'tsDyn'.
t(coef.ECM.2[, 1:2][c(2, 1, 4, 3), ][, 2:1]) ## FROM VGAMextra
### Example 3. An ECM(2, 2), with residuals estimated by OLS, with NO intercept
### and NO trend term. The data set is 'zeroyld', from package tsDyn.
### ECM.EngleGran() and with VECM() willbe compared again.
data(zeroyld, package = "tsDyn")
# Fit a VECM with Engle-Granger 2OLS estimator:
vecm.eg <- VECM(zeroyld, lag=2, estim = "2OLS")
summary(vecm.eg)
# For the same data, fit a VECM with ECM.EngleGran(), from VGAMextra.
# Set ordtsDyn = 1 for compatibility!
fit.ECM.3 <- vglm(cbind(long.run, short.run) ~ 1, ECM.EngleGran(ecm.order = c(2, 2),
resids.pattern = "neither", ordtsDyn = 1),
data = zeroyld, trace = TRUE)
coef.ECM.3 <- coef(fit.ECM.3, matrix = TRUE)
#### Compare results
vecm.eg$coefficients # From tsDyn
t(coef.ECM.3[, 1:2][c(2, 1, 5, 3, 6, 4 ),][, 2:1]) # FROM VGAMextra
## End(Not run)
Air pollution and hospital admissions due to respiratory and cardiovascular causes, Hong Kong.
Description
Daily air pollution levels and hospital admissions due to respiratory and cardiovascular causes, between 1 January 1994 and 31 December 1997, Hong Kong.
Usage
data(HKdata)
Format
This is a subset of the data analyzed in Xia and Tong (2006) which stores the following time series:
- no
-
Time vector,
- cardio, resp
-
Integer. Daily hospital admissions due to respiratory and cardiovascular causes, 1 January 1994 and 31 December 1997.
- no2, so2, rsp, o3
-
Numeric. Daily mean average of
\textrm{NO}_{2},\textrm{SO}_{2}, respirable suspended particles (rsp; that is\textrm{PM}_{10}), and\textrm{O}_{3}, in parts per billion (ppb). - temp, hum
-
Numeric. Daily mean average of temperature (Celsius deg.) and humidity (%)
- mon, tue, wed, thur, fri, sat
-
Factors with two levels. Weekdays/weekends indicator variables.
Source
Data set retrieved from https://blog.nus.edu.sg/homepage/research/
References
Xia, Y. and Tong, H. (2006) Cumulative effects of air pollution on public health. Statistics in Medicine. 25(29), 3548-3559.
Examples
data(HKdata)
summary(HKdata[, -1])
KPSS tests for stationarity
Description
The Kwiatkowski-Phillips-Schmidt-Shin (KPSS)
test for the null hypothesis that the series x is
level or trend stationary
Usage
KPSS.test(x, type.H0 = c("level", "trend")[1],
trunc.l = c("short", "large")[1],
show.output = TRUE)
Arguments
x |
Numeric. A univariate series. |
type.H0 |
The null hypothesis to be tested: either |
trunc.l |
The lag truncation parameter. See below for more details. |
show.output |
Logical. Should the results be displyed? Default is
|
Details
To test the null hypothesis that a univariate time series is level–stationary or stationary around a deterministic trend. The alternative states the existence of a unit root.
Under this methodology, the series, say
\{ y_t;~t = 1, \ldots, T\}
is assumed to be decomposed as
y_t = \rho t + \xi_t + \varepsilon_t,
that is, as the sum of a deterministic trend,
a random walk (\xi_t), and a stationary error
(\varepsilon_t \sim N(0, \sigma^2_z).
Hence, this test reduces to simply test the
hypothesis that \{ \xi_t \}
is stationary,
that is, H_0: \sigma^2_z = 0.
The test statistic combines the one–sided Lagrange
multiplier (LM) statistic
and the locally best invariant (LBI) test
statistic (Nabeya and Tanaka, (1988)).
Its asymptotic distribution is discussed
in Kwiatkowski et al. (1992), and depends on
the ‘long–run’ variance \sigma^2.
The test statistic is given by
\eta = T^{-2} \sum_i S^2_i / \widehat{\sigma}^2=
T^{-2} \sum_i S^2_i / s^2(l).
where s^2(l) is a consistent estimate of
\sigma^2, given by
s^2(l) = (1/T)\sum_{t = 1}^T\varepsilon^2_t +
(2 / T) \sum_{s = 1}^l w(s, l) \sum_{t = s + 1}^T \varepsilon_t
\varepsilon_{t - s}.
Here, w(s, l) = 1 - s/(l + 1), where
l is taken from trunc.l, the
lag–truncation parameter. The choice "short"
gives the smallest integer not less than
3 \sqrt{T} / 11, or else,
9 \sqrt{T} / 11, if
trunc.l = "large".
Note, here the errors, \varepsilon_t, are
estimated from the regression
x ~ 1 (level) or x ~ 1 + t (trend), depending upon
the argument type.H0.
Unlike other software using linear interpolates, here
the p–values for both, trend and level stationarity,
are interpolated by cubic spline interpolations
from the tail critical values given
in Table 1 in Kwiatkowski et al. (1992).
The interpolation takes place on \eta.
Value
A list with the following:
1) Test statistic and P-value,
2) Critical values,
3) Residuals, \varepsilon_t.
Note
There is no standard methodology to select an
appropriate value for trunc.l, however,
satisfactory results have been found for
trunc.l proportional to T^{1/2}.
See Andrews, D.W.K. (1991) for a discussion on this.
Empirically, this parameter may be suggested
by the problem in turn, and should be large enough to
approximate the true dynamic behaviour of the series.
Author(s)
Victor Miranda.
References
Andrews, D.W.K. (1991) Heteroskedasticiy and autocorrelation consistent covariance matrix estimation. Econometrica, 59, 817–858.
Kwiatkowski, D., Phillips, P.C.B., Schmidt, P., and Shin, Y. (1992) Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics, 54, 159–178.
Nabeya, S. and Tanaka, K. (1988) Asymptotic theory of a test for the constancy regression coefficients against the random walk alternative. Annals of Statistics, 16, 218–235.
Phillips, P.C.B. and Perron, P. (1988) Testing for a unit root in time series regression. Biometrika, 75, 335–346.
Phillips, P.C.B. (1987) Time series with unit roots. Econometrica, 55, 277–301
See Also
Examples
set.seed(2802)
test <- KPSS.test(rnorm(20), type.H0 = "trend")
class(test)
test$crit.value
VGLTSMs Family Functions: Order–q Moving Average Model with covariates
Description
Estimates the intercept, standard deviation (or variance) of the random noise (not necessarily constant), and the conditional–mean model coefficients of an order–q moving average (MA) process with covariates (MAX(q)) by maximum likelihood estimation using Fisher scoring.
Usage
MAXff(order = 1,
zero = c(if (nomean) NULL else "Mean", "MAcoeff"),
xLag = 0,
type.EIM = c("exact", "approximate")[1],
var.arg = TRUE,
nomean = FALSE,
noChecks = FALSE,
lmean = "identitylink",
lsd = "loglink",
lvar = "loglink",
lMAcoeff = "identitylink",
imean = NULL,
isd = NULL,
ivar = NULL,
iMAcoeff = NULL)
Arguments
order |
The order 'q' of the MA model, recycled if needed.
By default |
zero |
Integer or character–string vector.
Same as |
xLag |
Same as |
type.EIM, var.arg |
Same as |
nomean |
Logical. |
noChecks |
Logical. Same as |
lmean, lsd, lvar, lMAcoeff |
Link functions applied to the mean (intercept),
the standard deviation or variance of the random noise, and
the MA coefficients (conditional–mean model).
Note, |
imean, isd, ivar, iMAcoeff |
Same as |
Details
Similar to ARXff,
this family function fits an order–q moving average model
with covariates (MAX(q)),
another special case of the class VGLM–ARIMA
(Miranda and Yee, 2018). Observations, Y_t, are
modelled as
Y_t | \Phi_{t - 1} = \mu_t + \phi_{1} \varepsilon_{t - 1} +
\ldots + \phi_q \varepsilon_{t - q} + \varepsilon_t,
where \mu_t is the (possibly time–dependent) intercept,
modelled as
\mu_t = \mu + \boldsymbol{\beta}^T \boldsymbol{x}_t, and
the errors are mean–zero Gaussian:
\varepsilon_t | \Phi_{t - 1} \sim
N(0, \sigma^2_{\varepsilon_t | \Phi_{t - 1}}).
The symbol
\Phi_{t} denotes the history
of the joint process
\left(Y_{t}, \boldsymbol{X}_{t + 1}^T \right),
at time t for a time–varying covariate
vector \boldsymbol{x}_t.
At each step of Fisher scoring, the exact log-likelihood
based on model above
is computed through dMAq.
The linear predictor by default is
\boldsymbol{\eta} = \left(
\mu_t, \log \sigma^{2}_{\varepsilon_{t | \Phi_{t - 1}}},
\phi_1, \ldots, \phi_q
\right)^T.
The unconditional mean of the process is simply
E(Y_{t}) = \mu, provided no covariates added.
This family function
is not restricted to the noise to be strictly white noise
(in the sense of constant variance).
That is, covariates may be incorporated in the linear predictor
\log \sigma^{2}_{\varepsilon_{t | \Phi_{t - 1}}}. Also, it handles
multiple responses so that a matrix can be used as the response.
For further details on VGLM/VGAM–link functions,
such as logitlink, refer to
Links.
Value
An object of class "vglmff"
(see vglmff-class) to be
used by VGLM/VGAM modelling functions, e.g.,
vglm or vgam.
Warning
By default, a moving-average model of order-1 is fitted.
If different, the order is recycled up to the number of responses
entered in the vglm \ vgam call has been matched.
Successful convergence depends on reasonably setting initial values. If
initial values computed by the algorithm are not adequate, make use of
the the optional initial values (imean, isd, etc.)
For constraints on the paramaters see
cm.ARMA.
Note
Further choices for the random noise, besides Gaussian, will be implemented over time.
zero can be either an integer vector
or a vector of character strings
specifying either the position(s) or name(s) (partially or not) of the
parameter(s) modeled as intercept-only. For MAXff,
the parameters are placed and named as follows (by convention):
c("Mean", "noiseVar" or "noiseSD", "MAcoeff").
Users can modify the zero argument accordingly.
For simplicity, the second choice recommended.
See CommonVGAMffArguments
for further details on zero.
If no constraints are entered in the fitting process,
(e.g., via cm.ARMA)
this family function internally verifies by default
whether the estimated series
is invertible (since noChecks = FALSE).
To ignore this step, set noChecks = TRUE.
If the estimated MA process is non-invertible, MLE
coefficients will conform with the corresponding
invertible MA model.
Further details about these checks are shown within
the summary() output.
Author(s)
Victor Miranda and Thomas W. Yee.
References
Miranda, V. and Yee, T.W. (2018) Vector Generalized Linear Time Series Models. In preparation.
Madsen, H. (2008) Time Series Analysis Florida, USA: Chapman & Hall. (Sections 5.3 to 5.5).
Tsay, R. (2013)
An Introduction to Analysis of Financial data with R.
New Jersey, USA: Wiley Sections 2.2 to 2.4.
See Also
ARIMAXff,
ARXff,
checkTS.VGAMextra,
CommonVGAMffArguments,
Links,
vglm,
vgam.
Examples
set.seed(2)
nn <- 130
### Coefficients
phi1 <- 0.34; phi2 <- -1.19; phi3 <- 0.26
### Intercept
mu <- c(-1.4, 2.3)
### Noise standar deviations (Two responses)
sdMA <- c(sqrt(6.5), sqrt(4.0))
### A single covariate.
Xcov <- runif(nn)
# Generating two MA processes, TS1 and TS2, Gaussian noise.
# Note, the SD noise for TS2 is function of Xcov.
y1 <- mu[1] + arima.sim(nn,
model = list( ma = c(phi1, phi1^2)),
rand.gen = rnorm, sd = exp(sdMA[1]) )
y2 <- mu[2] + arima.sim(nn,
model = list( ma = c(phi1, phi2, phi3) ),
rand.gen = rnorm, sd = exp(Xcov + sdMA[2]) )
# OUR DATA
tsdata <- data.frame(x2 = Xcov , TS1 = y1, TS2 = y2)
#------------------------------------------------------------------------#
# 1. A simple MA(3) to compare with 'arima()'.
myfit0 <- vglm(TS1 ~ 1,
MAXff(order = 3, type.EIM = "exact",
var.arg = FALSE),
#constraints = cm.ARMA(Model = ~ 1,
# lags.cm = 2,
# Resp = 1),
data = tsdata, trace = TRUE)
Coef(myfit0)[c(3, 4, 1)]
fitArima <- arima(tsdata$TS1, order = c(0, 0, 2))
coef(fitArima)
AIC(myfit0); BIC(myfit0)
# ------------------------------------------------------------------------#
# 2. Estimate an MA(3), intercept-only, using initial values.
myfit <- vglm(TS2 ~ 1,
MAXff(order = 3, type.EIM = c("exact", "approx")[1],
# Optional initial values.
imean = 3,
iMAcoeff = c(0.3, -0.2, 0.25),
var.arg = TRUE),
data = tsdata, trace = TRUE)
Coef(myfit)
summary(myfit)
constraints(myfit)
#----------------------------------------#
# Same model fitted using arima()
#----------------------------------------#
fitArima <- arima(tsdata$TS2, order = c(0, 0, 3))
coef(fitArima)
#------------------------------------------------------------------------#
# 3. An MAX(3) with one covariate, testing its effect over the
# standard deviation of the Gaussian noise. Note the 'zero' argument.
myfit1 <- vglm(TS2 ~ x2,
# Or Multiple responses!
# cbind(TS1, TS2) ~ 1,
MAXff(order = 3, type.EIM = "exact", xLag = 1,
# Optional initial values:
# idev.mean = 1.4,
# iMAcoeff = c(2.3, -1.2, 0.25), isd = 1.6,
# NOTE THE ZERO ARGUMENT:
zero = c("Mean", "MAcoeff"),
var.arg = TRUE),
data = tsdata, trace = TRUE)
coef(myfit1, matrix = TRUE)
summary(myfit1)
vcov(myfit1)
constraints(myfit1)
#------------------------------------------------------------------------#
# Model above CANNOT be fitted using arima()
#------------------------------------------------------------------------#
Multivariate Normal Distribution Family Function
Description
Maximum likelihood estimation of the Multivariate Normal distribution. The covariances (not correlation coefficients) are included in the parameter vector.
Usage
MVNcov(zero = c("var", "cov"),
lmean = "identitylink",
lvar = "loglink",
lcov = "identitylink")
Arguments
zero |
Integer or character–string vector.
Which linear predictors are intercept–only.
Details at |
lmean, lvar, lcov |
VGLM–link functions applied to the means, variances and covariances. |
Details
For the K–dimensional normal distribution,
this fits a linear model to the K means
\mu_j j = 1, \ldots, K, which are the
first entries in the linear predictor. The variances
\sigma^2_j j = 1, \ldots, K
and then the covariances
cov_{ij} i = 1, \ldots, K,
j = i + 1, \ldots, K, are next aligned.
The fitted means are returned as the fitted values.
The log–likelihood is computed via
dmultinorm,
an implementation of the multivariate Normal density.
The score and expected information matrices are internally computed at each Fisher scoring step, using its vectorized form.
The response should be an K–column matrix.
The covariances may be any real number so that
the identitylink
is a reasonable choice.
For further details on VGLM/VGAM–link functions, see
Links.
Value
An object of class "vglmff"
(see vglmff-class) to be
used by VGLM/VGAM modelling functions, e.g.,
vglm or vgam.
Note
Unlike other implementations, e.g.,
binormal from
VGAM in terms of \rho and standard deviations,
MVNcov estimates
the variances and covariances, modeled as intercept–only.
See argument zero, whose default is c("var", "cov"),
to change this.
Thus far, there is no guarantee that the estimated var–cov matrix will be
positive–definite. Proper procedures to validate this will
be incorporated shortly, such as the @validparams slot.
Although the function has been tested on K \leq 5
data sets, it is recommended that K \leq 3,
unless the data are nice and n is sufficiently large.
Author(s)
Victor Miranda.
See Also
dmultinorm,
binormal,
CommonVGAMffArguments,
vglm.
Examples
# K = 3.
set.seed(180227)
nn <- 85
mvndata <- data.frame(x2 = runif(nn), x3 = runif(nn))
mvndata <- transform(mvndata,
y = rbinorm(nn, mean1 = 2 - 2 * x2 + 1 * x3,
mean2 = 2 - 1.5 * x3,
var1 = exp(1.0), var2 = exp(-0.75),
cov12 = 0.5 * exp(1.0) * exp(-0.75)))
mvndata <- transform(mvndata, y3 = rnorm(nn, mean = 2 + x2, sd = exp(1.5)))
colnames(mvndata) <- c("x2", "x3", "y1", "y2", "y3")
mvnfit <- vglm(cbind(y1, y2, y3) ~ x2 + x3, MVNcov, data = mvndata, trace = TRUE)
(mvncoef <- coef(mvnfit, mat = TRUE))
## Check variances and covariances: var1, var2 and var3.
exp(mvncoef[c(10, 13, 16)]) # True are var1 = exp(1.0) = 2.718,
# var2 = exp(-0.75) = 0.472
# and var3 = exp(1.5)^2 = 20.08554
vcov(mvnfit)
constraints(mvnfit)
summary(mvnfit)
Conditional quantile regression with VGAM
Description
Use this function to adequately confer the formula
in VGAM when fitting quantile regression models.
Usage
Q.reg(y, pvector = NULL, length.arg = NULL)
Arguments
y |
Numeric, a vector or a matrix. It is the response or dependent
variable in the |
pvector |
A prototype vector. Entries are the conditional |
length.arg |
A length–1 positive integer. It is the number of |
Details
Conditional quantile regression can be carried out using family
functions in VGAM and VGAMextra.
The formula must be set up using this function, Q.reg.
Here, the p–quantiles of interest may be entered via
pvector. Alternatively, use argument length.arg by
establishing the length of pvector.
Besides, the corresponding link must be entered.
For example, gamma1Qlink
is the proper link to fit models of conditional quantiles for
data distributed as Gamma via the family function
gamma1.
See examples for further details.
Value
A matrix, each column adequately arranged for regression on conditional quantiles, conforming with VGAM.
Indeed, this is equivalent to cbind(y, y, ...), where the
total number of columns is, either the length of pvector, or
length.arg.
Note
Link functions for quantile regression in VGAM require the
vector of p–quantiles of interest via the argument p.
See normal1sdQlink or
maxwellQlink for instance.
Therefore, the integer entered at length.arg in this function,
if utilized, must match the length of the vector p. Else,
it will be recycled.
Author(s)
V. Miranda and T. W. Yee.
See Also
normal1sdQlink,
maxwellQlink,
gamma1Qlink,
gamma1,
vglm,
vgam
Examples
### Quantile regression with data distributed as Maxwell(s) ###
set.seed(12073)
x2 <- seq(0, 100,length.out = 100) # independent variable
b0 <- 0.5 # true intercept
b1 <- 0.25 # true slope
b2 <- 0.02 # true second order coef.
alpha <- b0 + b1 * x2 + b2 * x2^2 # Quadratically modelling the parameters
nn <- 100 # Sample size
# The data as a data frame. #
mdata <- data.frame(y = rmaxwell(n = nn, rate = alpha), x2 = x2, x3 = x2^2)
# Quantile regression using our link function maxwellQlink(). #
# Quantiles 25%, 50%, 75% are of interest #
my.p <- c(0.25, 0.50, 0.75)
fit <- vglm(Q.reg(y, pvector = my.p) ~ x2 + x3,
# OPTIONALLY Q.reg(y, length = length(my.p)) ~ x2 + x3
maxwell(link = "maxwellQlink(p = my.p)"),
data = mdata, trace = TRUE, crit = "coef")
coef(fit, matrix = TRUE)
summary(fit)
head(predict(fit))
constraints(fit)
Utility Functions for the VGAMextra Package
Description
A set of common utility functions required by time series family functions at 'VGAMextra'.
Usage
Is.Numeric(x, isInteger = FALSE, length.arg = NULL, Nnegative = NULL)
is.FormulaAR(Model = ~ 1, Resp = 1)
cross.gammas(x, y = NULL, lags = 1)
WN.lags(y, lags, to.complete = NULL)
extract.Residuals(object, TSprocess,...)
fittedVGAMextra(object,...)
weightsVGAMextra(object, type.w = "prior",...)
XLMmat(object,...)
Arguments
x |
A vector of quantiles. Particularly, for |
y |
Vector of quantiles to be lagged. Then, the cross - covariances
are computed from |
isInteger |
Logical. If |
lags |
Integer indicating the number of lags or delays to be applied to
vector |
length.arg |
Integer. If |
Model |
Formula. A symbolic form of the models fitted by the |
Nnegative |
Logical. If |
Resp |
Integer. The number of responses in the |
object |
An object of class |
TSprocess |
Logical, what time series model is being fitted.
Choices are
|
type.w |
Character. What type of |
to.complete |
Use this argument to fill in the first 'p' observations when computing the lagged vectors in time series. |
... |
Additional parameters required by function
|
Details
A set of utility functions in VGAMextra for different purposes.
Specially for time series family functions in VGAMextra which involve specific checks on the majority of arguments entered by the user.
Value
Is.Numeric() returns a logical vector (or value)
(TRUE or FALSE), after verifying whether quantiles
x satisfies all conditions entered.
For is.FormulaAR(), this function returns a logical value,
after verifying whether the expression entered for the Model
argument in cm.ARMA is an
object of class 'formula'.
Particularly, cross.gammas() computes either the single lagged
covariance(s) from quantiles given in x or the
lagged cross-covariance(s) from values given in x and y.
extract.Residuals() extracts the residuals of the process from
slot @residuals, whilst
fittedVGAMextra and weightsVGAMextra return the
fitted values and the weights from the vglm object,
correspondingly.
isNA and inspectVGAMextra are essentially required when
implementing link functions in VGAMextra.
Author(s)
V. Miranda and T. W. Yee.
See Also
Examples
# Example 1.
myModel1 <- ~ x1 + x2
is.FormulaAR(myModel1) # TRUE
test <- list( cbind(y1, y2) ~ x1, ~ x2 - 1)
is.FormulaAR(test) # FALSE
is.FormulaAR(test[[1]], 2) # TRUE
# Example 2.
x1 <- c(1:3, 4.5, -Inf)
Is.Numeric(x1) # TRUE
Is.Numeric(x1, length.arg = 5) # TRUE
Is.Numeric(x1, length.arg = 5, isInteger = TRUE) # FALSE
Is.Numeric(x1, length.arg = 5, Nnegative = TRUE) # FALSE
# Example 3.
# Here, 'cross.gammas' computes Cov(x, y_{t - 1}), Cov(x, y_{t - 2}) and
# Cov(x, y_{t - 3}).
x <- runif(50)
y <- runif(50)
cross.gammas(x, y, lags = 3)
VGLTSM family function for the
Order–p Vector Auto(R)egressive Model
Description
Estimates an Order(p) Vector Autoregressive Models (VAR(p)) with
white noise random errors
by maximum likelihood estimation using Fisher scoring.
Usage
VARff(VAR.order = 1,
zero = c("var", "cov"),
lmean = "identitylink",
lvar = "loglink",
lcov = "identitylink")
Arguments
VAR.order |
Length–1 (positive) integer vector. The order of the VAR to be fitted. |
zero |
Integer or character - string vector.
Same as |
lmean, lvar, lcov |
Same as |
Details
Let \boldsymbol{x}_t = (x_{1, t}, \ldots, x_{K, t})^T be a time dependent
vector of responses, with index t = 1, \ldots, T,
and \boldsymbol{\varepsilon}_t = (\varepsilon_{1, t},
\ldots, \varepsilon_{K, t})
white noise with covariance matrix
\boldsymbol{\textrm{V}}.
VARff fits a linear model to the means of a
K–variate normal distribution, where
each variable, x_{i, t}, i = 1, \ldots, K,
is a linear function of p–past
lags of itself and past p–lags of the other variables.
The model has the form
\boldsymbol{x}_t = \boldsymbol{\Phi_1} \boldsymbol{x}_{t - 1} +
\cdots + \boldsymbol{\Phi_p} \boldsymbol{x}_{t - p} +
\boldsymbol{\varepsilon}_t,
where \boldsymbol{\Phi_j} are
K \times K matrices of coefficients, j = 1, \ldots, K,
to be estimated.
The elements of the covariance matrix are intercept–only by default.
Value
An object of class "vglmff"
(see vglmff-class) to be
used by VGLM/VGAM modelling functions, e.g.,
vglm or vgam.
Author(s)
Victor Miranda.
See Also
MVNcov,
zero,
Links,
ECM.EngleGran,
vglm.
Examples
set.seed(20170227)
nn <- 60
var.data <- data.frame(x2 = runif(nn, -2.5, 2.5))
var.data <- transform(var.data, y1 = rnorm(nn, 1.5 - 2 * x2, sqrt(exp(1.5))),
y2 = rnorm(nn, 1.0 - 1 * x2, sqrt(exp(0.75))),
y3 = rnorm(nn, 0.5 + 1 * x2, sqrt(exp(1.0))))
fit.var <- vglm(cbind(y1, y2, y3) ~ x2, VARff(VAR.order = 2),
trace = TRUE, data = var.data)
coef(fit.var, matrix = TRUE)
summary(fit.var)
vcov(fit.var)
Estimated White Noise (WN) from the autoregressive moving-average model
of order-(p, q) [ARMA(p, q)].
Description
Estimates the unobserved white noise of the ARMA(p, q)
model via the corresponding inverted process.
Also, provides the initial values of ARXff,
MAXff, and
ARMAXff family functions.
Usage
WN.InitARMA(tsData = NULL,
order = c(1, 0, 1),
whiteN = FALSE,
moreOrder = 0,
updateWN = FALSE)
Arguments
tsData |
A univariate data frame containing the time series to be fitted according
to an ARMA( |
order |
A vector with three integer components. It is order of the ARMA model to be
inverted. These entries, |
whiteN |
Logical. If |
moreOrder |
A non-negative integer (might be zero) used to increment the order of
the AR model initially fitted to estimate the residuals, i.e.,
an AR( |
updateWN |
Logical. if |
Details
Overall, the autoregressive moving average process of order c(p, q),
shortly denoted as ARMA(p, q), with intercept
\mu can be expressed as
y_{t} = \mu + \theta_{1} y_{t - 1} + \ldots + \theta_{p} y_{t - p} +
\phi_1 \varepsilon_{t - 1} + \ldots +
\phi_q \varepsilon_{t - q} + \varepsilon_{t}.
It is well known that it can be expressed in terms of an autoregressive
process of infinite order, AR(\infty), by
recursive substitutions. For instance, given a mean-zero ARMA(1, 1),
y_{t} = \theta_1 y_{t - 1} + \phi_1 \varepsilon_{t - 1} +
\varepsilon_{t},
\quad \quad (1)
one may express
\varepsilon_{t - 1} = Y_{t - 1} - ( \theta_{1} y_{t - 2} +
\phi_{1} \varepsilon_{t - 2}
Substituting this equation in (1) yields the initial inverted process, as follows:
y_{t} = \psi_{1} y_{t - 1} + \psi_{2} y_{t - 2} +
f(\varepsilon_{t - 2}, \varepsilon_{t} ).
where f is a function of \varepsilon_{t - 2} and
\varepsilon_{t}.
Repeated substitutions as above produces the so-called inverted process,
y_{t} = \sum_{k = 1}^{\infty} \psi_{k} y_{t - k} +
\varepsilon_{t}. \quad \quad (2)
k = 1, \ldots, \infty.
Hence, setting an acceptable order (via the moreOrder
argument, 1 or 2 for instance), an
AR(p + moreOrd)
inverted model is internally fitted
within WN.InitARMA. Consequently, the unobserved white noise,
\{ \varepsilon_{t} \} , is estimated by computing
the residuals in (2), after regression.
whiteN = TRUE enables this option.
Finally, initial values of the MAXff, and
ARMAXff family functions can be computed by least squares from
the estimated white noise above, \{ \varepsilon_{t} \}
and the given data, \{ t_{t} \} .
Initial values of ARXff
are also internally computed using \{ t_{t} \} only.
Value
A list with the following components:
Coeff |
The initial values of the VGLM/VGAM family function in turn:
|
whiteN |
(Optional) Estimated white noise enabled only for
|
Warning
For some time series family functions,
MAXff for instance, values of
moreOrder > 3 do NOT improve
the accuracy of estimates, and may lead the algorithm to failure to
converge.
Author(s)
Victor Miranda and T. W. Yee.
References
Brockwell, P. and Davis, R. (2002) Introduction to Time Series and Forecasting. Springer, New York, USA.
Durbin, J. (1959) Efficient Estimation of Parameters in Moving-Average Models. Biometrika, 46, pp 306–316.
See Also
Examples
# Generating some data -> an MA(3)
set.seed(1004)
mydata <- arima.sim( n = 200, list(ma = c(0.3, 0.56 , 0.11)) )
# Computing initial values to be passed to MAXff()
WN.InitARMA(tsData = data.frame(y = mydata),
order = c(0, 0, 3),
moreOrder = 1)
# Returning initial values and white noise.
initMA <- WN.InitARMA(tsData = data.frame(y = mydata),
order = c(0, 0, 3),
moreOrder = 1,
whiteN = TRUE)
# Initial values passed to MAXff()
initMA$Coeff
# Estimated white noise
head(initMA$WhiteNoise)
Air pollution Data, Mexico City.
Description
Daily air pollution levels in Mexico City, January 2004 – June 2005.
Usage
data(ap.mx)
Format
This data frame stores time series vectors with the following information:
- time
-
Time vector.
- PM10
-
24–hr average concentration of
PM_{10}, in micrograms per milliliter. - O3
-
Daily maximum 8–hour moving average of ozone, in micrograms per milliliter.
- temp
-
Daily mean average of temperature, in celsius degrees.
- HR
-
Daily mean average (%) of relative humidity.
Details
These are readings of PM_{10},
O_{3}, temperature and humidity between
1 January 2004 and 30 June 2005 in
Mexico City Metropolitan Area.
Each observation is the 24–hr mean average (between
00:00 and 23:59 hrs), except for
ozone, where the
maximum over all the sliding 8–hour–windows,
between 00:00 and 23:59 hrs is reported, viz.
the daily maximum 8–hour moving average.
Source
National Institute of Ecology. Gathers and disseminates the data generated by the central air quality monitoring network in Mexico City. Website: https://www.gob.mx/inecc/
Examples
data(ap.mx)
summary(ap.mx[, -1])
class(ap.mx[, "PM10"])
layout(matrix(c(1, 1, 2,3), 2, 2, byrow = TRUE))
plot.ts(ts(ap.mx$PM10), ylab = expression(PM[10]~"Series"),
col = "brown", xaxt = "n", las = 1)
xtick <- c(1, 92, 183, 275, 367, 457, 518)
xtext <- c("Jan/04", "April/04", "July/04", "Oct/04", "Jan/05",
"April/05", "June/05")
axis(side = 1, at = xtick, labels = FALSE)
text(x = xtick, par("usr")[3], labels = xtext,
pos = 1, xpd = TRUE, col = "black")
pacf(ap.mx$PM10, main = "", ylim= c(-0.5, 1), lag.max = 60, las = 1)
acf(ap.mx$PM10, main = "", ylim= c(-0.5, 1), lag.max = 60, las = 1)
Link functions for the quantiles of several 1–parameter continuous distributions
Description
Computes the benini1Qlink transformation, its inverse and the
first two derivatives.
Usage
benini1Qlink(theta, p = stop("Argument 'p' must be entered."),
y0 = stop("Argument 'y0' must be specified."),
bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. See below for further details. |
p |
Numeric. A single value between 0.0 and 1.0.
It is the |
y0 |
Same as |
bvalue, inverse, deriv, short, tag |
See |
Details
This is a link function to model any p–quantile of the
1–parameter Benini distribution. It is called the
benini1Qlink transformation defined as
\log y_0 + \sqrt{\frac{ -\log (1 - p) }{s}}
where y_0 > 0 is a scale parameter and s is
a positive shape parameter, as in benini1.
Numerical values of s or p out of range may
result in Inf, -Inf, NA or NaN.
In particular, arguments inverse and deriv are
disregarded if theta is character.
Value
For deriv = 0, the benini1Qlink transformation of
theta, when inverse = FALSE. If inverse = TRUE, then
the inverse transformation given by
-log(1 - p) / (theta - log y0)^2 is returned.
For deriv = 1, this function returns the derivative
d eta / d theta,
if inverse = FALSE. Else, the reciprocal
d theta / d eta as a function of theta.
If deriv = 2, then the second order derivatives in terms of
theta are accordingly returned.
Warning
The horizontal straight line \log y0 is a lower asymptote
for this link function as \theta increases to \infty.
Thus, when inverse = TRUE and deriv = 0 entries at
theta becoming \eta must be greater than
\log y0. Else, NaN will be returned.
See examples 2 and 3 below.
Note
Numerical instability may occur for values theta too close
to zero or lower than \log y0.
Use argument bvalue to replace them before computing the link.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
Examples
## E1. benini1Qlink() and its inverse ##
p <- 0.50; y0 = 1.25 ## Modeling the median
my.s <- seq(0, 5, by = 0.1)[-1]
max(my.s - benini1Qlink(benini1Qlink(my.s, p = p, y0 = y0),
p = p, y0 = y0, inverse =TRUE)) ## Zero
## E2. Plot of the benini1Qlink() transformation and its inverse ##
## Note, inverse = TRUE implies that argument 'theta' becomes 'eta'. ##
## which must be greater than log(y0). Else, value less than log(y0) ##
## are replaced by NaN. ##
#--- THE LINK
my.b <- seq(0, 5, by = 0.01)[-1]
plot(benini1Qlink(theta = my.b, p = p, y0 = y0) ~ my.b,
type = "l", col = "blue", lty = "dotted", lwd = 3,
xlim = c(-0.1, 6), ylim = c(-0.1, 5), las = 1,
main = c("Blue is benini1Qlink(), green is the inverse"),
ylab = "eta = benini1Qlink", xlab = "theta")
abline(h = 0, v = 0, lwd = 2)
#--- THE INVERSE
lines(my.b, benini1Qlink(theta = my.b, p = p, y0 = y0, inv = TRUE),
col = "green", lwd = 2, lty = "dashed")
#--- Tracing the identity function for double--check
lines(my.b, my.b)
## E3. WARNING! The first two values are less than log(y0) ##
benini1Qlink(theta = c(0.10, 0.15, 0.25, 0.35) , p = p, y0 = y0, inverse = TRUE)
Link functions for the mean of 1–parameter discrete distributions: The Borel–Tanner distribution.
Description
Computes the borel.tannerMlink transformation,
its inverse and the first two derivatives.
Usage
borel.tannerMlink(theta, Qsize = 1,
bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. See below for further details. |
Qsize |
A positive integer. It is called |
bvalue, inverse, deriv, short, tag |
Details at |
Details
As with
zetaffMlink or
yulesimonMlink,
this link function is part of a set of link functions in
VGAM developed under a common methodology: by taking the
logarithm of the mean of the corresponding distribution.
In particular, this link function emerges by computing the logarithm of the mean of the Borel–Tanner distribution. It is defined as
\tt{borel.tannerMlink}(a) = - \log (Q^{-1} - aQ^{-1}),
where a, 0 < a < 1, is a scale parameter
as in borel.tanner.
The domain set of borel.tannerMlink is the open interval
(0, 1), except when inverse = TRUE and deriv = 0.
See below for further details about this.
Moreover, unlike zetaffMlink
or posPoiMlink, the
inverse of borel.tannerMlink can be written in
closed–form.
Values of a (i.e. theta) out of range will result in
NaN of NA.
If theta is a character, arguments inverse and
deriv are discarded.
Value
For deriv = 0, the borel.tannerMlink transformation of
theta, if inverse = FALSE. When inverse = TRUE,
theta becomes \eta and the inverse of
borel.tannerMlink, given by
1 - \frac{Q}{ e^{\eta} },
is returned. Here, the domain set changes to (0, \infty).
For deriv = 1, d eta / d theta
as a function of theta if inverse = FALSE, else
the reciprocal d theta / d eta.
Similarly, when deriv = 2 the second order derivatives
in terms of theta are returned.
References
Haight, F. and Brueuer, M. A. (1960) The Borel–Tanner distribution. Biometrika, 47, 143–150.
Note
The vertical line a = 1 is an asymptote for this link
function, which sharply grows for values of a too close to 1.0
from the left. For such cases, Inf might result when computing
borel.tannerMlink.
This link function is useful to model any parameter
in (0, 1). Then, some problems may occur if there
are covariates causing out of range values.
Author(s)
V. Miranda and T. W. Yee
See Also
borel.tanner,
yulesimonMlink,
zetaffMlink,
posPoiMlink,
Links.
Examples
## Example 1. Special values for theta (or eta, accordingly) ##
a.par <- c(0, 1:10/10, 20, 1e1, Inf, -Inf, NaN, NA)
# The borel.tannerMlink transformation and the first two derivatives.
print(rbind(a.par,
deriv1 = borel.tannerMlink(theta = a.par, inverse = FALSE, deriv = 1),
deriv2 = borel.tannerMlink(theta = a.par, inverse = FALSE, deriv = 2)),
digits = 2)
# The inverse of 'borel.tannerMlink()' and the first two derivatives.
# 'theta' turns into 'eta'.
print(rbind(a.par,
Invderiv1 = borel.tannerMlink(theta = a.par, inverse = TRUE, deriv = 1),
Invderiv2 = borel.tannerMlink(theta = a.par, inverse = TRUE, deriv = 2)),
digits = 2)
## Example 2 ##
a.param <- c(0, 1, 5, 10, 1e2, 1e3)
rbind(a.values = a.param,
inv.BT = borel.tannerMlink(theta = a.param, inverse = TRUE))
data.inv <- borel.tannerMlink(borel.tannerMlink(a.param, inv = TRUE)) - a.param
summary(data.inv) ## Should be zero
## Example 3. Some link functions in VGAM with domain set (0, 1) ##
a.param <- ppoints(100)
par(lwd = 2)
plot(a.param, borel.tannerMlink(a.param), ylim = c(-5, 7), xlim = c(-0.01, 1.01),
type = "l", col = "gray10", ylab = "transformation",
las = 1, main = "Some probability link functions")
lines(a.param, logffMlink(a.param), col = "blue")
lines(a.param, logitlink(a.param), col = "limegreen")
lines(a.param, probitlink(a.param), col = "purple")
lines(a.param, clogloglink(a.param), col = "chocolate")
lines(a.param, cauchitlink(a.param), col = "tan")
abline(v = c(0.5, 1), lty = "dashed")
abline(v = 0, h = 0, lty = "dashed")
legend(0.05, 7, c("borel.tanneMlink", "logffMlink", "logitlink", "probitlink",
"clogloglink", "cauchitlink"),
col = c("gray10", "blue", "limegreen", "purple", "chocolate", "tan"),
lwd = 1)
par(lwd = 1)
Names/Value of linear predictors/parameters in time series family functions.
Description
Splitting out the names of linear predictors or Numeric values for parameters in time series family functions in VGAMextra.
Usage
break.VGAMextra(eta = NULL,
M1 = NULL,
noInter = NULL,
bOrder = NULL,
NOS = NULL,
lInter = "identitylink",
lvar = "loglink",
lsd = "loglink",
lcoeff1 = "rhobitlink",
lcoeff2 = "rhobitlink",
typeTS = "AR",
namesLP = FALSE,
Complete = FALSE,
varArg = NULL)
Arguments
eta |
A matrix of dimensions |
M1 |
Number of parameters involved in the |
noInter |
Logical. To determine whether the intercept is
estimated. If |
bOrder |
A vector. The order of the linear process fitted. Either a single number (if one response), or a vector (if multiple responses). |
NOS |
Integer. Number of respones set in the |
lInter, lvar, lsd, lcoeff1, lcoeff2 |
Link functions applied to parameters. Same as in
|
typeTS |
Character. Currently, options |
namesLP |
Logical. This function returns either the names of linear the
predictors/parameters ( if |
Complete |
Logical. If |
varArg |
Details
Time series family functions in VGAMextra currently recycle
the order set in the vglm.
Particularly, it occurs when the number of responses is fewer than
the specified order. For instance, if the order set in vglm
is c(1, 3), and 5 responses are managed, then the new order
becomes c(1, 3, 1, 3, 1).
Due to such flexibility, time series family functions require specific functions to unload the amount of code within each one.
Moreover, when the order is recycled, the matrix eta is
completed, as if the order was the same for each response.
This feature is enabled when Complete = TRUE.
This ‘common’ order turns out to be the maximum order established in
the vector order. This trick makes the family function to work
properly. To return to the riginal ‘order’, eta is reduced
in the same number of colums initially added.
break.VGAMextra works in this context. It may return either
the names of the linear predictors/parameters, or the parameter values
splitted out as a list. Thus, link functions entered in the vglm
call must be passed down to this functions.
For further details on link functions refer to
CommonVGAMffArguments.
Value
A list containing either the names of the linear predictors or the
parameters values (not linear predictors) unwrapped from tje
eta matrix, as follows:
a) If namesLP = FALSE (default), value of parameters are
returned in this order: the intercept (1), standard deviation and
variance of the white noise (2, 3), and the coefficients (4).
b) If namesLP = TRUE, names of linear predictors are
returned in the first entry, whereas parameter names are
allocated to the second entry.
Yee and Wild (1996) provide more detailed information about the relationship between linear predictors and parameters within the VGLM statistical framework.
Warning
Note that library VGAM is definitely required.
Warning
Be aware of the dimensions of matrix eta. It is c(n, M),
where n is the sample size, and M is the number of
parameters. If multiple responses, then M equals the summation
of parameters individually.
Author(s)
Victor Miranda and T. W. Yee
References
Yee, T. W. and Wild, C. J. (1996) Vector Generalized Additive Models. Journal of the Royal Statistical Society, Series B, Methodological, 58(3), 481–493.
See Also
ARXff,
MAXff,
CommonVGAMffArguments,
vglm.
Examples
library(VGAM)
eta <- matrix(runif(100), nrow = 10, ncol = 10)
M1 <- c(5, 5)
noInter <- FALSE
bOrder <- c(3, 3)
NOS <- 2
### ONLY LINEAR PREDICTORS/PARAMETERS NAMES!
### RETURNED OBJECT IS A LIST !
break.VGAMextra(M1 = M1,
noInter = noInter,
bOrder = bOrder,
NOS = NOS,
typeTS = "AR",
namesLP = TRUE,
varArg = TRUE)
### PARAMETER VALUEs... "UNWRAPPED". Inverse link functions are applied.
### Note that namesLP must be set to FALSE
break.VGAMextra(eta = eta,
M1 = M1,
noInter = noInter,
bOrder = bOrder,
NOS = NOS,
typeTS = "AR",
namesLP = FALSE,
varArg = TRUE)
Polynomial roots based on transfer operators in Vector Generalized Time Series Family Functions
Description
checkTS.VGAMextra computes the polynomial roots as per
transfer operator in Vector
Generalized Time Series Family Functions in VGAMextra
Usage
checkTS.VGAMextra(thetaEst = NULL,
tsclass = c("AR", "MA"),
chOrder = 1,
NofS = 1,
retmod = TRUE,
pRoots = TRUE)
Arguments
thetaEst |
A vector of coefficients. Its lenght must be |
tsclass |
Character indicating the model class to be checked. Presently,
options |
chOrder |
Positive integer. The order of polynomial associated to the
underlying procees involved:
either $p$ or $q$, which apply for
|
NofS |
A positive integer denoting the number of Time Series to verify.
In the |
retmod |
Logical. If |
pRoots |
Logical. If |
Details
Stationarity and/or Invertibility of time series (TS) are usually verified via the roots of the polynomial derived from the transfer operators.
In particular, checkTS.VGAMextra computes such roots via
the coefficients estimated by vector generalized TS family functions
available in VGAMextra ( ARXff, and
MAXff).
Specifically, checkTS.VGAMextra verifies whether the TS
analyzed via vglm is stationary or
invertible, accordingly.
Note that an autoregressive process of order-p
[AR(p)] with coefficients
\theta_{1}, \ldots, \theta_{p}
can be written in the form
\theta(B) Y_{t} = \varepsilon_{t},
where
\theta(B) = 1 - \sum_{k = 1}^{p} \theta_{k} B^{k}
Here, \theta(B) is referred to as
the transfer operator of the process, and
B^{k} Y_{t} = Y_{t - k},,
for k = 0, 1, \ldots,p, is the lagged single-function.
In general, an autoregressive process of order-p is
stationary if the roots of
\theta(z) = 1 - \theta_{1} z - \ldots - \theta_{q} z^q
lie outside the unit circle, i.e. |z| > 1.
Similarly, a moving-average process of order-q can be formulated
(without loss of generality \mu = 0)
Y_{t} = \psi(B) \varepsilon_{t},
where \psi(B) is the transfer operator, given by
\psi(B) = 1 + \sum_{k = 1}^{q} \psi_{k} B^{k},
Note that \psi_{0} = 1 , and
B^{k} \varepsilon_{t} = \varepsilon{t - k} .
Hence, a moving-average process of order-q [MA(q)],
generally given by (note \mu = 0)
Y_{t} = \phi_1 \varepsilon_{t - 1} + \ldots +
\phi_q \varepsilon_{t - q} + \varepsilon_{t},
is invertible if all the roots of
\phi(B) = 1 + \phi_{1} B + \ldots + \phi_{q} B^q
lie outside the unit circle., i.e.m |z| > 1.
Parallel arguments can be stated for autoregressive moving aberage processes (ARMA). See Box and Jenkins (1970) for further details.
Value
A vector whose elements are the roots of polynomials inherited from transfer operators according to the process analyzed.
Alternatively, the modules of roots can by returned instead of merely roots
via the retmod argument.
Warning
The argument thetaEst manages the coefficients of the TS
model(s) in turn. Then, it must be NofS * chOrder length,
where NofS is the number of responses established in the
vglm call.
Here the coefficients for each response must be sequentially groped.
A moving average process is always stationary (See Madsen (2007) for
further details). Consequently, the MAXff
in VGAMextra verifies (by default) only for invertibility.
To enable this option set nowarning = FALSE in the
MAXff
call.
Similarly, ARXff verifies whether the TS data
fitted is stationary, whereas ARMAXff() verifies both properties.
Note
For TS family functions in the VGLM/VGAM framework,
checkTS.VGAMextra is called at the final iteration of Fisher scoring.
It means that the MLE estimates are actually evaluated to
verify whether the process is stationary or invertible.
If any root has module less than 1 + 1e-5, a warning is displayed
for informative purposes.
Argument thetaEst manages the parameters of the TS model
in turn.
Author(s)
Victor Miranda and T. W. Yee.
References
Box, G.E.P. and Jenkins, G.M. (1970) Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, USA.
Madsen, H (2007) Time Series Analysis. Chapman and Hall/CRC, Boca Raton, Florida, USA.
Examples
# A moving average process order-3 with coeffs --> c(2.4, -5.6, 0.83)
#-------------------------#
# This is NOT invertible !
#-------------------------#
MAcoeffs <- c(2.4, -5.6, 0.83)
checkTS.VGAMextra(thetaEst = MAcoeffs,
tsclass = "MA",
chOrder = 3,
retmod = FALSE)
# AR process order-3 with coeffs --> c( 0.45, 0.45^2, 0.45^3 )
#-------------------------#
# This is stationary !
#-------------------------#
ARcoeffs <- c( 0.45 , 0.45^2 , 0.45^3 )
checkTS.VGAMextra(thetaEst = ARcoeffs,
tsclass = "AR",
chOrder = 3,
retmod = TRUE,
pRoots = TRUE) # DEFAULT for 'pRoots'
Constraint matrices for vector generalized time series family functions.
Description
Constraint matrices for coefficients of vector genelized time series family functions in VGAMextra.
Usage
cm.ARMA(Model = ~ 1,
Resp = 1,
lags.cm = 2,
offset = -2,
whichCoeff = 1,
factorSeq = 2)
Arguments
Model |
A symbolic description of the model being fitted. Must match that
formula specified in the |
lags.cm |
Vector of POSITIVE integers greater than 1 indicating the order
for each response. It must match the orders entered in the
|
offset |
Vector of integers specifying the position of the ARMA coefficient
at which constraints initiate FOR each response.
If negative, it is recycled and the absolute value is used. The default
value is Particularly, if only one coefficient is being estimated, i.e, an AR
or MA process of order-1 is being fitted, then NO restrictions over the
(unique) coefficient are needed. Consequently, |
whichCoeff |
Vector of POSITIVE integers strictly less than ' If |
Resp |
The number of responses in the model fitted. Must match the number of
responses given in |
factorSeq |
Vector of POSITIVE integers. Thus far, restrictions handled are
geometric sequences and arithmetic progressions.
Hence, See below for further details. |
Details
NOTE: Except for the Model, all arguments of length 1 are
recycled when Resp \geq 2.
Time Series family functions in VGAMextra that are derived from AR(p) or MA(q) processes include the drift term (or mean) and the white noise standard deviation as the first two elements in the vector parameter. For an MA(4), for example, it is given by
(\mu, \sigma_\varepsilon, \phi_1, \phi_2, \phi_3, \phi_4).
Thus, constraint matrices on coefficients can be stated from the
second coefficient, i.e., from
\phi_2. This feature is specified with
offset = -2 by default.
In other words, offset indicates the exact position at
which parameter restrictions commence. For example, offset = -3
indicates that \phi_3 is the first coefficient over
which constraints are applied. Then, in order to successfully utilize
this argument, it must be greater than or equal to 2 in absolute value.
Otherwise, an error message will be displayed as no single restriction
are amenable with \phi_1 only.
Furthermore, if lags.cm = 1, i.e, a AR or MA process of order one
is being fitted, then NO constraints are required either, as only one
coefficient is directly considered.
Hence, the miminum absolute value for argument offset is
2
As for the factorSeq argument, its defaul value is 2.
Let factorSeq = 4, lags.cm = 5, offset = -3, and
whichCoeff = 1. The coefficient restrictions if a
geometric progression is assumed are
\theta_3 = \theta_1^4,
\theta_4 = \theta_1^5,
\theta_5 = \theta_1^6,
If coefficient restrictions are in arithmetic sequence, constraints are given by
\theta_3 = 4 * \theta_1,
\theta_4 = 5 * \theta_1,
\theta_5 = 6 * \theta_1,
The difference lies on thelink function used:
loglink for the first case, and
identitylink for the latter.
Note that conditions above are equivalent to test the following two Null Hypotheses:
Ho: \theta_k = \theta_1^k
or
Ho: \theta_k = j * \theta_1
for k = 3, 4, 5.
Simpler hypotheses can be tested by properly setting all arguments
in cm.ARMA().
For instance, the default list of constraint matrices returned by
cm.ARMA() allows to test
Ho: \theta_k = \theta_1^j
for k = 2, in a TS model of order-2 with one response.
Value
A list of constraint matrices with specific restrictions over
the AR(p), MA(q) or ARMA (p, q) coefficients.
Each matrix returned is conformable with the VGAM/VGLM framework.
Paragrpah above means that each constraint matrix returned by
cm.ARMA() is full-rank with M rows (number of parameters),
as required by VGAM. Note that constraint matrices within
the VGAM/VGLM framework are M by M identity matrices
by default.
Restrictions currently handled by cm.ARMA() are (increasing)
arithmetic and geometric progressions.
Warning
Hypotheses above can be tested by properly applying parameter link functions. If the test
Ho: \theta_k = \theta_1^k,
arises, then constraint matrices returned by cm.ARMA() are
conformable to the use of loglink.
On the other hand, the following hypothesis
Ho: \theta_k = k * \theta_1,
properly adapts to the link function
identitylink. k = 2, 3,\ ldots.
For further details on parameter link functions within VGAM, see
CommonVGAMffArguments.
Note
cm.ARMA() can be utilized to compute constraint matrices
for many VGLTSM fmaily functions, e.g.,
ARXff and
MAXff in VGAMextra.
More improvements such as restrictions on the drift parameter and white noise standard deviation will be set later.
Author(s)
Victor Miranda and T. W. Yee
References
Yee, T. W. and Hastie, T. J. (2003) Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15–41.
Yee, T. W. (2008)
The VGAM Package.
R News, 8, 28–39.
See Also
loglink,
rhobitlink,
CommonVGAMffArguments.
Examples
#############
# Example 1.
#############
# Constraint matrices for a TS family function (AR or MA)
# with 6 lagged terms.
# Restriction commences at the third position (theta[3]) powered to
# or multiplied by 4. Intercept-only model.
position <- -3
numberLags <- 6
myfactor <- 4
cm.ARMA(offset = position, lags.cm = numberLags, factorSeq = myfactor)
# With one covariate
cm.ARMA(Model = ~ x2, offset = position,
lags.cm = numberLags, factorSeq = myfactor)
# Or 2 responses...
cm.ARMA(offset = position, lags.cm = numberLags,
factorSeq = myfactor, Resp = 2)
# The following call causes an ERROR.
# cm.ARMA(offset = -1, lags.cm = 6, factorSeq = 2)
##############
# Example 2.
##############
# In this example, the use of constraints via 'cm.ARMA()' is
# included in the 'vglm' call. Here, two AR(2) models are fitted
# in the same call (i.e. two responses), where different constraints
# are set, as follows:
# a) list(ar = c(theta1, theta1^2)) and
# b) list(ar = c(theta2, theta2^2 )).
# 2.0 Generate the data.
set.seed(1001)
nn <- 100
# A single covariate.
covdata <- data.frame(x2 = runif(nn))
theta1 <- 0.40; theta2 <- 0.55
drift <- c(0.5, 0.75)
sdAR <- c(sqrt(2.5), sqrt(2.0))
# Generate AR sequences, TS1 and TS2, considering Gaussian white noise
# Save both in a data.frame object: the data.
tsdata <-
data.frame(covdata, # Not used
TS1 = arima.sim(nn,
model = list(ar = c(theta1, theta1^2)),
rand.gen = rnorm,
mean = drift[1], sd = sdAR[1]),
TS2 = arima.sim(nn,
model = list(ar = c(theta2, theta2^2)),
rand.gen = rnorm,
mean = drift[2], sd = sdAR[2]))
# 2.1 Fitting both time series with 'ARXff'... multiple responses case.
fit1 <- vglm(cbind(TS1, TS2) ~ 1,
ARXff(order = c(2, 2), type.EIM = "exact"),
data = tsdata,
trace = TRUE)
Coef(fit1)
coef(fit1, matrix = TRUE)
summary(fit1)
## Same length for both vectors, i.e. no constraints.
length(Coef(fit1))
length(coef(fit1, matrix = TRUE))
###2.2 Now, fit the same models with suitable constraints via 'cm.ARMA()'
# Most importantly, "loglink" is used as link function to adequately match
# the relationship between coefficients and constraints. That is:
# theta2 = theta1^2, then log(theta2) = 2 * log(theta1).
fit2 <- vglm(cbind(TS1, TS2) ~ 1,
ARXff(order = c(2, 2), type.EIM = "exact", lARcoeff = "loglink"),
constraints = cm.ARMA(Model = ~ 1,
Resp = 2,
lags.cm = c(2, 2),
offset = -2),
data = tsdata,
trace = TRUE)
Coef(fit2)
coef(fit2, matrix = TRUE)
summary(fit2)
# NOTE, for model 1, Coeff2 = Coeff1^2, then log(Coeff2) = 2 * log(Coeff1)
( mycoef <- coef(fit2, matrix = TRUE)[c(3, 4)] )
2 * mycoef[1] - mycoef[2] # SHOULD BE ZERO
# Ditto for model 2:
( mycoef <- coef(fit2, matrix = TRUE)[c(7, 8)] )
2 * mycoef[1] - mycoef[2] # SHOULD BE ZERO
## Different lengths, due to constraints
length(Coef(fit2))
length(coef(fit2, matrix = TRUE))
Density for the multivariate Normal distribution
Description
Density for the multivariate Normal distribution
Usage
dmultinorm(vec.x, vec.mean = c(0, 0),
mat.cov = c(1, 1, 0),
log = FALSE)
Arguments
vec.x |
For the |
vec.mean |
The vector of means. |
mat.cov |
The vector of variances and covariances, arranged in that order. See below for further details. |
log |
Logical. If |
Details
This implementation of the multivariate (say
R–dimensional) Normal density
handles the variances and covariances, instead of
correlation parameters.
For more than one observation, arrange all entries in matrices accordingly.
For each observation, mat.cov is a vector of
length R \times (R + 1) / 2,
where the first R entries are the variances
\sigma^2{i}, i = 1, \ldots, R, and
then the covariances arranged as per rows, that is,
cov_{ij} i = 1, \ldots, R,
j = i + 1, \ldots, R.
By default, it returns the density of two independent standard Normal distributions.
Value
The density of the multivariate Normal distribution.
Warning
For observations whose covariance matrix is not positive definite,
NaN will be returned.
Author(s)
Victor Miranda
See Also
Examples
###
### Two - dimensional Normal density.
###
set.seed(180228)
nn <- 25
mean1 <- 1; mean2 <- 1.5; mean3 = 2
var1 <- exp(1.5); var2 <- exp(-1.5); var3 <- exp(1); cov12 = 0.75
dmvndata <- rbinorm(nn, mean1 = 1, mean2 = 1.5, var1 = var1, var2 = var2,
cov12 = cov12)
## Using dbinorm() from VGAM.
d2norm.data <- dbinorm(x1 = dmvndata[, 1], x2 = dmvndata[, 2],
mean1 = mean1, mean2 = mean2, var1 = var1, var2 = var2,
cov12 = cov12)
## Using dmultinorm().
d2norm.data2 <- dmultinorm(vec.x = dmvndata, vec.mean = c(mean1, mean2),
mat.cov = c(var1, var2, cov12))
summary(d2norm.data)
summary(d2norm.data2)
##
## 3--dimensional Normal.
##
dmvndata <- cbind(dmvndata, rnorm(nn, mean3, sqrt(var3)))
d2norm.data3 <- dmultinorm(dmvndata, vec.mean = c(mean1, mean2, mean3),
mat.cov = c(var1, var2, var3, cov12, 0, 0))
hist(d2norm.data3)
summary(d2norm.data3)
Link functions for the mean of 1–parameter continuous distributions: The exponential distribution.
Description
Computes the expMlink transformation, its inverse and
the first two derivatives.
Usage
expMlink(theta, location = 0, bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. This is |
location |
This is a known location parameter.
Same as |
bvalue, inverse, deriv, short, tag |
See |
Details
This is a link function to model the mean of the exponential
distribution, exponential.
It is defined as
\eta = \log (A + \lambda^{-1}),
where \lambda > 0 is a rate parameter and A is a
known location parameter,
same as exponential.
Numerical values of \lambda out of range may
result in Inf, -Inf, NA or NaN.
Value
For deriv = 0, the expMlink transformation of
theta when inverse = FALSE.
If inverse = TRUE, then the inverse exp(theta - A)^(-1).
For deriv = 1,
d eta / d theta when inverse = FALSE.
If inverse = TRUE, then
d theta / d eta as a function of
theta.
Similarly, when deriv = 2, the second derivatives in
terms of theta are returned.
Note
Numerical instability may occur for values theta too close
to zero. Use argument bvalue to replace them before
computing the link.
If theta is character, then arguments inverse and
deriv are ignored. See Links
for further details.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
Examples
## E1. Modelling the mean of the exponential distribution ##
set.seed(17010402)
nn <- 100
edata <- data.frame(x2 = runif(nn) - 0.5, x3 = runif(nn) - 0.5)
edata <- transform(edata, eta = 0.2 - 0.7 * x2 + 1.9 * x3)
#----- The mean is a function of 'x2' and 'x3' ------#
edata <- transform(edata, rate = expMlink(eta, inverse = TRUE))
edata <- transform(edata, y = rexp(nn, rate = rate))
with(edata, stem(y))
with(edata, hist(y))
exp.fit <- vglm(y ~ x2 + x3, exponential(link = "expMlink"),
data = edata, zero = NULL, trace = TRUE, crit = "log")
coef(exp.fit, matrix = TRUE)
summary(exp.fit)
## E2. expMlink() and its inverse ##
theta <- 0.1 + 1:5
location <- 1.5
my.diff <- theta - expMlink(expMlink(theta = theta,
location = location), location = location, inverse =TRUE)
summary(my.diff) # Zero
## E3. Special values in a matrix ##
theta <- matrix(c(Inf, -Inf, NA, NaN, 1 , 2), ncol = 3, nrow = 2)
expMlink(theta = theta, location = location)
Link functions for the quantiles of several 1–parameter continuous distributions.
Description
Computes the expQlink transformation, its inverse and
the first two derivatives.
Usage
expQlink(theta, p = stop("Argument 'p' must be entered."),
bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. This is |
p |
Numeric. A prespecified number between 0 and 1.
The particular |
bvalue, inverse, deriv, short, tag |
See |
Details
This is a link function to model any fixed quantile, say
\xi_p, of the exponential distribution.
It is called the expQlink transformation and is defined as
\log (1 - p)^{-1/\lambda},
where \lambda is positive as in
exponential.
Numerical values of \lambda or p out of range may
result in Inf, -Inf, NA or NaN.
Value
With deriv = 0, the expQlink transformation of
theta for prespecified p when inverse = FALSE.
If inverse = TRUE, then the inverse -log(1 - p)/theta.
For deriv = 1, this link function returns
d eta / d theta when inverse = FALSE.
If inverse = TRUE, then
d theta / d eta as a function of
theta.
Similarly, when deriv = 2, the second derivatives in
terms of theta are returned.
Note
Numerical instability may occur for values theta too close
to zero. Use argument bvalue to replace them before
computing the link.
If theta is character, then arguments inverse and
deriv are ignored. See Links
for further details.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
Examples
## E1. expQlink() and its inverse ##
p <- 0.25 # Modelling the first quartile.
my.theta <- seq(0, 5, by = 0.1)[-1]
my.diff <- my.theta - expQlink(expQlink(my.theta, p = p), p = p, inverse =TRUE)
summary(my.diff) # Zero
## E2. Special values ##
expQlink(theta = c(Inf, -Inf, NA, NaN), p = p)
## E3. Plot of expQlink() for different quantiles ##
plot(expQlink(my.theta, p = p) ~ my.theta,
type = "l", lty = "dotted", col = "blue", lwd = 2,
main = "expQink(p) transformation", xlab = "theta", ylab = "expQLink",
xlim = c(-0.5, 5), ylim = c(-0.5, 5))
abline(h = 0, v = 0, lwd = 2)
lines(my.theta, expQlink(my.theta, p = 0.50), col = "green", lty = "dotted", lwd = 2)
lines(my.theta, expQlink(my.theta, p = 0.75), col = "red", lty = "dotted", lwd = 2)
legend(2, 4, c("p = 0.25", "p = 0.50", "p = 0.75"), col = c("blue", "green", "red"),
lwd = c(2, 2, 2), lty = c("dotted", "dotted", "dotted"))
Link functions for the quantiles of several 1–parameter continuous distributions
Description
Computes the gamma1Qlink transformation, its inverse and the
first two derivatives.
Usage
gamma1Qlink(theta, p = stop("Argument 'p' must be specified."),
bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. It is |
p |
A numeric vector of |
bvalue, inverse, deriv, short, tag |
See |
Details
This link function has been specifically designed to model any
p–quantile of the 1–parameter gamma distribution,
gamma1, in the VGLM/VGAM context.
It is defined as
\eta = \log {\tt{qgamma}}({\tt{p}}, {\tt{shape =}} s),
where s is a positive shape parameter as in
gamma1, whilst
{\tt{qgamma()}} is the quantile function
qgamma.
The inverse of the gamma1Qlink cannot be expressed in closed
form. Instead, the inverse image, s_{\eta},
of \eta is numerically approximated by
newtonRaphson.basic.
Numerical values of s or p out of range will
result in Inf, -Inf, NA or NaN
correspondingly.
Arguments inverse and deriv are dismissed
if theta is character.
Value
For deriv = 0, the gamma1Qlink transformation of
theta, when inverse = FALSE.
If inverse = TRUE, then
theta becomes \eta, and therefore,
the approximate inverse image of \eta is returned.
For deriv = 1, the partial derivative
d eta / d theta is returned,
if inverse = FALSE. If inverse = TRUE,
then the reciprocal
d theta / d eta as a function of theta.
If deriv = 2, then the second order derivatives as a function
of theta.
Note
Numerical instability may occur for values theta too large, or
too close to 0.0. Use argument bvalue to replace them before
computing the link.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
Examples
## E1. gamma1QLink() and values causing NaNs or out of range ##
p <- 0.75 # The third quartile is of interest.
my.s <- seq(0, 5, by = 0.1)[-1]
max(my.s - gamma1Qlink(gamma1Qlink(my.s, p = p), p = p, inverse =TRUE)) ## Zero
## E2. Special values of theta ##
gamma1Qlink(theta = c(-0.15, -0.10, 0, 1:10) , p = p, inverse = FALSE) ## NaNs
gamma1Qlink(theta = c(-5, -3, 0, 1:10) , p = p, inverse = TRUE) ## Out of range
## E3. Plot of gamma1QLink() and its inverse. ##
# gamma1Qlink()
plot(gamma1Qlink(theta = my.s, p = p) ~ my.s,
type = "l", col = "blue", lty = "dotted", lwd = 3,
xlim = c(-0.1, 5), ylim = c(-5, 15), las = 1,
main = c("Blue is gamma1Qlink(), green is the inverse"),
ylab = "gamma1Qlink transformation", xlab = "theta")
abline(h = 0, v = 0, lwd = 2)
# The inverse
lines(my.s, gamma1Qlink(theta = my.s, p = p, inverse = TRUE),
col = "green", lwd = 2, lty = "dashed")
# The identity function, for double-checking.
lines(my.s, my.s, lty = "dotted")
Link functions for the mean of 2–parameter continuous distributions: The gamma distribution.
Description
Computes the gammaRMlink transformation, its inverse and
the first two derivatives.
Usage
gammaRMlink(theta, shape = NULL, wrt.param = NULL,
bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. This is |
shape |
The shape parameter. Same as
|
wrt.param |
Positive integer, either |
bvalue, inverse, deriv, short, tag |
See |
Details
The link to model the mean of the 2–parameter gamma distribution.
The gammaRMlink transformation, for given
\alpha ('shape' parameter),
is defined as
\eta = \eta(\alpha; \beta) = \log \frac{\alpha}{\beta},
where \beta > 0 is a rate parameter.
This link is expressly a function of \beta, i.e.
\theta, therefore \alpha (shape)
must be entered at every call.
Numerical values of \alpha or \beta out of range may
result in Inf, -Inf, NA or NaN.
Value
For deriv = 0, the gammaRMlink transformation of
theta, i.e. \beta, when inverse = FALSE.
If inverse = TRUE, then \theta becomes \eta,
and the inverse,
\alpha * exp(-theta), for given \alpha, is
returned.
For deriv = 1, theta becomes
\theta = (\beta, \alpha)=(\theta1, \theta2), and
\eta = (\eta1, \eta2),
and then, the argument wrt.param must be
considered:
A) If inverse = FALSE, then
d eta1 / d theta1 when
wrt.param = 1, and
d eta1 / d theta2 if
wrt.param = 2.
B) For inverse = TRUE, this function returns
d theta1 / d eta1 and
d theta2 / d eta1 conformably arranged
in a matrix, if wrt.param = 1,
as a function of \theta_i, i = 1, 2.
Also, when wrt.param = 2, a matrix with columns
dtheta1 / d eta2 and
dtheta2 / d eta2
is returned.
Similarly, when deriv = 2, the second derivatives in
terms of theta are returned.
Note
Numerical instability may occur for values theta too close
to zero. Use argument bvalue to replace them before
computing the link.
If theta is character, then arguments inverse and
deriv are ignored. See Links
for further details.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
Examples
eta <- seq(-3, 3, by = 0.1) # this is eta = log(mu(b, a)).
shape <- rep(exp(0.8), length(eta)) # 'shape' argument.
## E1. Get 'rate' values.
theta <- gammaRMlink(theta = eta, shape = shape, inverse = TRUE) # rate
## Not run:
## E2. Plot theta vs. eta, 'shape' fixed.
plot(theta, eta, type = "l", las = 1, ylab = "",
main = "gammaRMlink(theta; shape)")
## End(Not run)
## E3. gammaRMlink() and its inverse ##
etabis <- gammaRMlink(theta = theta, shape = shape, inverse = FALSE)
my.diff <- eta - etabis
summary(my.diff) # Zero
## E4. Special values arranged in a matrix ##
bbeta <- matrix(eta[1:9], ncol = 3, nrow = 3) #Ensure equal dimensions.
alpha <- matrix(c(Inf, -Inf, NA, NaN, -1 , 1, 0, -2, 2), ncol = 3, nrow = 3)
# The gammaRMlink transformation (log(a/b))
gammaRMlink(theta = bbeta, shape = alpha, inv = FALSE) # NaNs produced.
# Same as
log(alpha/bbeta)
2–parameter Gamma Distribution
Description
Estimates the 2–parameter gamma distribution by maximum likelihood. One linear predictor models the mean.
Usage
gammaRff(zero = "shape", lmu = "gammaRMlink",
lrate = NULL, lshape = "loglink",
irate = NULL, ishape = NULL, lss = TRUE)
Arguments
zero |
Specifies the parameters to be modelled as intercept–only. |
lmu |
The link function applied to the gamma distribution mean, i.e.,
|
lrate, lshape, irate, ishape, lss |
Same as |
Details
This family function slightly enlarges the functionalities of
gammaR by directly modelling the mean
of the gamma distribution. It performs very much like
gamma2, but involves the ordinary
(not reparametrized) density, given by
f(y; \alpha, \beta) = \frac{ \beta^\alpha }{ \Gamma(\alpha) }
e^{-\beta y} y^{\alpha - 1},
Here, \alpha and \beta are positive shape
and rate parameters as in gammaR.
The default linear predictors are
\eta1 = {\tt{gammaRMlink}}(\alpha; \beta) = \log \mu =
\log (\alpha / \beta), and
\eta2 = \log \alpha,
unlike \eta1 = \log \beta and
\eta2 = \log \alpha
from gammaR.
lmu overrides lrate and no link other than
gammaRMlink is a valid entry
(lmu). To
mimic gammaR simply
set lmu = NULL and lrate = "loglink".
The mean (\mu) is returned as the fitted values.
gammaRff differs from
gamma2. The latter estimates a
re-parametrization
of the gamma distribution in terms \mu and
\alpha.
This VGAM family function does not handle censored data.
Value
An object of class "vglm".
See vglm-class for full details.
Note
The parameters \alpha and \beta
match the arguments shape and rate of
rgamma.
Multiple responses are handled.
Author(s)
V. Miranda and Thomas W. Yee.
References
Yee, T. W. (2015) Vector Generalized Linear and Additive Models: With an Implementation in R. Springer, New York, USA.
See Also
gammaRMlink,
CommonVGAMffArguments,
gammaR,
gamma2,
Links.
Examples
### Modelling the mean in terms of x2, two responses.
set.seed(2017022101)
nn <- 80
x2 <- runif(nn)
mu <- exp(2 + 0.5 * x2)
# Shape and rate parameters in terms of 'mu'
shape <- rep(exp(1), nn)
rate <- gammaRMlink(theta = log(mu), shape = shape,
inverse = TRUE, deriv = 0)
# Generating some random data
y1 <- rgamma(n = nn, shape = shape, rate = rate)
gdata <- data.frame(x2 = x2, y1 = y1)
rm(y1)
# lmu = "gammaRMlink" replaces lshape, whilst lrate = "loglink"
fit1 <- vglm(cbind(y1, y1) ~ x2,
gammaRff(lmu = "gammaRMlink", lss = TRUE, zero = "shape"),
data = gdata, trace = TRUE, crit = "log")
coef(fit1, matrix = TRUE)
summary(fit1)
# Comparing fitted values with true values.
compare1 <- cbind(fitted.values(fit1)[, 1, drop = FALSE], mu)
colnames(compare1) <- c("Fitted.vM1", "mu")
head(compare1)
### Mimicking gammaR. Note that lmu = NULL.
fit2 <- vglm(y1 ~ x2, gammaRff(lmu = NULL, lrate = "loglink",
lshape = "loglink", lss = FALSE, zero = "shape"),
data = gdata, trace = TRUE, crit = "log")
# Compare fitted values with true values.
compare2 <- with(gdata, cbind(fitted.values(fit2), y1, mu))
colnames(compare2) <- c("Fitted.vM2", "y", "mu")
head(compare2)
### Fitted values -- Model1 vs Fitted values -- Model2
fit1vsfit2 <- cbind(fitted.values(fit1)[, 1, drop = FALSE],
fitted.values(fit2))
colnames(fit1vsfit2) <- c("Fitted.vM1", "Fitted.vM2")
head(fit1vsfit2)
### Use gamma2()
fit3 <- vglm(y1 ~ x2, gamma2,
data = gdata, trace = TRUE, crit = "log")
fit1.fit3 <- cbind(fitted.values(fit1)[, 1, drop = FALSE],
fitted.values(fit2), fitted.values(fit3))
colnames(fit1.fit3) <- c("Fitted.vM1", "Fitted.vM2", "Fitted.vM3")
head(fit1.fit3)
Generalized Beta Distribution of the Second Kind family function
Description
Maximum likelihood estimation of the 4-parameter generalized beta II distribution using Fisher scoring.
Usage
gen.betaIImr(lscale = "loglink",
lshape1.a = "loglink",
lshape2.p = "loglink",
lshape3.q = "loglink",
iscale = NULL,
ishape1.a = NULL,
ishape2.p = NULL,
ishape3.q = NULL,
imethod = 1,
lss = TRUE,
gscale = exp(-5:5),
gshape1.a = exp(-5:5),
gshape2.p = exp(-5:5),
gshape3.q = exp(-5:5),
probs.y = c(0.25, 0.50, 0.75),
zero = "shape" )
Arguments
lscale, lshape1.a, lshape2.p, lshape3.q |
Parameter link functions applied to the
shape parameter |
iscale, ishape1.a, ishape2.p, ishape3.q |
Optional initial values for |
imethod |
Initializing method to internally compute the initial values.
Currently, only |
gscale, gshape1.a, gshape2.p, gshape3.q |
Grid search initial values.
See |
zero |
Numeric or Character vector. Position(s) or name(s) of the
parameters/linear predictors to be modeled as intercept-only.
Details at
|
lss, probs.y |
See |
Details
This distribution is most useful for unifying a substantial number of size distributions. For example, the Singh-Maddala, Dagum, Fisk (log-logistic), Lomax (Pareto type II), inverse Lomax, beta distribution of the second kind distributions are all special cases. Full details can be found in Kleiber and Kotz (2003), and Brazauskas (2002). The argument names given here are used by other families that are special cases of this family. Fisher scoring is used here and for the special cases too.
The 4-parameter generalized beta II distribution has density
f(y) = a y^{ap-1} / [b^{ap} B(p,q) \{1 + (y/b)^a\}^{p+q}]
for a > 0, b > 0, p > 0, q > 0,
y \geq 0.
Here B is the beta function, and
b is the scale parameter scale,
while the others are shape parameters.
The mean is
E(Y) = b \, \Gamma(p + 1/a) \, \Gamma(q - 1/a) /
(\Gamma(p) \, \Gamma(q))
provided -ap < 1 < aq; these are returned as the fitted values.
Value
An object of class "vglmff"
(see vglmff-class).
The object is used by modelling functions such as
vglm,
and vgam.
Warning
zero can be a numeric or a character vector
specifying the position(s) or the name(s) (partially or not) of the linear
predictors modeled as intercept–only. Numeric values can be
entered as usual. If names are used, note that the linear predictors in
this family function are
c("scale", "shape1.a", "shape2.p", "shape3.q").
For simplicity, using names rather than numeric vectors is recommended.
Note
Paramaters "shape1.a", "shape2.p", "shape3.q" are modeled
as intercept only, by default.
If the self-starting initial values fail, try experimenting
with the initial value arguments, iscale, ishape1.a,
ishape2.p and ishape3.q whose default is NULL.
Also, the constraint -ap < 1 < aq may be violated as the
iterations progress so it is worth monitoring convergence, e.g.,
set trace = TRUE.
Successful convergence depends on choosing good initial values. This
process might be difficult for this distribution, since 4 parameters are
involved. Presently, only method = 1 is internally handled to set
initial values. It involves grid search, an internal implementation
of the well-known grid search algorithm for exhaustive searching through
a manually specified subset of the hyperparameter space.
Default value of lss is TRUE standing for the
following order: location (b), shape1.a (a), shape2.p (p),
shape3.q (q). In order to match the arguments of existing R functions,
the option lss = FALSE might be set leading to switch the
position of location (b) and shape1.a (a), only.
Author(s)
T. W. Yee and V. Miranda.
References
Brazauskas, V. (2002) Fisher information matrix for the Feller-Pareto distribution. Statistics & Probability Letters, 59, 159–167.
Kleiber, C. and Kotz, S. (2003) Statistical Size Distributions in Economics and Actuarial Sciences. Wiley Series in Probability and Statistics. Hoboken, New Jersey, USA.
McDonald, J. B. and Xu, Y. J. (1995) A generalization of the beta distribution with applications. Journal of Econometrics. 66, p.133–152.
McDonald, J. B. (1984) Some generalized functions for the size distribution of income. Econometrica, 52, 647–663.
See Also
betaff,
betaII,
dagum,
sinmad,
fisk,
lomax,
inv.lomax,
paralogistic,
inv.paralogistic,
genbetaII.
Examples
#----------------------------------------------------------------------- #
# An example.- In this data set parameters 'shape1.a' and 'shape3.q' are
# generated in terms of x2.
set.seed(1003)
nn <- 200
gdata1 <- data.frame(x2 = runif(nn))
gdata <- transform(gdata1,
y1 = rgen.betaII(nn, scale = exp(1.1), shape1.a = exp(1.2 + x2),
shape2.p = exp(0.7) , shape3.q = exp(2.1 - x2)),
y2 = rgen.betaII(nn, scale = exp(2.0), shape1.a = exp(1.8 + x2),
shape2.p = exp(2.3) , shape3.q = exp(1.9 - x2)),
y3 = rgen.betaII(nn, scale = exp(1.5), shape1.a = exp(1.8),
shape2.p = exp(2.3) , shape3.q = exp(1.3)))
#------------------------------------------------------------------------#
# A single intercept-only model. No covariates.
# Note the use of (optional) initial values.
fit <- vglm(y2 ~ 1, #y3 ~ 1
gen.betaIImr(lss = TRUE,
# OPTIONAL INITIAL VALUES
#iscale = exp(1.5),
#ishape1.a = exp(1.8),
#ishape2.p = exp(2.3),
#ishape3.q = exp(1.3),
imethod = 1),
data = gdata, trace = TRUE, crit = "loglik")
Coef(fit)
coef(fit, matrix = TRUE)
summary(fit)
#------------------------------------------------------------------------#
# An intercept-only model. Two responses.
fit1 <- vglm(cbind(y2, y2) ~ 1, # cbind(y1, y2)
gen.betaIImr(lss = TRUE),
data = gdata, trace = TRUE, crit = "loglik")
Coef(fit1)
coef(fit1, matrix = TRUE)
summary(fit1)
vcov(fit1, untransform = TRUE)
#------------------------------------------------------------------------#
# An example incorporating one covariate. Constraints are set accordingly.
# x2 affects shape1.a and shape3.q.
# Note that the first option uses 'constraints', whilst in the second
# choice we use the argument 'zero' to 'set' the same constraints.
### Option 1.
c1 <- rbind(0, 1, 0, 0)
c2 <- rbind(0, 0, 0, 1)
mycons <- matrix( c(c1, c2), nc = 2, byrow = FALSE)
fit2 <- vglm(y1 ~ x2, gen.betaIImr(lss = TRUE, zero = NULL),
data = gdata, trace = TRUE, crit = "loglik",
constraints = list(x2 = mycons ))
coef(fit2, matrix = TRUE)
summary(fit2)
vcov(fit2)
constraints(fit2)
### Option 2.
fit3 <- vglm(y1 ~ x2,
gen.betaIImr(lss = TRUE,
zero = c("scale", "shape2.p")),
data = gdata, trace = TRUE, crit = "loglik")
coef(fit3, matrix = TRUE)
summary(fit3)
vcov(fit3)
constraints(fit3)
The Generalized Beta Distribution of the Second King
Description
Density, distribution function, inverse distribution (quantile function) and random generation for the Generalized Beta of the Second Kind (GB2).
Usage
dgen.betaII(x, scale = 1.0, shape1.a = 1.0, shape2.p = 1.0, shape3.q = 1.0,
log = FALSE)
pgen.betaII(q, scale = 1.0, shape1.a = 1.0, shape2.p = 1.0, shape3.q = 1.0,
lower.tail = TRUE, log.p = FALSE)
qgen.betaII(p, scale = 1.0, shape1.a = 1.0, shape2.p = 1.0, shape3.q = 1.0,
lower.tail = TRUE, log.p = FALSE)
rgen.betaII(n, scale = 1.0, shape1.a = 1.0, shape2.p = 1.0, shape3.q = 1.0)
Arguments
x, q |
Vector of quantiles. |
p |
Vector of probabilities. |
n |
Number of observations. If |
scale, shape1.a, shape2.p, shape3.q |
Strictly positive scale and shape parameters. |
log, log.p, lower.tail |
Same meaning as in |
Details
The GB2 Distribution is defined by the probability density (pdf)
f(y) = \frac{a x^{ap - 1}}{b^{ap} B(p, q) [1 + (y/b)^{a}]^{p + q},}
for y > 0, and b, a, p, q > 0.
Here, B(p, q) is the beta function as in
beta.
The GB2 Distribution and the Beta Distribution
(see Beta) are linked, as follows:
Let X be a random variable with the Beta density and parameters
p = shape_{1} and q = shape_{2}.
Then, introducing additional b = scale and
a = shape parameters, the variable
Y = \frac{(x/b)^{a}}{1 + (x/b)^{a}}
has the GB2 Distribution, with parameters b, a, p, q.
The GB2 k^{th} moment exists for -ap < k < aq
and is given by
E(Y^{k}) = \frac{b^{k} B(p + k/a, q - k/a)}{B(p, q)}
or, equivalently,
E(Y^{k}) = \frac{b^{k} \Gamma(p + k/a) \Gamma(q - k/a)}
{\Gamma(p) \Gamma(q)}).
Here, \Gamma(\cdot) is the gamma function as in
gamma.
Value
dgen.betaII() returns the density (p.d.f), pgen.betaII() gives
the distribution function (p.d.f), qgen.betaII() gives the quantile
function (Inverse Distribution function), and rgen.betaII() generates
random numbers from the GB2 distribution.
Note
Values of the shape2.p parameter moderately close to zero may imply
obtaning numerical values too close to zero or values represented as zero
in computer arithmetic from the function rgen.betaII().
Additionally, for specific values of the arguments x, q, p such as
Inf, -Inf, NaN and NA, the functions qgen.betaII(),
pgen.betaII() and qgen.betaII() will return the limit when
the argument tends to such value.
In particular, the quantile qgen.betaII() retunrs zero for negative
values and Inf for missed probabilities greater than 1.0.
Author(s)
V. Miranda and T. W. Yee
References
Abramowitz, M. and Stegun, I. A. (1972) Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, ch.6, p.255. Dover, New York, USA.
Kleiber, C. and Kotz, S. (2003) Statistical Size Distributions in Economics and Actuarial Sciences. Wiley Series in Probability and Statistics. Hoboken, New Jersey, USA.
McDonald, J. B. and Xu, Y. J. (1995) A generalization of the beta distribution with applications. Journal of Econometrics, 66, p.133–152.
McDonald, J. B. (1984) Some generalized functions for the size distribution of income. Econometrica, 52, p.647–663.
See Also
Examples
# Setting parameters to both examples below.
b <- exp(0.4) # Scale parameter.
a <- exp(0.5) # Shape1.a
p <- exp(0.3) # Shape2.p
q <- exp(1.4) # Shape3.q
# (1) ______________
probs.y <- seq(0.0, 1.0, by = 0.01)
data.1 <- qgen.betaII(p = probs.y, scale = b, shape1.a = a,
shape2.p = p, shape3.q = q)
max(abs(pgen.betaII(q = data.1, scale = b, shape1.a = a,
shape2.p = p, shape3.q = q)) - probs.y) # Should be 0.
# (2)_________________
xx <- seq(0, 10.0, length = 200)
yy <- dgen.betaII(xx, scale = b, shape1.a = a, shape2.p = p, shape3.q = q)
qtl <- seq(0.1, 0.9, by = 0.1)
d.qtl <- qgen.betaII(qtl, scale = b, shape1.a = a, shape2.p = p, shape3.q = q)
plot(xx, yy, type = "l", col = "red",
main = "Red is the GB2 density, blue is the GB2 Distribution Function",
sub = "Brown dashed lines represent the 10th, ..., 90th percentiles",
las = 1, xlab = "x", ylab = "", xlim = c(0, 3), ylim = c(0,1))
abline(h = 0, col = "navy", lty = 2)
abline(h = 1, col = "navy", lty = 2)
lines(xx, pgen.betaII(xx, scale = b, shape1.a = a,
shape2.p = b, shape3.q = q), col= "blue")
lines(d.qtl, dgen.betaII(d.qtl, scale = b, shape1.a = a,
shape2.p = p, shape3.q = q),
type ="h", col = "brown", lty = 3)
Link functions for the mean of 1–parameter discrete distributions: The Geometric Distribution.
Description
Computes the geometricffMlink transformation, including its inverse
and the first two derivatives.
Usage
geometricffMlink(theta, bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. See below for further details. |
bvalue, inverse, deriv, short, tag |
Details at |
Details
This is a natural link function to model the mean of the (discret)
geometric distribution, geometric,
defined as the logarithmm of its mean, i.e.,
\eta = -\log \frac{p}{1 - p} = -{\tt{logit}}(p).
Here, p is the probability of succes, as in
geometric.
While this link function can be used to model any parameter lying in
(0, 1), it is particularly useful for event-rate geometric data
where the mean can be written in terms of some rate of events,
say \lambda = \lambda(\mathbf{x}), as
\mu = \lambda(\mathbf{x}) t,
and the time t (as \log t) can be easily
incorporated in the analysis as an offset.
Under this link function the domain set for p
is (0, 1). Hence, values of \rho too
close to the extremes, or out of range will result
in Inf, -Inf, NA or NaN.
Use argument bvalue to adequately replace them before
computing the link function.
If theta is a character, arguments inverse and
deriv are disregarded.
Value
For deriv = 0, the geometricffMlink transformation of
theta when inverse = FALSE. When
inverse = TRUE then theta becomes \eta, and
exp(-theta) / (exp(-theta) - 1) is returned.
For deriv = 1, d eta / d theta,
if inverse = FALSE, else
the reciprocal d theta / d eta
as a function of theta.
For deriv = 2 the second order derivatives
are correspondingly returned.
Warning
Numerical instability may occur if covariates are used leading to
values of p out of range. Try to overcome this by using
argument bvalue.
Note
This function may return Inf of -Inf for values of
p too close to 0 and 1 respectively.
Author(s)
V. Miranda and T. W. Yee
See Also
geometric,
Links,
logitlink,
logffMlink.
Examples
### Example 1 ###
my.probs <- ppoints(100)
geol.inv <-
geometricffMlink(theta = geometricffMlink(theta = my.probs), # the inverse
inverse = TRUE) - my.probs
summary(geol.inv) ## zero
### Example 2. Special values of 'prob' ###
my.probs <- c(-Inf, -2, -1, 0, 0.25, 0.75, 1.0, 5, Inf, NaN, NA)
rbind(probs = my.probs,
geoffMlink = geometricffMlink(theta = my.probs),
inv.geoffl = geometricffMlink(theta = my.probs, inverse = TRUE))
### Example 3 Some probability link functions ###
my.probs <- ppoints(100)
par(lwd = 2)
plot(my.probs, logitlink(my.probs), xlim = c(-0.1, 1.1), ylim = c(-5, 8),
type = "l", col = "limegreen",
ylab = "transformation", las = 1, main = "Some probability link functions")
lines(my.probs, geometricffMlink(my.probs), col = "gray50")
lines(my.probs, logffMlink(my.probs), col = "blue")
lines(my.probs, probitlink(my.probs), col = "purple")
lines(my.probs, clogloglink(my.probs), col = "chocolate")
lines(my.probs, cauchitlink(my.probs), col = "tan")
abline(v = c(0.5, 1), lty = "dashed")
abline(v = 0, h = 0, lty = "dashed")
legend(0.1, 8,
c("geometricffMlink", "logffMlink","logitlink", "probitlink",
"clogloglink", "cauchitlink"),
col = c("gray50", "blue", "limegreen", "purple", "chocolate", "tan"),
lwd = 1, cex = 0.5)
par(lwd = 1)
The Inverse Chi–squared Distribution
Description
Density, CDF, quantile function and random number generator for the Inverse Chi–squared distribution.
Usage
dinv.chisq(x, df, log = FALSE)
pinv.chisq(q, df, lower.tail = TRUE, log.p = FALSE)
qinv.chisq(p, df, lower.tail = TRUE, log.p = FALSE)
rinv.chisq(n, df)
Arguments
x, q, p, n |
Same as |
df, lower.tail, log, log.p |
Same as |
Details
The inverse chi–squared distribution with non–negative
df = \nu degrees
of freedom implemented here has density
f(x; \nu) = \frac{ 2^{-\nu / 2} x^{-\nu/2 - 1}
e^{-1 / (2x)} }{ \Gamma(\nu / 2) },
where x > 0, and
\Gamma is the gamma function.
The mean is 1 / (\nu - 2), for \nu > 2, and the variance is
given by 2 / [(\nu - 2)^2 (\nu - 4)], for \nu > 4.
Also, as with Chisquare, the degrees
of freedom can be non–integer.
Value
dinv.chisq returns the density, pinv.chisq returns the
distribution function, qinv.chisq gives the quantiles, and
rinv.chisq generates random numbers from this distribution.
Source
Specifically, it is the probability distribution of a random variable whose reciprocal follows a chi–squared distribution, i.e.,
If Y \sim chi–squared(\nu), then 1/Y \sim
Inverse chi–squared(\nu).
As a result, dinv.chisq, pinv.chisq, qinv.chisq and
rinv.chisq use the functions
Chisquare as a basis. Hence,
invalid arguments will lead these functions to return
NA or NaN accordingly.
Note
Yet to do: A non–central parameter as an argument, if amenable.
Two similar versions of the Inverse chi–squared distribution with
\nu degrees of freedom may be found in the literature, as
follows:
Let Y \sim chi–squared(\nu), then
I. 1 / Y \sim Inverse chi–squared(\nu), and
II. \nu / Y \sim Inverse chi–squared(\nu).
Here, the former, which is the popular version, has been implemented.
Author(s)
V. Miranda
References
Johnson, N.L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions. Chapters 18 (volume 1) and 29 (volume 2). Wiley, New York.
See Also
Examples
## Example 1 ##
nn <- 50; df <- 1.4
data.1 <- ppoints(nn)
data.q <- qinv.chisq(-data.1, df = df, log.p = TRUE)
data.p <- -log(pinv.chisq(data.q, df = df))
max(abs(data.p - data.1)) # Should be zero
## Example 2 ##
xx <- seq(0, 3.0, len = 301)
yy <- dinv.chisq(xx, df = df)
qtl <- seq(0.1, 0.9, by = 0.1)
d.qtl <- qinv.chisq(qtl, df = df)
plot(xx, yy, type = "l", col = "orange",
main = "Orange is density, blue is cumulative distribution function",
sub = "Brown dashed lines represent the 10th, ... 90th percentiles",
las = 1, xlab = "x", ylab = "", ylim = c(0, 1))
abline(h = 0, col= "navy", lty = 2)
lines(xx, pinv.chisq(xx, df = df), col = "blue")
lines(d.qtl, dinv.chisq(d.qtl, df = df), type ="h", col = "brown", lty = 3)
Link functions for the mean of 1–parameter continuous distributions: The inverse chi–squared distribution.
Description
Computes the inv.chisqMlink transformation, its inverse and
the first two derivatives.
Usage
inv.chisqMlink(theta, bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. This is |
bvalue, inverse, deriv, short, tag |
See |
Details
This link functions models the mean of the
inverse chi–squared distribution,
inv.chisqff.
It is defined as
\eta = -\log ( df - 2),
where df denotes the (non–negative) degrees of freedom, as in
inv.chisqff.
Notice, however, that df > 2 is required for the mean of
this distribution to be real. Consequently, the domain set for
df for this link function is (2, \infty).
Numerical values of df out of range will
result in NA or NaN.
Value
For deriv = 0, the inv.chisqMlink transformation of
theta when inverse = FALSE.
If inverse = TRUE, then the inverse exp(-theta) + 2.
For deriv = 1,
d eta / d theta when inverse = FALSE.
If inverse = TRUE, then
d theta / d eta as a function of
theta.
When deriv = 2, the second derivatives in
terms of theta are returned.
Note
Numerical instability may occur for values theta too large or
possibly, too close to 2. Use argument bvalue to replace them
before computing the link.
If theta is character, then arguments inverse and
deriv are ignored. See Links
for further details.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
Examples
## E1. Modelling the mean of the exponential distribution ##
set.seed(17010502)
dof <- 2.5
isq.data <- data.frame(x2 = runif(100, 0, 1))
isq.data <- transform(isq.data, y = rinv.chisq(n = 100, df = dof + x2))
hist(isq.data$y)
fit.inv <- vglm(y ~ x2, family = inv.chisqff(link = "inv.chisqMlink"),
data = isq.data, trace = TRUE )
coef(fit.inv, matrix = TRUE)
summary(fit.inv)
## E3. Special values in a matrix ##
(theta <- matrix(c(Inf, -Inf, NA, NaN, 1 , 2, 3, 4), ncol = 4, nrow = 2))
inv.chisqMlink(theta = theta) ## NaNs for df = theta <= 2
## E2. inv.chisqMlink() and its inverse ##
theta <- 0.1 + 1:5 # dof = df
my.diff <- theta - inv.chisqMlink(inv.chisqMlink(theta = theta), inverse =TRUE)
summary(my.diff) # Zero
Inverse Chi–squared Distribution.
Description
Maximum likelihood estimation of the degrees of freedom for an inverse chi–squared distribution using Fisher scoring.
Usage
inv.chisqff(link = "loglink", zero = NULL)
Arguments
link, zero |
For further details, see
|
Details
The inverse chi–squared distribution
with df = \nu \geq 0 degrees of
freedom implemented here has density
f(x; \nu) = \frac{ 2^{-\nu / 2} x^{-\nu/2 - 1}
e^{-1 / (2x)} }{ \Gamma(\nu / 2) },
where x > 0, and
\Gamma is the gamma function.
The mean of Y is 1 / (\nu - 2) (returned as the fitted
values), provided \nu > 2.
That is, while the expected information matrices used here are
valid in all regions of the parameter space, the regularity conditions
for maximum likelihood estimation are satisfied only if \nu > 2.
To enforce this condition, choose
link = logoff(offset = -2).
As with, chisq, the degrees of freedom are
treated as a parameter to be estimated using (by default) the
link loglink. However, the mean can also
be modelled with this family function.
See inv.chisqMlink
for specific details about this.
This family VGAM function handles multiple responses.
Value
An object of class "vglmff".
See vglmff-class for further details.
Warning
By default, the single linear/additive predictor in this family
function, say \eta = \log dof,
can be modeled in terms of covariates,
i.e., zero = NULL.
To model \eta as intercept–only set zero = "dof".
See zero for more details about this.
Note
As with chisq or
Chisquare, the degrees of freedom are
non–negative but allowed to be non–integer.
Author(s)
V. Miranda.
See Also
loglink,
CommonVGAMffArguments,
inv.chisqMlink,
zero.
Examples
set.seed(17010504)
dof <- 2.5
yy <- rinv.chisq(100, df = dof)
ics.d <- data.frame(y = yy) # The data.
fit.inv <- vglm(cbind(y, y) ~ 1, inv.chisqff,
data = ics.d, trace = TRUE, crit = "coef")
Coef(fit.inv)
summary(fit.inv)
2 - parameter Inverse Gamma Distribution
Description
Estimates the 2-parameter Inverse Gamma distribution by maximum likelihood estimation.
Usage
invgamma2mr(lmu = "loglink",
lshape = logofflink(offset = -2),
parallel = FALSE,
ishape = NULL,
imethod = 1,
zero = "shape")
Arguments
lmu, lshape |
Link functions applied to the (positives) mu and shape
parameters (called |
parallel |
Same as |
ishape |
Optional initial value for shape, same as
|
imethod |
Same as |
zero |
Numeric or character vector. Position or name(s) of the
parameters/linear predictors to be
modeled as intercept–only. Default is |
Details
The Gamma distribution and the Inverse Gamma distribution are related
as follows:Let X be a random variable distributed as
Gamma (a, \beta), where a > 0
denotes the shape parameter and \beta > 0 is the
scale paramater. Then Y = 1/X is an Inverse Gamma
random variable with parameters scale = a and
shape = 1/\beta.
The Inverse Gamma density function is given by
f(y;\mu, a) = \frac{(a - 1)^{a} \mu^{a}}{\Gamma(a)}y^{-a- 1} \
e^{-\mu(a - 1)/y},
for \mu > 0, a > 0 and y > 0.
Here, \Gamma(\cdot) is the gamma function, as in
gamma. The mean of Y is
\mu=\mu (returned as the fitted values) with variance
\sigma^2 = \mu^2 / (a - 2)
if a > 2, else is infinite. Thus, the
link function for the shape parameter is
logloglink. Then, by default, the two
linear/additive predictors are \eta_1=\log(\mu),
and \eta_2=\log(a), i.e in the VGLM context,
\eta = (log(\mu), loglog(a)
This VGAM family function handles multiple reponses by
implementing Fisher scoring and unlike
gamma2, the working-weight matrices
are not diagonal.
The Inverse Gamma distribution is right-skewed and either for small values
of a (plus modest \mu) or very large values of
\mu (plus moderate a > 2), the density has
values too close to zero.
Value
An object of class "vglmff"
(see vglmff-class).
The object is used by modelling functions
such as vglm
and vgam.
Warning
Note that zero can be a numeric or a character
vector specifying the position of the names
(partially or not) of the linear predictor modeled as intercept only.
In this family function such names are
c("mu", "shape").
Numeric values can be entered as usual.
See CommonVGAMffArguments
for further details.
Note
The response must be strictly positive.
If mu and shape are vectors, then rinvgamma(n = n,
shape = shape, scale = mu/(shape - 1) will generate random inverse gamma
variates of this parameterization, etc.;
see invgammaDist.
Given the math relation between the Gamma and the Inverse Gamma distributions, the parameterization of this VGAM family function underlies on the parametrization of the 2-parameter gamma distribution described in the monograph
Author(s)
Victor Miranda and T. W. Yee
References
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd ed. London, UK. Chapman & Hall.
See Also
invgammaDist,
gamma2 for the 2-parameter gamma distribution,
GammaDist,
CommonVGAMffArguments,
Examples
#------------------------------------------------------------------------#
# Essentially fitting a 2-parameter inverse gamma distribution
# with 2 responses.
set.seed(101)
y1 = rinvgamma(n = 500, scale = exp(2.0), shape = exp(2.0))
y2 = rinvgamma(n = 500, scale = exp(2.5), shape = exp(2.5))
gdata <- data.frame(y1, y2)
fit1 <- vglm(cbind(y1, y2) ~ 1,
family = invgamma2mr(zero = NULL,
# OPTIONAL INITIAL VALUE
# ishape = exp(2),
imethod = 1),
data = gdata, trace = TRUE)
Coef(fit1)
c(Coef(fit1), log(mean(gdata$y1)), log(mean(gdata$y2)))
summary(fit1)
vcov(fit1, untransform = TRUE)
#------------------------------------------------------------------------#
# An example including one covariate.
# Note that the x2 affects the shape parameter, which implies that both,
# 'mu' and 'shape' are affected.
# Consequently, zero must be set as NULL !
x2 <- runif(1000)
gdata <- data.frame(y3 = rinvgamma(n = 1000,
scale = exp(2.0),
shape = exp(2.0 + x2)))
fit2 <- vglm(y3 ~ x2,
family = invgamma2mr(lshape = "loglink", zero = NULL),
data = gdata, trace = TRUE)
coef(fit2, matrix = TRUE)
summary(fit2)
vcov(fit2)
The Inverse Gamma Distribution
Description
Density, distribution function, quantile function and random numbers generator for the Inverse Gamma Distribution.
Usage
dinvgamma(x, scale = 1/rate, shape, rate = 1, log = FALSE)
pinvgamma(q, scale = 1/rate, shape, rate = 1, lower.tail = TRUE, log.p = FALSE)
qinvgamma(p, scale = 1/rate, shape, rate = 1, lower.tail = TRUE, log.p = FALSE)
rinvgamma(n, scale = 1/rate, shape, rate = 1)
Arguments
x, q, p, n |
Same as |
scale, shape |
Scale and shape parameters, same as
|
rate |
Same as |
log, log.p, lower.tail |
Same as |
Details
The Inverse Gamma density with parameters scale = b and
shape = s is given by
f(y) = \frac{b^{s}}{\Gamma(s)} y^{-s-1} e^{-b/y},
for y > 0, b > 0, and s > 0.
Here, \Gamma(\cdot) is the gamma function as in
gamma
The relation between the Gamma Distribution and the Inverse Gamma Distribution is as follows:
Let X be a random variable distributed as Gamma
(b, s), then Y = 1 / X is distributed as
Inverse Gamma (1/b, s).
It is worth noting that the math relation between the scale paramaters
of both, the Inverse Gamma and Gamma distributions, is inverse.
Thus, algorithms of dinvgamma(), pinvgamma(), qinvgamma() and
rinvgamma() underlie on the algorithms
GammaDist.
Let Y distributed as Inverse Gamma (b, s). Then the
k^{th} moment of Y exists for -\infty < k < s and is
given by
E[Y^k] = \frac{\Gamma(s - k)}{\Gamma(s)} b^k.
The mean (if s > 1) and variance (if s > 2) are
E[Y] = \frac{b}{(s - 1)}; \ \ \ Var[Y] = \frac{b^2}{(s - 1)^2
\times (s - 2)}.
Value
dinvgamma() returns the density, pinvgamma() gives the
distribution function, qinvgamma() gives the quantiles, and
rinvgamma() generates random deviates.
Warning
The order of the arguments scale and shape does not match
GammaDist.
Note
Unlike the GammaDist, small values
of a (plus modest \mu) or very large values of
\mu (plus moderate a > 2), generate Inverse Gamma
values so near to zero. Thus, rinvgamma in
invgammaDist may return either values
too close to zero or values represented as zero in computer arithmetic.
In addition, function dinvgamma will return zero for
x = 0, which is the limit of the Inverse Gamma density when
'x' tends to zero.
Author(s)
V. Miranda and T. W. Yee.
References
Kleiber, C. and Kotz, S. (2003) Statistical Size Distributions in Economics and Actuarial Sciences. Wiley Series in Probability and Statistics. Hoboken, New Jersey, USA.
See Also
Examples
# Example 1.______________________
n <- 20
scale <- exp(2)
shape <- exp(1)
data.1 <- runif(n, 0, 1)
data.q <- qinvgamma(-data.1, scale = scale, shape = shape, log.p = TRUE)
data.p <- -log(pinvgamma(data.q, scale = scale, shape = shape))
arg.max <- max(abs(data.p - data.1)) # Should be zero
# Example 2.______________________
scale <- exp(1.0)
shape <- exp(1.2)
xx <- seq(0, 3.0, len = 201)
yy <- dinvgamma(xx, scale = scale, shape = shape)
qtl <- seq(0.1, 0.9, by = 0.1)
d.qtl <- qinvgamma(qtl, scale = scale, shape = shape)
plot(xx, yy, type = "l", col = "orange",
main = "Orange is density, blue is cumulative distribution function",
sub = "Brown dashed lines represent the 10th, ... 90th percentiles",
las = 1, xlab = "x", ylab = "", ylim = c(0, 1))
abline(h = 0, col= "navy", lty = 2)
lines(xx, pinvgamma(xx, scale = scale, shape = shape), col = "blue")
lines(d.qtl, dinvgamma(d.qtl, scale = scale, shape = shape),
type ="h", col = "brown", lty = 3)
2- parameter Inverse Weibull Distribution
Description
Maximum likelihood estimation of the 2-parameter Inverse Weibull distribution. No observations should be censored.
Usage
invweibull2mr(lscale = loglink,
lshape = logofflink(offset = -2),
iscale = NULL,
ishape = NULL,
imethod = 2,
lss = TRUE,
gscale = exp(-4:4),
gshape = exp(-4:4),
probs.y = c(0.25, 0.50, 0.75),
zero = "shape")
Arguments
lscale, lshape |
Parameter link functions applied to the
(positive) |
iscale, ishape |
Optional initial values for the |
gscale, gshape |
|
lss, probs.y |
Details at
|
imethod |
Initializing method internally implemented. Currently only the values 1 and 2 are allowed and NO observations should be censored. |
zero |
Numeric or character vector. The position(s) of the name(s) of
the parameters/linear predictors to be
modeled as intercept–only. Default is |
Details
The Weibull distribution and the Inverse Weibull distributions are related as follows:
Let X be a Weibull random variable with paramaters scale =b
and shape =a. Then, the random variable Y = 1/X has
the Inverse Weibull density with parameters scale = 1/b and
shape = a.
The Inverse weibull density for a response Y is given by
f(y;a,b) = a (b^a) y^{-a-1} \exp[-(y/b)^(-a)]
for a > 0, b > 0, y > 0.
The mean, that is returned as the fitted values, (if a > 1) and
the variance (if a > 2) are
E[Y] = b \ \Gamma(1 - 1/a); \ \ \ Var[Y] = b^{2} \
[\Gamma(1 - 2/a) - (\Gamma(1 - 1/a))^2].
Fisher scoring is used to estimate both parameters.
Although the expected information matrices used are valid in all regions of
the parameter space, the regularity conditions for maximum likelihood
estimation (MLE) are satisfied only if a>2
(according to Kleiber and Kotz (2003)). If this is violated then a warning
message is issued. To enforce a > 2, it has been set by default
that lshape = logofflink(offset = 2).
As a result of the math relation between the Weibull and the Inverse Weibull
distributions, regularity conditions for inference for the latter, are the
following:
if a \le 1 then the MLE's are not consisten,
if 1 < a < 2 then MLEs exist but are not assymptotically
normal, if a = 2, the MLE's exist and are normal and
asymptotically efficient but the convergence rate is slower compared when
a > 2. If a > 2, then the MLE's have classical
asymptotic properties.
Value
An object of class "vglmff"
(see vglmff-class).
The object is used to model special models such as
vglm and vgam.
Warning
Note that zero can be a numeric or a character
vector specifying the position of the names
(partially or not) of the linear predictor modeled as intercept only.
In this family function these names are
c("scale", "shape").
Numeric values can be entered as usual.
See CommonVGAMffArguments
for further details.
For simplicity, the second choice is recommended.
If the shape parameter is less than two (i.e. less than
exp(0.69315)), then misleading inference may
result ! (see above the regularity condition for the 'variance'),
e.g., in the summary and vcov of the object.
However, the larger the shape parameter is (for instance, greater than
exp(2.5), plus reasonable scale), the more unstable the algorithm
may become. The reason is that inverse weibull densities under such
conditions are highly peaked and left skewed. Thus, density values are
too close to zero (or values represented as zero in computer arithmetic).
Note
By default, the shape paramater is modeled as intercept only.
Successful convergence depends on having reasonably good initial
values. If the initial values chosen by this function are not good,
make use the two initial value arguments, iscale and ishape.
This VGAM family function currently handles multiple responses however, it does not handle censored data. This feature will be considered in a later version of the package.
The Inverse Weibull distribution, which is that of Y = 1/X
where X has the Weibull density, is known as the
log-Gompertz distribution.
Author(s)
Victor Miranda and T. W. Yee.
References
Harper, W. V., Eschenbach, T. G. and James, T. R. (2011) Concerns about Maximum Likelihood Estimation for the Three-Parameter Weibull Distribution: Case Study of Statistical Software. The American Statistician, 65(1), 44-54.
Kleiber, C. and Kotz, S. (2003) Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L. and Kotz, S. and Balakrishnan, N. (1994) Continuous Univariate Distributions, 2nd edition, Volume 1, New York: Wiley.
See Also
Examples
#-----------------------------------------------------------------------#
# Here, covariate 'x2' affects the scale parameter.
# See how data is generated.
set.seed(102)
wdata <- data.frame(x2 = runif(nn <- 1000)) # Complete data
wdata <- transform(wdata,
y1 = rinvweibull(nn, scale = exp(2.5 - (0.5) * x2),
shape = exp(1.5) ),
y2 = rinvweibull(nn, scale = exp(1.5 + x2),
shape = exp(1.25) ))
#------------------------------------------------------------------------#
# Fitting the Inverse Weibull distribution accordingly.
# Note that multiple responses are handled.
fit1 <- vglm(cbind(y1, y2) ~ x2,
invweibull2mr(zero = "shape",
# OPTIONAL INITIAL VALUE. Be carefull here when
# entered initial value. Sensitive distribution
ishape = exp(1.2),
lss = TRUE),
data = wdata, trace = TRUE, crit = "log")
coef(fit1, matrix = TRUE)
vcov(fit1)
summary(fit1)
### A second option (producing same results!!) might be to use the
### constraints argument in the 'vglm()' call. Note that 'x2' affects
### the scale parameter only.
fit2 <- vglm(y1 ~ x2,
invweibull2mr(zero = NULL),
data = wdata, trace = TRUE,
constraints = list(x2 = rbind(1, 0)))
coef(fit2, matrix = TRUE)
vcov(fit2)
summary(fit2)
constraints(fit2)
The Inverse Weibull Distribution
Description
Density, distribution function, quantile function and random numbers generator for the Inverse Weibull Distribution.
Usage
dinvweibull(x, scale = 1, shape, log = FALSE)
pinvweibull(q, scale = 1, shape, lower.tail = TRUE, log.p = FALSE)
qinvweibull(p, scale = 1, shape, lower.tail = TRUE, log.p = FALSE)
rinvweibull(n, scale = 1, shape)
Arguments
x, q, p, n |
Same as |
scale, shape |
Scale and shape parameters, same as |
log, log.p, lower.tail |
Same as |
Details
The Inverse Weibull density with parameters scale = b and
shape = s, is
f(y) = s b^s y^{-s-1} \exp{[-(y/b)^{-s}}],
for y > 0, b > 0, and s > 0.
The Weibull distribution and the Inverse Weibull distributions are related as follows:
Let X be a Weibull random variable with paramaters scale =b
and shape =s. Then, the random variable Y = 1/X has
the Inverse Weibull density with parameters scale = 1/b and
shape = s. Thus, algorithms of [dpqr]-Inverse Weibull
underlie on Weibull.
Let Y be a r.v. distributed as Inverse Weibull (b, s).
The k^{th} moment exists for
-\infty < k < s and is given by
E[Y^k] = b^{k} \ \Gamma(1 - k/s).
The mean (if s > 1) and variance (if s > 2) are
E[Y] = b \ \Gamma(1 - 1/s); \ \ \ Var[Y] = b^{2} \
[\Gamma(1 - 2/s) - (\Gamma(1 - 1/s))^2].
Here, \Gamma(\cdot) is the gamma function as in
gamma.
Value
dinvweibull() returns the density, pinvweibull() computes the
distribution function, qinvweibull() gives the quantiles, and
rinvweibull() generates random numbers from the Inverse Weibull
distribution.
Warning
The order of the arguments of [dpqr]-Inverse Weibull does not match
those in Weibull.
Note
Small values of scale or shape will provide Inverse Weibull
values too close to zero. Then, function rinvweibull() with such
characteristics will return either values too close to zero or values
represented as zero in computer arithmetic.
The Inverse Weibull distribution, which is that of X where
1/X has the Weibull density, is known as the log-Gompertz
distribution. Thus, in order to emphazise the continuity concept of the
Inverse Weibull density, if x = 0, then dinvweibull
returns zero, which is the limit of such a density when 'x'
tends to zero.
Author(s)
V. Miranda and T. W. Yee.
References
Kleiber, C. and Kotz, S. (2003) Statistical Size Distributions in Economics and Actuarial Sciences. Wiley Series in Probability and Statistics. Hoboken, New Jersey, USA.
Abramowitz, M. and Stegun, I. A. (1972) Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. ch.6, p.255. Dover, New York, USA.
See Also
Examples
#(1) ______________
n <- 20
scale <- exp(2)
shape <- exp(1)
data.1 <- runif(n, 0, 1)
data.q <- qinvweibull(-data.1, scale = scale, shape = shape, log.p = TRUE)
data.p <- -log(pinvweibull(data.q, scale = scale, shape = shape))
arg.max <- max(abs(data.p - data.1)) # Should be zero
#(2)_________________
scale <- exp(1.0)
shape <- exp(1.2)
xx <- seq(0, 10.0, len = 201)
yy <- dinvweibull(xx, scale = scale, shape = shape)
qtl <- seq(0.1, 0.9, by =0.1)
d.qtl <- qinvweibull(qtl, scale = scale, shape = shape)
plot(xx, yy, type = "l", col = "red",
main = "Red is density, blue is cumulative distribution function",
sub = "Brown dashed lines represent the 10th, ... 90th percentiles",
las = 1, xlab = "x", ylab = "", ylim = c(0,1))
abline(h = 0, col= "navy", lty = 2)
lines(xx, pinvweibull(xx, scale = scale, shape = shape), col= "blue")
lines(d.qtl, dinvweibull(d.qtl, scale = scale, shape = shape),
type ="h", col = "brown", lty = 3)
Link functions for the mean of 1–parameter discrete distributions: The Logarithmic Distribuion.
Description
Computes the logffMlink transformation, including its inverse
and the first two derivatives.
Usage
logffMlink(theta, bvalue = NULL,
alg.roots = c("Newton-Raphson", "bisection")[1],
inverse = FALSE, deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. This is |
bvalue |
This is a boundary value. See below.
Also refer to |
alg.roots |
Character. The iterative algorithm to find the inverse of this link function. Default is the first (Newton–Raphson). Optionally, the bisection method is also available. See below for more details. |
inverse, deriv, short, tag |
Details at |
Details
This link function arises as a natural link function for the mean,
\mu, of the logarithmic (or log-series) distribution,
logff. It is defined for any value
of the shape parameter s (i.e. theta in the
VGLM/VGAM context), 0 < s < 1, as the
logarithm of \mu = \mu(s). It can be easily shown that
logffMlink is the difference of two common link functions:
logitlink and
clogloglink.
It is particularly usefull for event–rate data where the expected number of events can be modelled as
\mu = \mu(s) = \lambda t.
Here \lambda is the standardized mean (or event-rate)
per unit time, t is the timeframe observed,
whereas \mu and s are the mean and the
shape parameter of the logarithmic distribution respectively.
The logarithm is then applied to both sides so
that t can be incorporated in the analysis as an offset.
While logffMlink is not the canonical link function of
the logarithmic distribution, it is certainly part of the
canonical link, given by the composite
\log \circ~(g^{-1}) \circ \log,
where g^{-1} denotes the inverse of
logffMlink.
The domain set of this link function is (0, 1).
Therefore, values of theta (that is s) too close to
0 or to 1 or out of range will result in Inf,
-Inf, NA or NaN. Use argument bvalue
to adequately replace them before computing the link function.
Particularly, if inverse = TRUE and deriv = 0,
then s becomes \eta, and therefore the domain set
turns to (0, \infty).
If theta is a character, then arguments inverse and
deriv are disregarded.
Value
For deriv = 0, the logffMlink transformation of
theta, i.e., logitlink(theta) - clogloglink(theta), if
inverse = FALSE.
When inverse = TRUE the vector entered at theta
becomes \eta and, then, this link function
returns a unique vector \theta_{\eta} such that
{\tt{logffMlink}} (\theta_{\eta}) = \eta,
i.e., the inverse image of \eta.
Specifically, the inverse of logffMlink cannot be written in
closed–form, then the latter is equivalent to search for the roots
of the function
{\tt{logff.func}}(\theta) = {\tt{logffMlink}}(\theta) - \eta
as a function of \theta. To do this, the auxiliary
function logff.func is internally generated.
Then, with the method established at alg.roots,
either Newton–Raphson or bisection, this link function
approximates and returns the inverse image
\theta_{\eta} (of given \eta), which
plays the role of the inverse of logffMlink. In particular,
for \eta = 0 and \eta =Inf, it returns 0
and 1 respectively.
For deriv = 1, d eta / d theta
as a function of theta if inverse = FALSE, else
the reciprocal d theta / d eta.
Similarly, when deriv = 2 the second order derivatives
are correspondingly returned.
Both, first and second derivatives, can be written in closed–form.
Warning
logffMlink is a monotonically
increasing, convex, and strictly positive function in (0, 1) such
that the horizontal axis is an asymptote. Therefore, when the inverse
image of \eta is required, each entry of \eta
(via argument theta) must be non-negative so that
{\tt{logff.func}(\theta; \eta) = \tt{logffMlink}}(\theta) - \eta
is shifted down. This fact allows this function to uniquely
intersect the horizontal axis which guarantees to iteratively find
the corresponding root \theta_{\eta}, i.e., the
inverse image of \eta. Else, NaN will be returned.
See example 3. It is the plot of logffMlink in (0, 1)
for \eta = 1.5.
Besides, the vertical straight line theta = 1 is also
an asymptote. Hence, this link function may grow sharply for
values of theta too close to 1.
See Example 4 for further details.
Note
To find the inverse image \theta_{\eta} of
\eta, either
newtonRaphson.basic
or bisection.basic is
called.
This link function can be used for modelling any parameter lying
between 0.0 and 1.0. Consequently, when there are covariates,
some problems may occur. For example, the method entered
at alg.roots to approximate the inverse image may converge
at a slow rate. Similarly if the sample size is small, less than 20 say.
Try another link function, as
logitlink, in such cases.
Author(s)
V. Miranda and T. W. Yee
See Also
logff,
newtonRaphson.basic,
bisection.basic,
Links,
clogloglink,
logitlink.
Examples
## Example 1 ##
set.seed(0906)
Shapes <- sort(runif(10))
logffMlink(theta = Shapes, deriv = 1) ## d eta/d theta, as function of theta
logldata.inv <-
logffMlink(theta = logffMlink(theta = Shapes), inverse = TRUE) - Shapes
summary(logldata.inv) ## Should be zero
## Example 2 Some probability link funtions ##
s.shapes <- ppoints(100)
par(lwd = 2)
plot(s.shapes, logitlink(s.shapes), xlim = c(-0.1, 1.1), type = "l", col = "limegreen",
ylab = "transformation", las = 1, main = "Some probability link functions")
lines(s.shapes, logffMlink(s.shapes), col = "blue")
lines(s.shapes, probitlink(s.shapes), col = "purple")
lines(s.shapes, clogloglink(s.shapes), col = "chocolate")
lines(s.shapes, cauchitlink(s.shapes), col = "tan")
abline(v = c(0.5, 1), lty = "dashed")
abline(v = 0, h = 0, lty = "dashed")
legend(0.1, 4.5, c("logffMlink","logitlink", "probitlink", "clogloglink",
"cauchitlink"),
col = c("blue", "limegreen", "purple", "chocolate", "tan"), lwd = 1)
par(lwd = 1)
## Example 3. Plot of 'logffMlink()' with eta = 1.5. ##
m.eta1.5 <- logffMlink(theta = s.shapes, deriv = 0) - 1.5
plot(m.eta1.5 ~ s.shapes, type = "l", col = "limegreen",
las = 1, lty = 2, lwd = 3, xlim = c(-0.1, 1.0), ylim = c(-2, 3),
xlab = "shape parameter, s, in (0, 1).",
ylab = "logffMlink(s) - 1.5",
main = "logff.func(s; 1.5) = logffMlink(s) - 1.5, in (0, 1)")
abline(h = 0, v = 0)
abline(v = 1.0, lty = 2)
## Example 4. Special values of theta, inverse = FALSE ##
s.shapes <- c(-Inf, -2, -1, 0.0, 0.25, 0.5, 1, 10, 100, Inf, NaN, NA)
rbind(s.shapes, logffMlink(theta = s.shapes))
Link functions for the quantiles of several 1–parameter continuous distributions
Description
Computes the maxwellQlink transformation, its inverse and the
first two derivatives.
Usage
maxwellQlink(theta, p = stop("Argument 'p' must be specified."),
bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. See below for further details. |
p |
Numeric. A single value between 0 and 1.
It is the |
bvalue, inverse, deriv, short, tag |
See |
Details
In the VGLM/VGAM quantile regression context, this link function
can be used to model any p–quantile of the
Maxwell distribition. It is the maxwellQlink transformation
given by
\sqrt{ 2~\textrm{\texttt{qgamma(p, 1.5) / a.}}}
Here, a is positive parameter as in
maxwell
whereas qgamma is the quantile
function of the gamma distribution.
Numerical values of a or p out of range will
result in Inf, -Inf, NA or NaN
correspondingly.
In particular, arguments inverse and deriv are
disregarded if theta is character.
Also, if inverse = TRUE and deriv = 0, then argument
theta becomes \eta. See Links
for further details about this.
Value
For deriv = 0, the maxwellQlink transformation of
theta, when inverse = FALSE. If inverse = TRUE,
then the inverse given by 2*qgamma(p, 1.5) / theta^2 is returned.
For deriv = 1, this function returns the derivative
d eta / d theta,
if inverse = FALSE. Else, the reciprocal
d theta / d eta as a function of theta.
If deriv = 2, then the second order derivatives in terms of
theta are accordingly returned.
Note
Numerical instability may occur for values theta too close
to zero. Use argument bvalue to replace them before
computing the link.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
Examples
## E1. maxwellQlink() and its inverse ##
p <- 0.25 ## Modeling the first quartile
my.a <- seq(0, 5, by = 0.1)[-1]
max(my.a - maxwellQlink(maxwellQlink(my.a, p = p), p = p, inverse =TRUE)) ## Zero
## E2. The first two values are negative, NaN is returned ##
maxwellQlink(theta = c(-0.15, -0.10, 0.25, 0.35) , p = p, inverse = FALSE)
maxwellQlink(theta = c(-0.15, -0.10, 0.25, 0.35) , p = p, inverse = TRUE)
## E3. Plot of the maxwellQlink() and its inverse ##
## Note, inverse = TRUE implies that argument 'theta' becomes 'eta'. ##
#--- THE LINK
plot(maxwellQlink(theta = my.a, p = p) ~ my.a,
type = "l", col = "blue", lty = "dotted", lwd = 3,
xlim = c(-0.1, 10), ylim = c(-0.1, 5), las = 1,
main = c("Blue is maxwellQlink(), green is the inverse"),
ylab = "eta = maxwellQlink", xlab = "theta")
abline(h = 0, v = 0, lwd = 2)
#--- THE INVERSE
lines(my.a, maxwellQlink(theta = my.a, p = p,inv = TRUE),
col = "green", lwd = 2, lty = "dashed")
lines(my.a, my.a) # Tracing the identity function for double--check
Newton–Raphson algorithm
Description
Newton–Raphson algorithm to approximate the roots of univariate real–valued functions.
This function is vectorized.
Usage
newtonRaphson.basic(f, fprime, a, b,
tol = 1e-8, n.Seq = 20,
nmax = 15, ...)
Arguments
f |
A univariate function whose root(s) are approximated. This is the target function. Must return a vector. |
fprime |
A function. The first derivative of |
a, b |
Numeric vectors.
Upper and lower real limits of the open interval These vectors are subject to be recycled if |
tol |
Numeric. A number close to zero to test whether the
approximate roots from iterations |
n.Seq |
Numeric. The number of equally spaced initial points within
the interval ( |
nmax |
Maximum number of iterations. Default is |
... |
Any other argument passed down to functions |
Details
This is an implementation of the well–known Newton–Raphson
algorithm to find a real root, r, a < r < b,
of the function f.
Initial values, r_0 say, for the algorithm are
internally computed by drawing 'n.Seq' equally spaced points
in (a, b). Then, the function f is evaluated at this
sequence. Finally, r_0 results from the closest image to
the horizontal axis.
At iteration k, the (k + 1)^{th} approximation
given by
r^{(k + 1)} = r^{(k)} -
{\tt{f}}(r^{(k), ...)} / {\tt{fprime}}(r^{(k)}, ...)
is computed, unless the approximate root from step k is the
desired one.
newtonRaphson.basic approximates this root up to
a relative error less than tol. That is, at each iteration,
the relative error between the estimated roots from iterations
k and k + 1 is calculated and then compared to tol.
The algorithm stops when this condition is met.
Instead of being single real values, arguments a and b
can be entered as vectors of length n, say
{\tt{a}} = c(a_1, a_2, \ldots, a_n) and
{\tt{b}} = c(b_1, b_2,\ldots, b_n).
In such cases, this function approaches the (supposed) root(s)
at each interval (a_j, b_j),
j = 1, \ldots, n. Here, initial values are searched
for each interval (a_j, b_j).
Value
The approximate roots in the intervals
(a_j, b_j).
When j = 1, then a single estimated root is returned, if any.
Note
The explicit forms of the target function f and its
first derivative fprime must be available for the algorithm.
newtonRaphson.basic
does not handle yet numerically approximated derivatives.
A warning is displayed if no roots are found, or if more than one
root might be lying in
(a_j, b_j), for any j = 1, \ldots, n.
If a and b lengths differ, then the recyling rule
is applied. Specifically, the vector with minimum length
will be extended up to match the maximum length by repeating
its values.
Author(s)
V. Miranda.
See Also
Examples
# Find the roots in c(-0.5, 0.8), c(0.6, 1.2) and c(1.3, 4.1) for the
# f(x) = x * (x - 1) * (x - 2). Roots: r1 = 0, and r2 = 1, r3 = 2.
f <- function(x) x * (x - 1) * (x - 2)
fprime <- function(x) 3 * x^2 - 6 * x + 2
# Three roots.
newtonRaphson.basic(f = f, fprime = fprime,
a = c(-0.5, 0.6, 1.3),
b = c(0.8, 1.2, 4.1)) ## 0.0, 1.0 and 2.0
# Recycling rule. Intervals analysed are (-0.5, 1.2) and (0.6, 1.2)
newtonRaphson.basic(f = f, fprime = fprime,
a = c(-0.5, 0.6), b = c(1.2))
## Warning: There is more than one root in (-0.5, 1.2)!
Link functions for the quantiles of several 1–parameter continuous distributions.
Description
Computes the normal1sdQlink transformation for the Univariate
Normal Distribution, its inverse and the first two derivatives.
Usage
normal1sdQlink(theta, mean = stop("Please, enter the fixed 'mean'."),
p = stop(" Please, enter argument 'p'."),
bvalue = NULL, inverse = FALSE, deriv = 0,
short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character.
This is |
mean |
A numeric vector or a matrix. It is the (known) fixed mean of the Normal distribution of interest. See below for further details. |
p |
Numeric vector of |
bvalue, inverse, deriv, short, tag |
Details at |
Details
This link function is necessarily required by
normal1sd if quantile regression
is to be performed. It computes the normal1sdQlink
transformation, defined as
{\tt{mean}} + \sqrt{2}\sigma \cdot
{\tt{erf^{-1}}}(2 {\tt{p}} - 1).
Here, {\tt{erf^{-1}}} denotes the inverse
of the error function erf, and
\sigma is the standard deviation (theta) as in
normal1sd.
Technically, normal1sdQlink can be used for quantile regression
over any vector of p–quantiles of Normally distributed data
with known mean.
See normal1sd for further details
about the latter.
Values of p out of the open interval (0, 1) or non–positive
values of theta will result in Inf, -Inf,
NA or NaN.
Value
When deriv = 0, the normal1sdQlink transformation
of theta, if inverse = FALSE. Conversely, if
inverse = TRUE, theta becomes \eta and
the inverse transformation given by
(theta - mean)/\sqrt{2}
erf^{-1}(2p - 1)
is returned.
For deriv = 1, d eta / d theta
if inverse = FALSE. Else, this function returns
d theta / d eta as a function of theta.
For deriv = 2, the second order derivatives are accordingly
returned.
Warning
If p is a vector, then the recycling rule applies only
if theta is entered as a matrix. Else, only the first entry in
p is considered.
Note
When inverse = TRUE, the reciprocal of the error function,
erf, evaluated at 2p- 1
is required. However, the result is Inf for p= 0.5.
Here, in consequence, the limit of erf when
p tends to 0.5 is returned to avoid numerical issues.
Author(s)
V. Miranda
See Also
Examples
### Example 1 ###
theta <- seq(0, 3, by = 0.1)[-1] # This is sigma, then must be positive.
mean <- -2.5 # Intentionally, a negative value for mu.
p <- 0.25 # Modelling the first quartile.
eta <- normal1sdQlink(theta = theta, p = p, mean = mean)
inv.eta <- normal1sdQlink(theta = eta, p = p, mean = mean, inverse = TRUE)
summary(inv.eta - theta) ## Should be 0
### Example 2. Special values of theta, using argument 'bvalue'. ###
theta <- c(-Inf, -5, -2.5, 0, 2.5, 5, Inf, NA, NaN)
my.matrix <- rbind(theta, normal1sdQlink(theta = theta, p = p, mean = mean),
normal1sdQlink(theta = theta, p = p, mean = mean, bvalue = 1e-5))
rownames(my.matrix) <- c("theta", "No 'bvalue'", "With 'bvalue'")
colnames(my.matrix) <- rep("", 9)
my.matrix # Second row has NAs, whilst third row has NO NAs except for theta = NA
Estimation and Inference for Conditional Quantiles of a 1–parameter Univariate Normal Distribution.
Description
Maximum likelihood estimation of the standard deviation, including inference for conditional quantiles, of a univariate normal distribution.
Usage
normal1sdff(zero = NULL, link = "loglink",
fixed.mean = 0, p.quant = NULL,
var.arg = FALSE)
Arguments
zero |
Allows to model the single linear predictor in this family function as intercept–only. See below for important details about this. |
link |
This is the link function applied to the standard deviation.
If |
fixed.mean |
Numeric, a vector or a matrix. It allocates the (fixed) mean of the response in the fitting process. See below for further details. |
p.quant |
Numeric. A prototype vector of probabilities indicating the quantiles of interest, when quantile regression is to be performed. |
var.arg |
If |
Details
This family function is a variant of
uninormal
to estimate the standard deviation of a Normal distribution
with known mean. The estimated values are returned as
the fitted values, unlike some other family functions where the mean
is returned as fitted values. However, here the mean is
assumed to be known.
By default, the response is supposedly centered on its mean,
that is, fixed.mean = 0. Change this accordingly:
For a single response or multiple responses, fixed.mean must
be a numeric vector where each entry is the mean of each response,
only if the mean is fixed. When the mean is not constant,
fixed.mean must be matrix with the number of columns matching
the number of responses.
Quantile regression:
The (single) linear/additive predictor by default is the log
of the standard deviation. However, if quantile regression is of
primary interest, then the response must be entered using the function
Q.reg, and the corresponding
p–quantiles through p.quant in the
vglm or vgam call.
Additionally, set normalsdQlink
as the link function via the argument link.
This family VGAM function handles multiple responses.
Value
An object of class "vglmff".
See vglmff-class for further details.
Warning
Be aware of the argument zero:
by default, the single linear/additive predictor in this family
function, say \eta, can be modeled in terms of covariates,
i.e., zero = NULL.
To model \eta as intercept–only, set zero = "sd".
See zero for more details about this.
Author(s)
V. Miranda.
See Also
normal1sdQlink,
loglink,
uninormal,
CommonVGAMffArguments,
zero,
vgam,
vglm.
Examples
set.seed(121216)
my.mean <- -1 # Mean (CONSTANT)
my.sd <- 2.5
y <- rnorm(100, mean = my.mean, sd = 2.0) # Generate some data.
normdat <- data.frame(y = y) # Setting up our data.
# Plotting the data
plot(y, main = c("Y ~ Normal ( mean(known), sd = 2.5 ). "),
ylab = "The data", pch = 20,
xlim = c(0, 100), ylim = c(-7, 7), col = "blue")
abline(h = 0, v = 0, lwd = 2, col = "black")
### EXAMPLE 1. Estimate the SD with two responses. The mean is fixed. ###
fit1 <- vglm(cbind(y, y) ~ 1, family = normal1sdff(fixed.mean = my.mean),
data = normdat, trace = TRUE, crit = "coef")
Coef(fit1)
summary(fit1)
### EXAMPLE 2. Quantile regression. The link normal1sdQlink() is used. ###
my.p <- c(25, 50, 75) / 100 # Quantiles 25%, 50% and 75% are of interest.
fit2 <- vglm(Q.reg(y, length.arg = 3) ~ 1,
family = normal1sdff(fixed.mean = my.mean, p.quant = my.p,
link = normal1sdQlink),
data = normdat, trace = TRUE, crit = "coef")
summary(fit2)
head(predict(fit2))
constraints(fit2)
### EXAMPLE 3. Complete the plot. Quantiles matching. ###
( my.c3Q <- coef(fit2, matrix = TRUE) )
with(normdat, lines(rep(my.c3Q[1], 100), col = "tan" , lty = "dotted", lwd = 2))
with(normdat, lines(rep(my.c3Q[2], 100), col = "orange", lty = "dotted", lwd = 2))
with(normdat, lines(rep(my.c3Q[3], 100), col = "brown1", lty = "dotted", lwd = 2))
legend(20, 7.0, c("Percentil 75", "Percentil 50", "Percentil 25"),
col = c("brown1", "orange", "tan"),
lty = rep("dotted", 3), lwd = rep(2, 3), cex = 0.75)
Not-documented functions and classes in VGAMextra
Description
Those functions not documented yet in VGAMextra are aliased to this file.
Details
These functions are still under review or being tested, and will be documented over time.
Value
Overall, these functions returns objects required by time series family functions in VGAMextra.
Further details will be given shortly.
Author(s)
V. Miranda
Link functions for the mean of 1–parameter discrete distributions: The Positive Poisson Distribution.
Description
Computes the posPoiMlink transformation,
its inverse and the first two derivatives.
Usage
posPoiMlink(theta, bvalue = NULL,
alg.roots = c("Newton-Raphson", "bisection")[1],
inverse = FALSE, deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. This is |
bvalue |
Details at |
alg.roots |
Character. The iterative method to find the inverse of this link
function. Same as |
inverse, deriv, short, tag |
Details at |
Details
This is a link function for the mean of the positive Poisson distribution. It is defined as
\eta = \tt{posPoiMlink}(\lambda) =
-\log (\lambda^{-1} - \lambda^{-1} e^{-\lambda}),
where \lambda > 0 stands for the single parameter of
pospoisson, i.e.
theta in the VGLM/VGAM context.
Notice, the mean of the positive Poisson is given by
\frac{\lambda}{1 - e^{-\lambda}}.
This link function comes up by taking the logarithm on both sides of this equation.
The domain set for \lambda is (0, \infty).
Hence, non–positive values of \lambda will result in NaN
or NA. Use argument bvalue to properly replace
them before computing the link function.
posPoiMlink tends to infinity as \lambda
increases. Specially, its inverse grows at a higher rate. Therefore,
large values of \lambda will result in Inf accordingly.
See example 2 below.
If theta is a character, arguments inverse and
deriv are disregarded.
Value
For deriv = 0, the posPoiMlink transformation of
theta, if inverse = FALSE. When inverse = TRUE,
theta becomes \eta and the inverse of
posPoiMlink is needed but cannot be written in
closed–form. Instead this link function returns the approximated
inverse image of \eta, say \theta_\eta,
such that
\tt{posPoiMlink}(\theta_\eta) = \eta.
Here, \theta_\eta is iteratively computed
as the unique root of the auxiliary function
f(\theta; \eta) = \tt{posPoiMlink}(\theta) - \eta,
as a function of \theta.
This work is performed via Newton–Raphson or bisection,
as per argument alg.roots.
For deriv = 1, d eta / d theta
as a function of theta if inverse = FALSE, else
the reciprocal d theta / d eta.
Similarly, when deriv = 2 the second order derivatives
are returned in terms of theta.
Warning
This link function is monotonic increasing in (0, \infty) so that
the horizontal axis is an asymptote. Then, in order to assure the root
of the auxiliary
f(\theta; \eta) = \tt{posPoiMlink}(\theta) - \eta
to be real, \eta must be positive. As a result,
posPoiMlink is shited–down and hence intersecting
the horizontal axis uniquely.
Note
This link function is useful to model any parameter
in (0, \infty). Some numerical issues may arise if there are
covariates causing negative values the parameter.
Try identitylink alternatively.
Author(s)
V. Miranda and T. W. Yee.
See Also
pospoisson,
newtonRaphson.basic,
bisection.basic,
Links,
identitylink.
Examples
## Example 1. Special values for theta (or eta, accordingly) ##
m.lambda <- c(0, 0.5, 1, 10, 20, 25, 1e2, 1e3, Inf, -Inf, NaN, NA)
# The 'posPoiMlink' transformation and the first two derivatives.
print(rbind(m.lambda,
deriv1 = posPoiMlink(theta = m.lambda, inverse = FALSE, deriv = 1),
deriv2 = posPoiMlink(theta = m.lambda, inverse = FALSE, deriv = 2)),
digits = 2)
# The inverse of 'posPoiMlink' and the first two derivatives.
print(rbind(m.lambda,
Invderiv1 = posPoiMlink(theta = m.lambda, inverse = TRUE, deriv = 1),
Invderiv2 = posPoiMlink(theta = m.lambda, inverse = TRUE, deriv = 2)),
digits = 2)
## Example 2. The inverse of 'posPoiMlink' ##
m.lambda <- c(0, 1, 5, 10, 1e2, 1e3)
posPoiMlink(theta = posPoiMlink(m.lambda, inverse = TRUE))
pospoi.inv <- posPoiMlink(posPoiMlink(m.lambda, inverse = TRUE)) - m.lambda
summary(pospoi.inv) ## Should be zero.
## Example 3. Plot of 'posPoiMlink' and its first two derivatives ##
## inverse = FALSE, deriv = 0, 1, 2. ##
m.lambda <- seq(0, 35, by = 0.01)[-1]
y.lambda <- posPoiMlink(theta = m.lambda, deriv = 0)
der.1 <- posPoiMlink(theta = m.lambda, deriv = 1)
der.2 <- posPoiMlink(theta = m.lambda, deriv = 2)
plot(y.lambda ~ m.lambda, col = "black",
main = "log(mu), mu = E[Y], Y ~ pospoisson(lambda).",
ylim = c(-1, 10), xlim = c(-1, 26),
lty = 1, type = "l", lwd = 3)
abline(v = 0, h = 0, col = "gray50", lty = "dashed")
lines(m.lambda, der.1, col = "blue", lty = 5, lwd = 3)
lines(m.lambda, der.2, col = "chocolate", lty = 4, lwd = 3)
legend(5, 9, legend = c("posPoiMlink", "deriv = 1", "deriv = 2"),
col = c("black", "blue", "chocolate"), lty = c(1, 5, 4), lwd = c(3, 3, 3))
Link functions for the mean of 1–parameter continuous distributions: The Rayleigh and the Maxwell distributions.
Description
The rayleighMlink and the maxwellMlink
transformations, their inverse and the first two derivatives.
Usage
rayleighMlink(theta, bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
maxwellMlink(theta, bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. It is |
bvalue, inverse, deriv, short, tag |
See |
Details
rayleighMlink and maxwellMlink are link functions
to model the mean of the Rayleigh distirbution,
(rayleigh),
and the mean of the Maxwell distribution,
(maxwell),
respectively.
Both links are somehow defined as the
\log {\tt{theta}}
plus an offset. Specifcally,
{\tt{rayleighMlink}}(b) = \log ( b * \gamma(0.5) / sqrt{2} ),
where b > 0 is a scale parameter as in
rayleigh; and
{\tt{maxwellhMlink}}(b) = \log ( a^{-1/2} * sqrt{8 / \pi} ).
Here, a is positive as in maxwell.
Non–positive values of a and/or b will result in
NaN, whereas values too close to zero will return
Inf or -Inf.
Value
For deriv = 0, the corresponding transformation of
theta when inverse = FALSE.
If inverse = TRUE, then theta becomes \eta, and
the inverse transformations
I) exp(theta) * sqrt(2) / gamma(0.5)
for rayleighMlink, and
II) 8 * exp(-2 * theta) / gamma(0.5)^2 for
maxwellMlink,
are returned.
For deriv = 1,
d eta / d theta when inverse = FALSE.
If inverse = TRUE, then
d theta / d eta as a function of
theta.
When deriv = 2, the second derivatives in
terms of theta are returned.
Note
Values of a or b out of range, e.g. when covariates
involved, may cause numerical instability.
Use argument bvalue to replace them before computing any link.
If theta is character, then arguments inverse and
deriv are ignored. See Links
for further details.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
Examples
## The link and its inverse ##
theta <- 0.1 + 1:10
eta <- maxwellMlink(maxwellMlink(theta = theta), inverse =TRUE)
summary(eta - theta) # Zero
eta <- rayleighMlink(rayleighMlink(theta = theta), inverse =TRUE)
summary(eta - theta) # Zero
## Modelling the mean of the Maxwell distribution ##
set.seed(17010401)
rate <- maxwellMlink(theta = 2, inverse = TRUE) # ~ 0.046
mdata <- data.frame(y = rmaxwell(1000, rate = rate ))
fit <- vglm(y ~ 1, maxwell(link = "maxwellMlink"),
data = mdata, trace = TRUE, crit = "coef")
coef(fit, matrix = TRUE)
Coef(fit)
Link functions for the quantiles of several 1–parameter continuous distributions
Description
Computes the rayleighQlink transformation, its inverse and the
first two derivatives.
Usage
rayleighQlink(theta, p = stop("Argument 'p' must be specified."),
bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. It is |
p |
Numeric. A single value between 0.0 and 1.0.
It is the |
bvalue, inverse, deriv, short, tag |
See |
Details
This link function directly models any p–quantile of the
Rayleigh distribution specified by the argument p.
It is called the rayleighQlink transformation defined as
b \sqrt{-2 \log(1 - p)},
where b > 0 is a scale parameter as in
rayleigh.
Numerical values of b or p out of range may
result in Inf, -Inf, NA or NaN.
If theta is character, then arguments inverse and
deriv are discarded.
Value
For deriv = 0, the rayleighQlink transformation of
theta, when inverse = FALSE. If inverse = TRUE, then
this function returns theta / sqrt(-2 log(1 - p)).
For deriv = 1, then the function returns
d eta / d theta,
if inverse = FALSE. If inverse = TRUE, then
d theta / d eta as a function of theta.
If deriv = 2, then the second order derivatives in terms of
theta.
Note
Numerical instability may occur for values theta too close
to zero. Use argument bvalue to replace them before
computing the link.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
Examples
## E1. rayleighQlink() and its inverse ##
p <- 0.50 ## Modeling the median
my.b <- seq(0, 5, by = 0.1)[-1]
max(my.b - rayleighQlink(rayleighQlink(my.b, p = p), p = p, inverse =TRUE)) ## Zero
## E2. Special values ##
rayleighQlink(theta = c(Inf, -Inf, NA, NaN), p = p)
## E3. Use of argument 'bvalue' ##
rayleighQlink(theta = seq(-0.2, 1.0, by = 0.1), p = p) # WARNING: NaNs if theta <= 0
rayleighQlink(theta = seq(-0.2, 1.0, by = 0.1), p = p, bvalue = .Machine$double.xmin)
Summary methods for Vector Generalized Time Series Models
Description
S4 summary methods for models fitted with time series family functions from VGAMextra.
These function are all
methods for objects of class
vglm with signature
vgltsmff-class.
Details
Implementation of vector generalized time series (TS) family functions
(vgltsmff) in VGAMextra is entirely based on the structure
of family functions of class
vglmff-class
from VGAM. More precisely, vgltsmff family functions
can be created by calls of the form new("vgltsmff",...),
following the structure vglmff-class.
See vglmff-class for
additional details.
In this line, specific S4 dispatching methods are currently implemented at VGAM to show (or plot) essential statistical information about the model fitted.
For the generic summary, specifically, S4 methods for
objects with signature vgtsff are incorporated
in VGAMextra to display supplementary analyses commonly
required by TS practicioners. That is, additional information
to the default output shown by
summaryvglm for family functions
at VGAM, as follows:
a) The standard errors, which are computed from the asymptotic distribution of the MLE estimates, unlike the asymptotic approach (z-value) from VGAM.
b) Checks on stationarity and/or invertibility for autoregressive (AR), moving average (MA), and autoregressive moving-average (ARMA) models via the polynomial roots.
c) The AIC, AICC and BIC criteria for model identification.
Notice that, for intercept-only models in the 'vglm' context,
the asypmtotic distribution of the estimates, either conditional
or unconditional, will coincide with the theoretical distributions
as long as n increases. In particular, for the AR(p)
process, the MLEs and the Yule-Walker estimates will concur
asymptotically.
Where covariates or parameter constraints are involved,
the standard errors for the estimates from time series
family functions at VGAMextra are calculated from the
predicted values allocated in the slot @predictors,
when summary(...) is called.
In this case, the conditional mean,
\textrm{E}[\eta_j | \textbf{x}] x ], is considered as the estimate, where:
\eta_j = \sum_{k = 1}^{p} \beta_{(j)k} \times x_{k},
for j = 1, \ldots, M.
Value
An object of class
summary.vglm printed by specific methods defined
at VGAMextra for objects with signature
vgltsff-class.
Note
As for the intercept, notice that this is called drift-term
at ARXff and
ARMAXff,
whilst it is refered as intercept
in MAXff.
This parameter is also estimated by TS family
functions in VGAMextra. In the MA model, particularly,
it is the mean of the process.
The drift-term,
denoted as \mu^*,
is linearly linked to the mean,
\mu,
of the AR and ARMA processes in turn, as follows:
\mu \to \frac{\mu^{*} }{1 - \sum \theta_i}.
Here, \theta_i are the AR coefficients.
Hence, the standard error for the drift-term is
accordingly computed based on the asymptotic distribution
of the mean. More precisely, the relation
V(\mu^{*}) = (1 - \sum \theta_i)^{-2} \times
\frac{\sigma_{\varepsilon}^2 }{n},
is considered,
where \sigma_{\varepsilon}^2 is the variance
of the random errors.
Finally, the AIC, AICC and BIC criteria are computed of the well-known expressions
AIC = (-2) \times Log-likelihood + 2 \times k
AICC = AIC + \frac{2~k~(k + 1)}{n - k - 1}
and
BIC = (-2) \times Log-likelihood + k~\times~ln(n)
with k denoting the number of parameters.
Author(s)
V. Miranda and T.W. Yee.
References
Woodward, H., Gray, H. and Elliot A. (2012) Applied Time Series Analysis. Taylor & Francis/CRC, Florida, USA.
See Also
vgtsff-class,
summaryvlgm,
ARXff,
MAXff,
ARMAXff,
vglm.
Examples
#------------------------------------------------------------------------#
# Fitting a simple Moving Average model to compare with arima().
#------------------------------------------------------------------------#
set.seed(0628)
nn <- 300
theta <- c(0.2, -0.37) # Autoregressive coefficients
phi <- c(0.25) # MA coefficients.
mu <- c(1.5, 0.85) # Mean (not drift) of the MA process.
x2 <- runif(nn)
tsd1 <- mu[1]/(1 - sum(theta)) +
arima.sim(n = nn,
model = list(order = c(2, 0, 0),
ar = theta),
sd = exp(1.5))
tsd2 <- mu[2]/(1 - sum(theta)) +
arima.sim(n = nn,
model = list(order = c(2, 0, 1),
ar = theta, ma = phi),
sd = exp(1 + 2 * x2))
tsdata <- data.frame(TS1 = tsd1, TS2 = tsd2, x2 = x2)
head(tsdata)
### An ARIMA(2, 0, 0) model, that is an AR(2) model ###
#fit1 <- vglm(TS1 ~ 1,
# ARIMAXff(order = c(2, 0, 0), var.arg = FALSE, type.EIM = "exact"),
# data = tsdata, crit = "log", trace = TRUE)
fit1 <- vglm(TS1 ~ 1,
ARXff(order = 2, var.arg = FALSE, type.EIM = "exact"),
data = tsdata, crit = "log", trace = TRUE)
m.coe <- Coef(fit1)
## Using arima to compare to summary(vgtsff)
summary(fit1)
arima(tsdata$TS1, order = c(2, 0, 0)) ## Similar SE's than VGAMextra.
m.coe[1] / (1 - sum(m.coe[-(1:2)])) # THIS IS SIMILAR TO THE INTERCEPT
# ESTIMATED BY arima(): 1.1898
### An ARIMA(2, 0, 1) models, that is an ARMA(2, 1) ###
### The errors standard deviation is a function of 'x2' ###
### NOTICE: ARIMA and ARMA use the "identitylink" for coefficients ###
#fit2 <- vglm(TS2 ~ x2,
# ARMAXff(order = c(2, 1), var.arg = FALSE, type.EIM = "exact",
# zero = NULL),
# # constraints = list('x2' = rbind(0, 1, 0, 0, 0)),
# data = tsdata, crit = "loglikelihood", trace = TRUE)
#m.coe <- coef(fit2)
#coef(fit2, matrix = TRUE)
## Compare summary(vglm) to arima().
#summary(fit2)
#arima(tsdata$TS2, order = c(2, 0, 1))
Link functions for the mean of 1–parameter continuous distribution: The Topp–Leone distribution.
Description
Computes the toppleMlink transformation, its inverse and
the first two derivatives.
Usage
toppleMlink(theta, bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character.
See |
bvalue, inverse, deriv, short, tag |
See |
Details
The toppleMlink transformation arises as a link function to
model the mean of the Topp–Leone distribution,
topple. It is defined as
\eta = {\tt{logit}} \left( \left( 1 - \frac{4^{s} \Gamma(1 + s)^2}{
\Gamma(2 + 2s)} \right) / sup.tp \right).
Here, 0 < s < 1 is a shape parameter as in
topple, whereas
sup.tp is the supremum of
1 - \frac{4^{s} \Gamma(1 + s)^2}{ \Gamma(2 + 2s)},
in (0, 1), as a function of s.
For numerical values of s out of (0, 1), this link may
result in Inf, -Inf, NA or NaN.
Value
For deriv = 0, the toppleMlink transformation of
theta when inverse = FALSE.
If inverse = TRUE, then theta becomes \eta, and
the inverse transformation is required. However, it
can't be expressed in close form. Therefore, the approximate
inverse image of entered theta computed by
newtonRaphson.basic
is returned.
For deriv = 1,
d eta / d theta when inverse = FALSE.
If inverse = TRUE, then
d theta / d eta as a function of
theta.
Note
Values of s too close to zero or 1.0 may cause numerical
instability. Use argument bvalue to replace them before
computing the link.
If theta is character, then arguments inverse and
deriv are ignored. See Links
for further details.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
topple,
Links,
newtonRaphson.basic.
Examples
## E1. The toppleMlink() and its inverse ##
theta <- ppoints(10)
eta <- toppleMlink(toppleMlink(theta = theta), inverse =TRUE)
summary(eta - theta) # Zero
## E2. Some probability link functions ##
my.probs <- ppoints(100)
par(lwd = 2)
plot(my.probs, logitlink(my.probs), xlim = c(-0.1, 1.1), ylim = c(-5, 8),
type = "l", col = "limegreen",
ylab = "transformation", las = 1, main = "Some probability link functions")
lines(my.probs, geometricffMlink(my.probs), col = "gray50")
lines(my.probs, logffMlink(my.probs), col = "blue")
lines(my.probs, probitlink(my.probs), col = "purple")
lines(my.probs, clogloglink(my.probs), col = "chocolate")
lines(my.probs, cauchitlink(my.probs), col = "tan")
lines(my.probs, toppleMlink(my.probs), col = "black")
abline(v = c(0.5, 1), lty = "dashed")
abline(v = 0, h = 0, lty = "dashed")
legend(0.1, 8,
c( "toppleMlink", "geometricffMlink", "logffMlink",
"logitlink", "probitlink",
"clogloglink", "cauchitlink"),
col = c("black", "gray50", "blue", "limegreen", "purple", "chocolate", "tan"),
lwd = 1, cex = 0.5)
par(lwd = 1)
Link functions for the quantiles of several 1–parameter continuous distributions
Description
Computes the toppleQlink transformation, its inverse and the
first two derivatives.
Usage
toppleQlink(theta, p = stop("Argument 'p' must be specified."),
bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. It is |
p |
Numeric. A single value between 0 and 1.
It is the |
bvalue, inverse, deriv, short, tag |
See |
Details
This link function comforms with requirements of VGAM in order
to be compatible within the VGLM/VGAM framework. That is, monotonic,
onto, among other features.
In this line, the toppleQlink transformation arises as the
proper link to model any quantile of the Topp–Leone distribution
topple. It is defined as
\frac{1 - \sqrt{1 - p^{1/s}}}{m.max}.
Here, s is a shape parameter lying in (0, 1)
as in topple,
whereas m.max stands for the maximum in (0, 1) of
1 - \sqrt{1 - p^{1/s}}
as a function of s. Note, p is prespecified (fixed) between
0 and 1.
Numerical values of s or p out of range will
result in Inf, -Inf, NA or NaN
correspondingly.
Arguments inverse and deriv will be ignored
if theta is character.
Value
For deriv = 0, the toppleQlink transformation of
theta, when inverse = FALSE. If inverse = TRUE, then
the inverse transformation
log(p)/log(1 - (1 - theta * m.max)^2)
is returned.
For deriv = 1, this function returns
d eta / d theta,
if inverse = FALSE. If inverse = TRUE, then the reciprocal
d theta / d eta as a function of theta.
If deriv = 2, then the second order derivatives as a function of
theta.
Warning
The expression p^{1/s} tends rapidly to zero specially
for values of s less than 0.005. Therefore, in such cases
numerical values represented as zero may be returned when computing
this link function, regardless the value of argument inverse.
Note
Numerical instability may occur for values theta too close
to 0.0 or 1.0. Use argument bvalue to replace them before
computing the link.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
Examples
## E1. toppleQLink() and its inverse. ##
my.s <- ppoints(50); p <- 0.75
max(my.s - toppleQlink(toppleQlink(my.s, p = p), p = p, inverse =TRUE)) ## Zero
toppleQlink(theta = c(-0.15, -0.10, 0.25, 0.35) , p = p, inverse = FALSE) ## NaNs
toppleQlink(theta = c(-0.15, -0.10, 0.25, 0.35) , p = p, inverse = TRUE) ## NaNs
## E2. toppleQLink() for different avlues of 'p'. ##
plot(toppleQlink(theta = my.s, p = 0.05) ~ my.s,
type = "l", col = "blue", lty = "dotted", lwd = 3,
xlim = c(-0.1, 1.65), ylim = c(-0.1, 1.1), las = 1,
main = c("The toppleQlink() transformation"),
ylab = "eta = toppleQlink", xlab = "theta")
abline(h = 0, v = 0, lwd = 2)
abline(h = 1, v = 1, lty = "dotted", col = "green")
lines(toppleQlink(theta = my.s, p = 0.25) ~ my.s, lwd = 2, lty = "dashed", col = "gray")
lines(toppleQlink(theta = my.s, p = 0.50) ~ my.s, lwd = 2, lty = "dashed", col = "brown")
lines(toppleQlink(theta = my.s, p = 0.75) ~ my.s, lwd = 2, lty = "dashed", col = "orange")
lines(toppleQlink(theta = my.s, p = 0.95) ~ my.s, lwd = 2, lty = "dashed", col = "gray50")
legend(1.1, 1.0, c("p = 0.05", "p = 0.25", "p = 0.50", "p = 0.75", "p = 0.95"),
lwd = rep(2, 5), lty = rep("dashed", 5),
col = c("blue", "gray", "brown", "orange", "gray50"))
Trivariate Normal Distribution Family Function
Description
Estimates the means and the upper-half of the (symmetric) covariance matrix of a trivariate normal distribution by maximum likelihood.
Usage
trinormalCovff(zero = c("var", "cov"),
lmean = "identitylink",
lvar = "loglink",
lcov = "identitylink")
Arguments
zero |
The linear predictors modelled as intercept–only.
See |
lmean, lvar, lcov |
Link functions applied to the means, variances (diagonal elements of the
covariance matrix), and covariances (off-diagonal elements).
See |
Details
This family function is similar to
trinormal. The only difference is
that the variances and covariances, instead of the
standard deviations and correlation coefficients,
are directly modelled and estimated.
Similarly, trinormalCovff
also fits linear models to the means of a trivariate normal distribution.
The fitted means are returned as the fitted values in the form of a
three–column matrix. By default, the variances and covariances
are modelled as intercept–only, where a
loglink link is applied to the variances
and an identitylink over
the covariances.
Value
An object of class "vglmff"
(see vglmff-class) to be
used by VGLM/VGAM modelling functions, e.g.,
vglm or vgam.
Author(s)
Victor Miranda and Thomas Yee.
See Also
Examples
set.seed(123); nn <- 350
var1 <- exp(1.5); var2 <- exp(0.75); var3 <- exp(1.0)
### Artificial data, with two covariates.
tdata <- data.frame(x2 = runif(nn), x3 = runif(nn))
tdata <- transform(tdata,
y1 = rnorm(nn, 1 + 2 * x2, sd = sqrt(var1)),
y2 = rnorm(nn, 3 + 1 * x2, sd = sqrt(var2)),
y3 = rnorm(nn, 3 - 1 * x3, sd = sqrt(var2 * var3)))
### Fit the model using VGAMextra::trinormalCovff().
fit.trinormCovff <- vglm(cbind(y1, y2, y3) ~ x2 + x3,
trinormalCovff,
data = tdata, trace = TRUE)
summary(fit.trinormCovff)
vcov(fit.trinormCovff)
### Fitting the model using VGAM::trinormal()
fit.trinormVGAM <- vglm(cbind(y1, y2, y3) ~ x2 + x3,
trinormal,
data = tdata, trace = TRUE)
summary(fit.trinormVGAM)
vcov(fit.trinormVGAM)
#### Compare the estimated coefficients. Note that
#### trinormal() estimates the sd's and correlation coeffs.
coef(fit.trinormCovff, matrix = TRUE)
coef(fit.trinormVGAM, matrix = TRUE)
The Truncated Log-Normal Distribution
Description
Density, distribution function, quantile function and random generation for the truncated log-normal distribution
Usage
dtrunclnorm(x, meanlog = 0, sdlog = 1, min.support = 0, max.support = Inf, log = FALSE)
ptrunclnorm(q, meanlog = 0, sdlog = 1, min.support = 0, max.support = Inf)
qtrunclnorm(p, meanlog = 0, sdlog = 1, min.support = 0, max.support = Inf, log.p = FALSE)
rtrunclnorm(n, meanlog = 0, sdlog = 1, min.support = 0, max.support = Inf)
Arguments
x, q, p, n, meanlog, sdlog |
Same as |
min.support, max.support |
Lower and upper truncation limits. |
log, log.p |
Same as |
Details
Consider Y \sim
Lognormal(\mu_Y, \sigma_Y ) restricted
to (A, B ),
that is, 0 < A = \code{min.support} < X < B
= \code{max.support}.
The (conditional)
random variable
Y = X \cdot I_{(A , B)}
has a log–truncated normal distribution. Its p.d.f. is given by
f(y; \mu, \sigma, A, B) = (y^{-1} / \sigma) \cdot
\phi(y^*) / [ \Phi(B^*) - \Phi(A^*) ],
where y^* = [\log(y) - \mu_Y]/ \sigma_Y,
A^* = [\log(A) - \mu_Y] / \sigma_Y,
and
B^* = [\log(B) - \mu_Y] / \sigma_Y.
Its mean is:
\exp(\mu + \sigma^2/2) \cdot \{\Phi[(\log(B) - \mu) / \sigma
- \sigma] - \Phi[(\log(A) - \mu) / \sigma
- \sigma] \} / \{
\Phi[(\log(B) - \mu) / \sigma] - \Phi[(\log(A) - \mu) / \sigma] \}.
Here, \Phi is the standard normal c.d.f and
\phi is the standard normal p.d.f.
Value
dtrunclnorm() returns the density,
ptrunclnorm() gives the
distribution function,
qtrunclnorm() gives the quantiles, and
rtrunclnorm() generates random deviates.
Author(s)
Victor Miranda and Thomas W. Yee.
References
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, Second Edition, (Chapter 13) Wiley, New York.
See Also
Examples
###############
## Example 1 ##
mymeanlog <- exp(0.5) # meanlog
mysdlog <- exp(-1.5) # sdlog
LL <- 3.5 # Lower bound
UL <- 8.0 # Upper bound
## Quantiles:
pp <- 1:10 / 10
(quants <- qtrunclnorm(p = pp , min.support = LL, max.support = UL,
mymeanlog, mysdlog))
sum(pp - ptrunclnorm(quants, min.support = LL, max.support = UL,
mymeanlog, mysdlog)) # Should be zero
###############
## Example 2 ##
set.seed(230723)
nn <- 3000
## Truncated log-normal data
trunc_data <- rtrunclnorm(nn, mymeanlog, mysdlog, LL, UL)
## non-truncated data - reference
nontrunc_data <- rtrunclnorm(nn, mymeanlog, mysdlog, 0, Inf)
## Not run:
## Densities
plot.new()
par(mfrow = c(1, 2))
plot(density(nontrunc_data), main = "Non-truncated Log--normal",
col = "green", xlim = c(0, 15), ylim = c(0, 0.40))
abline(v = c(LL, UL), col = "black", lwd = 2, lty = 2)
plot(density(trunc_data), main = "Truncated Log--normal",
col = "red", xlim = c(0, 15), ylim = c(0, 0.40))
## Histograms
plot.new()
par(mfrow = c(1, 2))
hist(nontrunc_data, main = "Non-truncated Log--normal", col = "green",
xlim = c(0, 15), ylim = c(0, 0.40), freq = FALSE, breaks = 22,
xlab = "mu = exp(0.5), sd = exp(-1.5), LL = 3.5, UL = 8")
abline(v = c(LL, UL), col = "black", lwd = 4, lty = 2)
hist(trunc_data, main = "Truncated Log--normal", col = "red",
xlim = c(0, 15), ylim = c(0, 0.40), freq = FALSE,
xlab = "mu = exp(0.5), sd = exp(-1.5), LL = 3.5, UL = 8")
## End(Not run)
## Area under the estimated densities
# (a) truncated data
integrate(approxfun(density(trunc_data)),
lower = min(trunc_data) - 0.1,
upper = max(trunc_data) + 0.1)
# (b) non-truncated data
integrate(approxfun(density(nontrunc_data)),
lower = min(nontrunc_data),
upper = max(nontrunc_data))
The Truncated Normal Distribution
Description
Density, distribution function, quantile function and random numbers generator for the truncated normal distribution
Usage
dtruncnorm(x, mean = 0, sd = 1, min.support = -Inf, max.support = Inf, log = FALSE)
ptruncnorm(q, mean = 0, sd = 1, min.support = -Inf, max.support = Inf)
qtruncnorm(p, mean = 0, sd = 1, min.support = -Inf, max.support = Inf, log.p = FALSE)
rtruncnorm(n, mean = 0, sd = 1, min.support = -Inf, max.support = Inf)
Arguments
x, q, p, n, mean, sd |
Same as |
min.support, max.support |
Lower and upper truncation limits. |
log, log.p |
Same as |
Details
Consider X \sim N(\mu,
\sigma^2), with A < X < B,
i.e., X restricted to (A , B).
We denote A = min.support and
B = max.support.
Then the conditional
random variable
Y = X \cdot I_{(A , B)}
has a truncated normal distribution. Its p.d.f. is given by
f(y; \mu, \sigma, A, B) = (1 / \sigma) \cdot
\phi(y^*) / [ \Phi(B^*) - \Phi(A^*) ],
where y^* = (y - \mu)/ \sigma,
A^* = (A - \mu) / \sigma,
and
B^* = (B - \mu) / \sigma.
Its mean is
\mu + \sigma \cdot [ \phi(A) - \phi(B) ] /
[\Phi(B) - \Phi(A)].
Here, \Phi is the standard normal c.d.f and
\phi is the standard normal p.d.f.
Value
dtruncnorm() returns the density,
ptruncnorm() gives the
distribution function,
qtruncnorm() gives the quantiles, and
rtruncnorm() generates random deviates.
dtruncnorm is computed from the definition, as
in 'Details'. [pqr]truncnormal are computed based
on their relationship to the normal distribution.
Author(s)
Victor Miranda and Thomas W. Yee.
References
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, Second Edition (Chapter 13). Wiley, New York.
See Also
Examples
###############
## Example 1 ##
mymu <- 2.1 # mu
mysd <- 1.0 # sigma
LL <- -1.0 # Lower bound
UL <- 3.0 # Upper bound
## Quantiles:
pp <- 1:10 / 10
(quants <- qtruncnorm(p = pp , min.support = LL, max.support = UL,
mean = mymu, sd = mysd))
sum(pp - ptruncnorm(quants, min.support = LL, max.support = UL,
mean = mymu, sd = mysd)) # Should be zero
###############
## Example 2 ##
## Parameters
set.seed(230723)
nn <- 3000
mymu <- 12.7 # mu
mysigma <- 3.5 # sigma
LL <- 6 # Lower bound
UL <- 17 # Upper bound
## Truncated-normal data
trunc_data <- rtruncnorm(nn, mymu, mysigma, LL, UL)
## non-truncated data - reference
nontrunc_data <- rnorm(nn, mymu, mysigma)
## Not run:
## Densities
par(mfrow = c(1, 2))
plot(density(nontrunc_data), main = "Non-truncated ND",
col = "green", xlim = c(0, 25), ylim = c(0, 0.15))
abline(v = c(LL, UL), col = "black", lwd = 2, lty = 2)
plot(density(trunc_data), main = "Truncated ND",
col = "red", xlim = c(0, 25), ylim = c(0, 0.15))
## Histograms
plot.new()
par(mfrow = c(1, 2))
hist(nontrunc_data, main = "Non-truncated ND", col = "green",
xlim = c(0, 25), ylim = c(0, 0.15), freq = FALSE, breaks = 22,
xlab = "mu = 12.7, sd = 3.5, LL = 6, UL = 17")
abline(v = c(LL, UL), col = "black", lwd = 4, lty = 2)
hist(trunc_data, main = "Truncated ND", col = "red",
xlim = c(0, 25), ylim = c(0, 0.15), freq = FALSE,
xlab = "mu = 12.7, sd = 3.5, LL = 6, UL = 17")
## End(Not run)
## Area under the estimated densities
# (a) truncated data
integrate(approxfun(density(trunc_data)),
lower = min(trunc_data) - 1,
upper = max(trunc_data) + 1)
# (b) non-truncated data
integrate(approxfun(density(nontrunc_data)),
lower = min(nontrunc_data),
upper = max(nontrunc_data))
Truncated Log-normal Distribution Family Function
Description
Maximum likelihood estimate of the two–parameter lognormal distribution with lower/upper truncation.
Usage
trunclognormal(lmeanlog = "identitylink", lsdlog = "loglink",
min.support = 1e-6, max.support = Inf, zero = "sdlog")
Arguments
lmeanlog, lsdlog, zero |
Same as |
min.support, max.support |
Positive lower and upper truncation limits (recycled).
|
Details
MLE of the two–parameter (univariate) lognormal distribution subject
to lower/upper truncation. All response values are greater
than min.support and lower than max.support.
Default values of min.support, max.suppport should
effectively reproduce lognormal.
The truncated–lognormal density for a response Y is
f(y; \mu, \sigma) = f_N(y; \mu, \sigma) /
[\Phi(\texttt{max.support}, \mu, \sigma) -
\Phi(\texttt{min.support},\mu, \sigma) ],
where f_N is the ordinary lognormal density (see
lognormal) and
\Phi is the standard normal CDF.
The mean of Y, given by
\exp{(\mu + \sigma^2/2)} \cdot
[\Phi(((\log(\texttt{max.support}) - \mu)/\sigma) - \sigma) -
\Phi(((\log(\texttt{min.support}) - \mu)/\sigma) - \sigma) ] /
\Delta \Phi(\mu,\sigma),
with \Delta \Phi(\mu, \sigma) =
\Phi( (\log(\texttt{max.support}) - \mu)/\sigma ) -
\Phi( (\log(\texttt{min.support}) - \mu)/\sigma ),
are returned as the fitted values.
Value
An object of class "vglmff"
(see vglmff-class).
The object is used by modelling functions such as
vglm,
and vgam.
Author(s)
Victor Miranda, Siqi (Vicky) Liu and Thomas W. Yee.
References
Nadarajah, S. and Kotz, S. (2003).
R Programs for Computing Truncated Distributions.
Journal of Statistical Software, Code
Snippets,
16(2), 1–8.
Cohen, A.C. (1991) Truncated and Censored Samples: Theory and Applications, New York, USA. Marcel Dekker.
See Also
lognormal,
uninormal,
CommonVGAMffArguments,
Lognormal.
Examples
##########
set.seed(10470923)
nn <- 3000
## Parameters
mysdlog <- exp(-1.5) # sdlog
LL <- 3.5 # Lower bound
UL <- 8.0 # Upper bound
## Truncated data
ldata2 <- data.frame(x2 = runif(nn))
ldata2 <- transform(ldata2, y1 = rtrunclnorm(nn, 1 + 1.5 * x2, mysdlog,
min.support = LL, max.support = UL))
# head(ldata2)
# hist(ldata2$y1, breaks = 22, col = "blue", xlim = c(0, 10))
##############################################################
# Fitting a truncated lognormal distribution - sd is intercept only
fit1 <- vglm(y1 ~ x2, trunclognormal(zero = "sdlog", min.support = LL, max.support = UL),
data = ldata2, trace = TRUE)
coef(fit1, matrix = TRUE)
vcov(fit1)
##############################################################
# Fitting a truncated lognormal distribution - zero = NULL
fit2 <- vglm(y1 ~ x2, trunclognormal(zero = NULL, min.support = LL, max.support = UL),
data = ldata2, trace = TRUE)
coef(fit2, matrix = TRUE)
vcov(fit2)
##############################################################
# Mimicking lognormal()
fit3 <- vglm(y1 ~ x2, trunclognormal(zero = "sdlog"),
data = ldata2, trace = TRUE)
coef(fit3, mat = TRUE)
# Same as
fit3bis <- vglm(y1 ~ x2, lognormal(zero = "sdlog"),
data = ldata2, trace = TRUE)
coef(fit3bis, mat = TRUE)
Truncated normal Distribution Family Function
Description
Maximum likelihood estimate of the two–parameter normal distribution with lower/upper truncation.
Usage
truncnormal(lmean = "identitylink", lsd = "loglink",
min.support = -Inf, max.support = Inf, zero = "sd")
Arguments
lmean, lsd |
Link functions applied to mean and standard deviation/variance. |
min.support, max.support |
Vector of lower and upper truncation limits (recycled).
|
zero |
See |
Details
MLE of the two–parameter (univariate) normal distribution subject to
lower/upper truncation. All response values are greater
then min.support and/or lower than max.support.
The truncated–normal density for a response Y is
f(y; \mu, \sigma) = f(y; \mu, \sigma) /
[\Phi(\texttt{max.support}, \mu, \sigma) -
\Phi(\texttt{min.support},\mu, \sigma) ],
where f is the probability density function of standard normal
distribution and \Phi is the standard normal CDF.
The mean of Y, given by
\mu + [\varphi(\texttt{min.support}) + \varphi(\texttt{max.support})/\Delta \Phi(\mu,\sigma)]\cdot \sigma,
with \Delta \Phi(\mu, \sigma) =
\Phi((\texttt{max.support} - \mu)/\sigma ) -
\Phi( (\texttt{min.support} - \mu)/\sigma ),
are returned as the fitted values.
Value
An object of class "vglmff"
(see vglmff-class).
The object is used by modelling functions such as
vglm,
and vgam.
Author(s)
Siqi (Vicky) Liu, Victor Miranda, and Thomas W. Yee.
References
Nadarajah, S. and Kotz, S. (2003).
R Programs for Computing Truncated Distributions.
Journal of Statistical Software, Code
Snippets,
16(2), 1–8.
Cohen, A.C. (1991) Truncated and Censored Samples: Theory and Applications, New York, USA. Marcel Dekker.
See Also
uninormal,
CommonVGAMffArguments.
Examples
nn <- 2000
set.seed(14290909)
## Parameters
mysd <- exp(1.0) # sd
LL <- -0.5 # Lower bound
UL <- 8.0 # Upper bound
## Truncated data
ldata2 <- data.frame(x2 = runif(nn))
ldata2 <- transform(ldata2, y1 = rtruncnorm(nn, 1 + 1.5 * x2, mysd,
min.support = LL, max.support = UL))
# head(ldata2)
# hist(ldata2$y1, breaks = 22, col = "blue", xlim = c(-5, 10))
##############################################################
# Fitting a truncated normal distribution - sd is intercept only
fit1 <- vglm(y1 ~ x2, truncnormal(zero = "sd", min.support = LL, max.support = UL),
data = ldata2, trace = TRUE)
coef(fit1, matrix = TRUE)
vcov(fit1)
##############################################################
# Fitting a truncated lognormal distribution - zero = NULL
fit2 <- vglm(y1 ~ x2, truncnormal(zero = NULL, min.support = LL, max.support = UL),
data = ldata2, trace = TRUE)
coef(fit2, matrix = TRUE)
vcov(fit2)
##############################################################
# Mimicking uninormal()
fit3 <- vglm(y1 ~ x2, truncnormal(zero = "sd"),
data = ldata2, trace = TRUE)
coef(fit3, mat = TRUE)
# Same as
fit3bis <- vglm(y1 ~ x2, uninormal(zero = "sd"),
data = ldata2, trace = TRUE)
coef(fit3bis, mat = TRUE)
Undocumented Methods Functions in VGAMextra
Description
Undocumented methods functions are aliased here. In the VGAMextra package there are currently some objects/methods/classes which are currently internal and/or undocumented. The help file suppresses the warnings when the package is 'CHECK'ed.
Methods
Methods/classes will be documented over time.
- object
-
This argument is often used, and it is the primary object from which the function operates on.
Quantile regression: Link function for the quantiles of the normal distribution.
Description
Computes the uninormalQlink transformation, its inverse and
the first two derivatives.
Usage
uninormalQlink(theta, percentile = stop("Enter percentiles."),
sd = NULL, wrt.param = NULL,
bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. This is |
percentile |
Numeric. A vector of percentiles of interest, denoted as |
sd |
Numeric, positive. The 'standard deviation' parameter (required),
denoted as |
wrt.param |
Positive integer, either |
bvalue, inverse, deriv, short, tag |
See |
Details
A 2-parameter link for the quantiles of the normal
distribution. It can only be used within
uninormalff as the first linear
predictor. It is defined as
\tt{uninormalQlink}(\mu; \sigma) = \eta_1(\mu; \sigma) =
\mu + \sigma \cdot \Phi^{-1}(perc),
where \Phi is the error function
(see, e.g., erf),
\mu in (-\infty, \infty), and \sigma > 0.
This link is expressly a function of \theta = \mu, therefore
sigma must be entered at every call.
Numerical values of \sigma out of range may
result in Inf, -Inf, NA or NaN.
Value
For deriv = 0, the uninormalQlink transformation of
theta, i.e. \mu, when inverse = FALSE.
If inverse = TRUE, then \theta becomes \eta,
and the inverse, \eta - \sigma \Phi^{-1}(perc),
for given \sigma, is returned.
When deriv = 1 theta becomes
\theta = (\mu, \sigma)= (\theta_1, \theta_2), and
\eta = (\eta_1, \eta_2) with
\eta_2 = \log~\sigma,
and the argument wrt.param must be
considered:
A) If inverse = FALSE, then
d eta1 / d \mu is returned when
wrt.param = 1, and
d eta1 / d \sigma if
wrt.param = 2.
B) For inverse = TRUE, this link returns
d \mu / d eta1 and
d \sigma / d eta1 conformably arranged
in a matrix, if wrt.param = 1,
as a function of \theta_i, i = 1, 2.
When wrt.param = 2, then
d\mu / d eta2 and
d\sigma / d eta2
is returned.
For deriv = 2, the second derivatives in
terms of theta are similarly returned.
Note
Numerical instability may occur for values of
sigma too close
to zero. Use argument bvalue to replace the former only before
computing the link.
If theta is character, then arguments inverse and
deriv are ignored. See Links
for further details.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
uninormalff,
uninormal,
Links.
Examples
eta <- seq(-3, 3, by = 0.1) # this is eta = log(Normal - Quantiles).
sigma <- exp(1) # 'sigma' argument.
percentile <- c(25, 50, 75, 95) # some percentiles of interest.
## E1. Get 'mean' values.
theta <- uninormalQlink(theta = eta, percentile = percentile,
sd = sigma, inverse = TRUE) # Mu
## Not run:
## E2. Plot theta vs. eta, 'shape' fixed, for different percentiles.
plot(theta[, 1], eta, type = "l", las = 1, lty = 2, lwd = 3,
ylim = c(-10, 10), xlim = c(-10, 10),
main = "uninormalQlink(theta; shape), fixed 'shape'.",
xlab = "Theta (scale)", ylab = "uninormalQlink")
abline(v = 0, h = 0, col = "red")
lines(theta[, 2], eta, lty = 2, lwd = 3, col = "blue")
lines(theta[, 3], eta, lty = 2, lwd = 3, col = "orange")
lines(theta[, 4], eta, lty = 2, lwd = 3, col = "red")
legend("bottomright", c("25th Perc", "50th Perc", "75th Perc", "95th Perc"),
col = c("black", "blue", "orange", "red"), lty = c(2, 2, 2, 2),
lwd = rep(3, 4))
## End(Not run)
## E3. uninormalQlink() and its inverse ##
etabis <- uninormalQlink(theta = theta, percentile = percentile,
sd = sigma, inverse = FALSE)
my.diff <- eta - etabis
summary(my.diff) # Zero
Normal (distribution–specified) quantile regression
Description
Distribution–specified quantile regression.
An extension of uninormal from VGAM.
It handles effectively uninormalQlink via the first
linear predictor.
Usage
uninormalff(link1 = "identitylink", lsd = "loglink",
percentile = 50,
imethod = 1, isd = NULL, parallel = FALSE,
smallno = 1.0e-5, zero = "sd")
Arguments
link1 |
Link function for the first linear predictor.
By default
|
percentile |
Numeric. A vector with the percentiles of interest, between
0 and 100.
Used only when |
lsd, imethod, isd, parallel, smallno, zero |
Same as in
|
Details
An extension of
uninormal adapted to handle
uninormalQlink,
for normal quantile regression (QR) via the first
linear predictor.
The standard deviation only can be estimated.
The second linear predictor is fixed to
\eta_2 = \log \sigma, and
var.arg is set internally to FALSE.
Unlike usual QR where the distribution of Y|X is unspecified,
uninormalff() estimates normal distributions at different
quantiles (as entered in percentile)
of the Y|X. For this, set
link1 = uninormaQlink. To mimic
uninormal set
link1 = "identitylink" (default).
Initial developments of this work are in
Miranda & Yee (2019).
See, e.g.,
weibullRff,
for another example on distribution specified
quantile regression with the two–parameter Weibull
distribution.
Value
An object of class "vglm".
See vglm-class for full details.
Note
Q.reg must be used in
the vglm() or vgam() to enter the response.
See example below.
This VGAM family function does not handle censored data.
Author(s)
V. Miranda and Thomas W. Yee.
References
Miranda & Yee (2019) New Link Functions for Distribution–Specific Quantile Regression Based on Vector Generalized Linear and Additive Models. Journal of Probability and Statistics, Article ID 3493628.
Miranda & Yee (2021) Two–Parameter Link Functions, With Application to Negative Binomial, Weibull and Quantile Regression. In preparation.
See Also
uninormalQlink,
uninormal,
Q.reg,
weibullQlink,
weibullRff,
CommonVGAMffArguments.
Examples
## Not run:
x2 <- seq(0,10,length.out = 100) # independent variable
sig <- exp(0.5 + 0.15*x2) # non-constant variance
b_0 <- 10 # true intercept
b_1 <- 0.15 # true slope
set.seed(17221) # make the next line reproducible
e <- rnorm(100,mean = 0, sd = sig) # normal random error with non-constant variance
y <- b_0 + b_1*x2 + e # dependent variable
## Data
ndata <- data.frame(y = y, x2 = x2)
## Some percentiles of interest
percentile <- c(10, 25, 50, 90)
## Normal quantile regression, zero = NULL
fit1 <- vglm(Q.reg(y, length.arg = 4) ~ x2,
uninormalff(link1 = "uninormalQlink", percentile = percentile, zero = NULL),
data = ndata, trace = TRUE)
#summary(fit1)
( my.coef3Q <- coef(fit1, mat = TRUE) )
## Plots - percentile curves.
plot(y ~ x2, pch = 19, ylim = c(-1, 25),
main =" Normal quantile regression")
abline(h = -3:25, v = 0, col = "gray", lty = "dashed")
with(ndata, lines(x2, my.coef3Q[1, 1] + my.coef3Q[2, 1] * x2,
col = "red", lty = "dotted", lwd = 4))
with(ndata, lines(x2, my.coef3Q[1, 3] + my.coef3Q[2, 3] * x2,
col = "orange", lty = "dotted", lwd = 4))
with(ndata, lines(x2, my.coef3Q[1, 5] + my.coef3Q[2, 5] * x2,
col = "blue", lty = "dotted", lwd = 4))
with(ndata, lines(x2, my.coef3Q[1, 7] + my.coef3Q[2, 7] * x2,
col = "brown", lty = "dotted", lwd = 4))
legend("topleft", c("90th", "50th", "25th", "10th"),
col = c("brown", "blue", "orange", "red"), lty = rep("dotted", 4), lwd = rep(4, 4))
## Mimicking 'VGAM:uninormal'
fit2 <- vglm(y ~ x2, uninormalff(link1 = "identitylink", percentile = NULL, zero = NULL),
data = ndata, trace = TRUE)
## End(Not run)
Class of Vector Generalized Linear Time Series Models
Description
Time series family functions for the VGAMextra package
Objects from the Class
Objects can be created by calling
new("vgltsmff"...)
slots
Implementation of vector generalized linear time series (TS) family
functions (vgltsff) at VGAMextra is entirely based on
the structure of family functions of the class
vglmff-class.
Hence, refer to vglmff-class for a
thourugh description of slots and features involved when objects
of class "vgtsff" are being created.
Methods
Thus far, the following methods for objects of class
"vgltsff-class" are implemented:
summary-
Additional information to that displayed by the
summarymethods from VGAM. That is:a) Standard errors based on the MLEs asymptotic distributions, and
b) Checks on stationarity and/or invertibility via the polynomial roots.
Currently, summary methods at VGAMextra have been implemented for:
signature(VGAMff = "ARff"):-
For ARX–types family functions.
signature(VGAMff = "MAff"):-
For MAX–types family functions.
signature(VGAMff = "ARMAff"):-
For ARMAX–like family functions.
See
summaryS4VGAMextra
for further details.
Note
Programmers to write VGAM/VGLM time series family functions are also allowed to write methods functions either for specific purposes, or to extend those current methods to print some extra output required.
In such cases, notice that the class vgltsff-class is
labeled by an object of class "character"
(a character vector) specified at the slot
@vfamily within the family function. This is, in fact,
one of the required slots by the class
vglmff-class.
Additionally, practitioners are encouraged to mantain all previous
conventions for naming the arguments in Ts family functions
as specified at
vglmff-class, e.g.,
link is the argument for parameter link functions, etc.
Author(s)
V. Miranda and T.W. Yee.
Link functions for the mean of 2–parameter continuous distributions: The Weibull distribution.
Description
Computes the weibullMlink transformation, its inverse and
the first two derivatives.
Usage
weibullMlink(theta, shape = NULL, wrt.param = NULL,
bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. This is |
shape |
The shape parameter. Required for this link to work.
See
|
wrt.param |
Positive integer, either |
bvalue, inverse, deriv, short, tag |
See |
Details
This is the link for the mean of
the 2–parameter Weibull distribution, also known as the
weibullMlink transformation.
It can only be used within
weibullRff
and is defined as
\tt{weibullMlink}(\beta; \alpha) = \eta(\beta; \alpha) =
\log [\beta \cdot \Gamma (1 + 1/\alpha)],
for given \alpha ('shape' parameter) where
\beta > 0 is the scale parameter.
weibullMlink is expressly a function of \beta, i.e.
\theta, therefore \alpha (shape)
must be entered at every call.
Numerical values of \alpha or \beta out of range may
result in Inf, -Inf, NA or NaN.
Value
For deriv = 0, the weibmeanlnik transformation of
theta, i.e., \beta, when inverse = FALSE.
If inverse = TRUE, then \theta becomes \eta,
and the inverse,
\exp\left( theta - \log \Gamma(1 + 1/ \alpha)\right),
for given \alpha, is
returned.
When deriv = 1 theta becomes
\theta = (\beta, \alpha)= (\theta_1, \theta_2), and
\eta = (\eta_1, \eta_2) with
\eta_2 = \log~\alpha,
and the argument wrt.param must be
considered:
A) If inverse = FALSE, then
d eta1 / d \beta is returned when
wrt.param = 1, and
d eta1 / d \alpha if
wrt.param = 2.
B) For inverse = TRUE, this function returns
d \beta / d eta1 and
d \alpha / d eta1 conformably arranged
in a matrix, if wrt.param = 1,
as a function of \theta_i, i = 1, 2.
When wrt.param = 2, a matrix with columns
d\beta / d eta2 and
d\alpha / d eta2
is returned.
For deriv = 2, the second derivatives in
terms of theta are likewise returned.
Note
Numerical instability may occur for values theta too close
to zero. Use argument bvalue to replace them before
computing the link.
If theta is character, then arguments inverse and
deriv are ignored. See Links
for further details.
Author(s)
V. Miranda and Thomas W. Yee.
See Also
weibullQlink,
weibullRff,
weibullR,
lgamma,
Links.
Examples
eta <- seq(-3, 3, by = 0.1) # this is eta = log(mu(b, a)).
shape <- exp(1) # 'shape' argument.
## E1. Get 'scale' values with A WARNING (not the same length)!
theta <- weibullMlink(theta = eta, shape = shape, inverse = TRUE) # Scale
## Not run:
## E2. Plot theta vs. eta, 'shape' fixed.
plot(theta, eta, type = "l", ylab = "", col = "blue",
main = paste0("weibullMlink(theta; shape = ",
round(shape, 3), ")"))
abline(h = -3:3, v = 0, col = "gray", lty = "dashed")
## End(Not run)
## E3. weibullMlink() and its inverse ##
etabis <- weibullMlink(theta = theta, shape = shape, inverse = FALSE)
summary(eta - etabis) # Should be 0
Weibull Quantile regression: Link function for the quantiles of the Weibull distribution.
Description
Computes the weibullQlink transformation, its inverse and
the first two derivatives.
Usage
weibullQlink(theta, percentile = stop("Enter percentiles."),
shape = NULL, wrt.param = NULL,
bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. Same as
|
percentile |
Same as |
shape |
Numeric, positive. The shape parameter, required. |
wrt.param |
Same as in |
bvalue, inverse, deriv, short, tag |
See |
Details
The ordinary
scale–shape Weibull quantiles are directly modelled by this link,
aka weibullQlink transformation.
It can only be used within
weibullRff
as the first linear predictor, \eta_1,
and is defined as
\tt{weibullQlink}(\beta; \alpha) = \eta_1(\beta; \alpha) =
\log \{\beta \cdot [(-\log(1 - perc))^{(1/\alpha)}]\},
for given \alpha ('shape' parameter) where
\beta > 0 is the scale parameter.
weibullQlink is expressly a function of \beta, i.e.
\theta, therefore \alpha (shape)
must be entered at every call.
Numerical values of \alpha or \beta out of range may
result in Inf, -Inf, NA or NaN.
Value
For deriv = 0, the weibullQlink transformation of
theta, i.e. \beta, when inverse = FALSE.
If inverse = TRUE, then \theta becomes \eta,
and the inverse, exp[theta -
(1 / \alpha) log(-log(1 - perc))],
for given \alpha, is
returned.
When deriv = 1 theta becomes
\theta = (\beta, \alpha)= (\theta_1, \theta_2), and
\eta = (\eta_1, \eta_2) with
\eta_2 = \log~\alpha,
and the argument wrt.param must be
considered:
A) If inverse = FALSE, then
d eta1 / d \beta is returned when
wrt.param = 1, and
d eta1 / d \alpha if
wrt.param = 2.
B) For inverse = TRUE, this link returns
d \beta / d eta1 and
d \alpha / d eta1 conformably arranged
in a matrix, if wrt.param = 1,
as a function of \theta_i, i = 1, 2.
When wrt.param = 2, a matrix with columns
d\beta / d eta2 and
d\alpha / d eta2
is returned.
For deriv = 2, the second derivatives in
terms of theta are similarly returned.
Note
See weibullMlink.
Author(s)
V. Miranda and Thomas W. Yee.
References
Miranda & Yee (2021) Two–Parameter Link Functions, With Application to Negative Binomial, Weibull and Quantile Regression. In preparation.
See Also
weibullRff,
Q.reg,
weibullR,
weibmeanlink,
Links.
Examples
eta <- seq(-3, 3, by = 0.1) # this is eta = log(Weibull-quantiles).
shape <- exp(1) # 'shape' argument.
percentile <- c(25, 50, 75, 95) # some percentiles of interest.
## E1. Get 'scale' values. Gives a warning (not of the same length) !
theta <- weibullQlink(theta = eta, percentile = percentile,
shape = shape, inverse = TRUE) # Scale
## Not run:
## E2. Plot theta vs. eta, 'shape' fixed, for different percentiles.
plot(theta[, 1], eta, type = "l", lwd = 3,
ylim = c(-4, 4),
main = paste0("weibullQlink(theta; shape = ", round(shape, 3), ")"),
xlab = "Theta (scale)", ylab = "weibullQlink")
abline(h = -3:3, v = 0, col = "gray", lty = "dashed")
lines(theta[, 2], eta, lwd = 3, col = "blue")
lines(theta[, 3], eta, lwd = 3, col = "orange")
lines(theta[, 4], eta, lwd = 3, col = "red")
legend("bottomright", c("25th Perc", "50th Perc", "75th Perc", "95th Perc"),
col = c("black", "blue", "orange", "red"),
lwd = rep(3, 4))
## End(Not run)
## E3. weibullQlink() and its inverse ##
etabis <- weibullQlink(theta = theta, percentile = percentile,
shape = shape, inverse = FALSE)
summary(eta - etabis) # Should be 0 for each colum (percentile)
Distribution–specified quantile regression: 2–parameter Weibull Distribution
Description
Estimates the 2–parameter Weibull distribution by maximum likelihood.
An extension of weibullR from VGAM.
Weibull quantile regression and Weibull–mean
modelling are also handled via the first
linear predictor.
Usage
weibullRff(link1 = c("loglink", "weibullMlink", "weibullQlink")[1],
lshape = "loglink", percentile = 50,
imu = NULL, iscale = NULL, ishape = NULL,
lss = TRUE, nrfs = 1, probs.y = c(0.2, 0.5, 0.8),
imethod = 1, zero = "shape")
Arguments
link1 |
Link function for the first linear predictor.
Default
is |
percentile |
Numeric. A vector with the percentiles of interest, between 0
and 100.
Used only in Weibull quantile regression, that is,
when |
lshape, imu, iscale, ishape, lss, nrfs, probs.y, imethod |
Same as |
zero |
Specifies the parameters to be modelled as intercept–only. Further details below. |
Details
weibullRff is a modified version of
weibullR adapted to handle
weibullQlink
and weibullMlink,
two 2-parameter linear predictors that model
the Weibull mean and quantiles respectively.
The underlying density is the ordinary
scale(\beta) & shape(\alpha)
Weibull density (see weibullR).
The second linear predictor is always
\eta_2 = \log~\alpha.
The argument link1 handles the first linear predictor.
** Mimicking weibullR **
The default is link1 = "loglink", i.e.,
\eta_1 = \log~\beta = \log~scale, and
\eta_2 = \log~\alpha = \log~shape,
as with weibullR.
The mean (\mu) is returned as the fitted value.
** Weibull quantile regression **
For Weibull quantile regression set link1 = "weibullQlink"
and enter a numeric vector of percentiles of interest
via percentile.
See examples.
NOTE: Enter the response using
Q.reg. See example below.
The Weibull quantiles are returned as the fitted values.
** Weibull-mean modelling **
For Weibull-mean modelling (viz. mean time to failure)
set link1 = "weibullMlink".
The mean (\mu) is returned as the fitted value.
Value
An object of class "vglm".
See vglm-class for full details.
Note
The parameters \alpha and \beta
match the arguments shape and scale from
rweibull.
Multiple responses are handled.
This VGAM family function does not handle censored data.
Author(s)
V. Miranda and Thomas W. Yee.
References
Miranda & Yee (2021) Two–Parameter Link Functions, With Application to Negative Binomial, Weibull and Quantile Regression. In preparation.
See Also
Q.reg,
weibullQlink,
weibullMlink,
weibullR,
gamma,
CommonVGAMffArguments.
Examples
## Not run:
set.seed(18121)
nn <- 300
x2 <- sort(runif(nn, 0, 3)) # Predictor/covariate.
bb <- exp(1.1 + 0.2 * x2) # Scale parameter as function of x2.
aa <- exp(1.0 - 0.35 * x2) # Shape parameter as function of x2.
mymu <- bb * gamma(1 + 1/aa) # The Weibull mean.
## Use weibullMlink to generate appropriate scale parameter.
newbb <- weibullMlink(theta = log(mymu), shape = aa, inverse = TRUE, deriv = 0)
## A single random response
wdata <- data.frame(y1 = rweibull(nn, shape = aa, scale = newbb), x2 = x2)
# Plotting the data / Histogram
plot(y1 ~ x2, xlim = c(0, 3.1), ylim = c(-1, 35),
pch = 20, data = wdata, col = "black",
main = "Weibull Quantile regression~ x2")
abline(h = 0, v = 0, col = "grey", lty = "dashed")
with(wdata, hist(y1, col = "red", breaks = 15))
## Weibull regression - percentile = c(25, 50, 75)
## Note the use of Q.reg.
fit1 <- vglm(Q.reg(y1, length.arg = 3) ~ x2,
weibullRff(link1 = "weibullQlink", zero = NULL,
percentile = c(25, 50, 75)),
trace = TRUE, data = wdata)
head(fitted(fit1))
summary(fit1)
my.coef3Q <- coef(fit1, mat = TRUE)
### Proportion of data below the estimated 25% Quantile line.
100 * (1 - (sum(wdat$y1 >= fitted(fit2)[, 1]) / nn)) # Around 25%
### Proportion of data below the estimated 50% Quantile line.
100 * (1 - (sum(wdat$y1 >= fitted(fit2)[, 2]) / nn)) # Around 50%
### Proportion of data below the estimated 75% Quantile line.
100 * (1 - ( sum(wdat$y1 >= fitted(fit2)[, 3]) / nn )) # Around 75%
## The quantile plots ##
my.coef3Q <- coef(fit2, matrix = TRUE)
with(wdat, lines(x2, exp(my.coef3Q[1, 1] + my.coef3Q[2, 1] * x2),
col = "red", lty = "dotted", lwd = 4))
with(wdat, lines(x2, exp(my.coef3Q[1, 3] + my.coef3Q[2, 3] * x2),
col = "orange", lty = "dotted", lwd = 4))
with(wdat, lines(x2, exp(my.coef3Q[1, 5] + my.coef3Q[2, 5] * x2),
col = "blue", lty = "dotted", lwd = 4))
## Adding the 'mean' or expected Weibull regression line.
fit2 <- vglm(y1 ~ x2,
weibullRff(link1 = "weibullMlink", zero = NULL),
trace = TRUE, data= wdat)
my.coef3Q <- coef(fit2, mat = TRUE)
with(wdat, lines(x2, exp(my.coef3Q[1, 1] + my.coef3Q[2, 1] * x2),
col = "yellow", lty = "dashed", lwd = 3))
legend("topleft", c("25h Perc", "50th Perc", "Mean", "75th Perc"),
col = c("red", "orange", "cyan", "blue"),
lty = c("dashed", "dashed", "dashed", "dashed"), lwd = rep(4, 4))
## End(Not run)
Link functions for the mean of 1–parameter discrete distributions: The Yule–Simon Distribution.
Description
Computes the yulesimonMlink transformation, its inverse
and the first two derivatives.
Usage
yulesimonMlink(theta, bvalue = NULL, inverse = FALSE,
deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. This is |
bvalue, inverse, deriv, short, tag |
Details at |
Details
Assume Y \sim {\rm{Yule-Simon}}(\rho) ,
where \rho is a shape parameter as in
yulesimon.
Then, the mean of Y
is given by
\mu = \frac{\rho}{\rho - 1} = (1 - \rho^{-1})^{-1},
provided \rho > 1 .
This link function may be conceived as a natural link function
for the mean of the Yule–Simon distribution which comes up by
taking the logarithm on both sides of this equation. More precisely,
the yulesimonMlink tranformation for \rho > 1
is given by
{\tt{yulesimonMlink}}(\rho) = - \log (1 - \rho^{-1}).
While this link function can be used to model any parameter lying in
(1, \infty), it is particularly useful for event-rate data where
the mean, \mu, can be written in terms of some rate of events,
say \lambda, and the timeframe observed t.
Specifically,
\mu = \lambda t.
Assuming that additional covariates might be available to linearly
model \lambda (or \log \lambda),
this model can be treated as a VGLM with one parameter where
the time t (as \log t) can be easily incorporated
in the analysis as an offset.
Under this link function the domain set for \rho
is (1, \infty). Hence, values of \rho too
close to 1 from the right, or out of range will result
in Inf, -Inf, NA or NaN.
Use argument bvalue to adequately replace them before
computing the link function.
Unlike logffMlink or
zetaffMlink, the inverse of
this link function can be written in close form.
If theta is a character, arguments inverse and
deriv are disregarded.
Value
For deriv = 0, the yulesimonMlink transformation of
theta when inverse = FALSE, and if
inverse = TRUE then
exp(theta) / (exp(theta) - 1).
For deriv = 1, d eta / d theta
as a function of theta if inverse = FALSE, else
the reciprocal d theta / d eta.
For deriv = 2 the second order derivatives
are correspondingly returned.
Warning
Conforming with yulesimon, the domain
set for rho is (0, \infty). However, in order for
yulesimonMlink to be a real number, rho must be greater
then 1.0. Then, when a VGLM is fitted via
yulesimon using this link function,
numerical instability will occur if the estimated or the true value of
rho lies between 0 and 1, or if the initial values for rho
generated by
yulesimon fail to meet
rho > 1. Alternatively, try
posPoiMlink or
loglink if this happens.
Note
If the underlying assumption \rho > 1 is not met,
then this function returns NaN.
This is equivalent to claim that the mean is infinite or negative
and, consequently, its logarithm will not be real.
The vertical line theta = 1 is an asymptote for this link
function. As a result, it may return Inf for values of
\rho too close to 1 from the right.
Author(s)
V. Miranda and T. W. Yee
See Also
yulesimon,
Links,
posPoiMlink,
loglink.
Examples
## Example 1 ##
Shapes <- 1:10
yulesimonMlink(theta = Shapes, deriv = 1) ## d eta/d theta, as function of theta
yulesl.inv <-
# The inverse minus actual values
yulesimonMlink(theta = yulesimonMlink(theta = Shapes), inverse = TRUE) - Shapes
summary(yulesl.inv) ## zero
## Example 2. Special values of theta (rho) ##
rhos <- c(-Inf, -2, -1, 0.0, 0.5, 1, 5, 10, 100, Inf, NaN, NA)
rbind(rho = rhos,
yuleslink = yulesimonMlink(theta = rhos),
inv.yulesl =yulesimonMlink(theta = rhos, inverse = TRUE))
## Example 3 The yulesimonMlink transformation and the first two derivatives ##
rhos <- seq(1, 20, by = 0.01)[-1]
y.rhos <- yulesimonMlink(theta = rhos, deriv = 0)
der.1 <- yulesimonMlink(theta = rhos, deriv = 1)
der.2 <- yulesimonMlink(theta = rhos, deriv = 2)
plot(y.rhos ~ rhos, col = "black",
main = "log(mu), mu = E[Y], Y ~ Yule-Simon(rho).",
ylim = c(-5, 10), xlim = c(-1, 5), lty = 1, type = "l", lwd = 3)
abline(v = 1.0, col = "orange", lty = 2, lwd = 3)
abline(v = 0, h = 0, col = "gray50", lty = "dashed")
lines(rhos, der.1, col = "blue", lty = 5)
lines(rhos, der.2, col = "chocolate", lty = 4)
legend(2, 7, legend = c("yulesimonMlink", "deriv = 1", "deriv = 2"),
col = c("black", "blue", "chocolate"), lty = c(1, 5, 4))
Link functions for the mean of 1–parameter discrete distributions: The Zeta Distribution.
Description
Computes the zetaffMlink transformation, including its inverse
and the first two derivatives.
Usage
zetaffMlink(theta, bvalue = NULL,
alg.roots = c("Newton-Raphson", "bisection")[1],
inverse = FALSE, deriv = 0, short = TRUE, tag = FALSE)
Arguments
theta |
Numeric or character. This is |
bvalue |
Details at |
alg.roots |
Character. The iterative method to find the inverse of this link function. Default is Newton–Raphson. Optionally, the bisection method is also available. |
inverse, deriv, short, tag |
Details at |
Details
This is a link function for the zeta distribution,
zetaff, which emerges by applying
the logarithm transformation to its mean. Specifically,
assume Y follows a zeta distribution with shape parameter
s (this is theta in the VGLM/VGAM framework).
Then, the mean of Y is
\mu = \frac{\zeta (s)}{\zeta (s + 1)},
provided s > 1, where \zeta is the Riemann's zeta
function computed by zeta. The notation
adopted here conforms with zetaff
in terms of the density of the zeta distribution.
The zetaffMlink transformation is given by
\eta = \tt{zetaffMlink}(s) = \log \frac{\zeta (s)}{\zeta (s + 1)}.
It is particularly useful when modelling event–rate data where
the expected number of events, \mu, can be modelled as
\mu = \lambda t.
Specifically, \lambda is a standardized mean per unit–time, and
t is the observed timeframe.
The domain set for s, i.e. \theta, is
(1, \infty). Hence, either large values of
s, or those too close to 1 from the right, or out of
range will result in Inf, -Inf, NA or
NaN. Use argument bvalue to adequately replace
them before computing the link function.
WARNING: While in zetaff the parameter
s lies in (1, \infty),
zetaffMlink will be real when s > 1. Consequently,
for any VGLM fitted via zetaff using this
link function, numerical problems will take place if any
s value lies between 0.0 and 1.0 at any iteration. Use
optional link functions like loglink.
When inverse = TRUE and deriv = 0,
s changes into \eta, and therefore the domain set
(only in this case) turns into (0, \infty).
See below for further details.
If theta is a character, arguments inverse and
deriv are disregarded.
Value
For deriv = 0, the zetaffMlink transformation of
theta, if inverse = FALSE. When inverse = TRUE,
theta becomes \eta, and then the inverse of zetaffMlink
is required. However, it cannot be written in closed–form. Instead,
the inverse image of \eta, say \theta_\eta,
is returned. That is, a unique vector \theta_\eta
such that
{\tt{zetaffMlink}} (\theta_\eta) = \eta.
This process is equivalent to find the root,
\theta_\eta, of the function
{\tt{zetaffMlink}} (\theta) - \eta,
which is internally carried out via the method entered
at alg.roots. Options available are
“Newton-Raphson” and “bisection”.
For deriv = 1, d eta / d theta
as a function of theta if inverse = FALSE, else
the reciprocal d theta / d eta.
Similarly, when deriv = 2 the second order derivatives
are returned accordingly.
The first two derivatives of the Riemman's zeta function
are computed by zeta.
Besides, the zetaffMlink function as well as its derivatives
are graphically delimited for specific asymptotes.
Consequently, the mathematical limit of this link function is
returned for special values of theta, e.g.
for theta = \infty. See example 2 below.
Warning
Where the inverse image of \eta, \theta_\eta,
is required, values entered at theta (becoming \eta) must
be non-negative. The reason is that the zetaffMlink
transformation is decreasing but strictly positive in (1, \infty)
asymptotically approaching to the horizontal axis. In this way, the
shifted–down zetaffMlink function
{\tt{zetaff.func}}(\theta | \eta) =
{\tt{zetaffMlink}}(\theta) - \eta
uniquely intersects the horizontal axis and hence the inverse image
computed by “Newton-Raphson” or “bisection”
will be a real
number.
Note
Overall, this link function is useful to model any parameter
lying in (1, \infty), specially if the theoretical mean
can be written as \mu = \lambda t, as stated above.
As a result, some problems may arise if there are covariates.
Try another link function if any issue, such
as logloglink.
Author(s)
V. Miranda and T. W. Yee
See Also
zetaff,
newtonRaphson.basic,
bisection.basic,
zeta,
loglink,
Links.
Examples
## Example 1 ##
Shapes <- 1:10 + 0.1
zetaffMlink(theta = Shapes, deriv = 1) ## d eta/d theta, as function of theta
zetafflk.inv <- zetaffMlink(theta = zetaffMlink(theta = Shapes), inverse = TRUE) - Shapes
summary(zetafflk.inv) ## Should be zero
## Example 2. Special values of theta, inverse = FALSE ##
Shapes <- c(-Inf, -1, 0.5, 1, 1.5, 10, 100, Inf, NaN, NA)
print(rbind(Shapes, zetaffMlink = zetaffMlink(theta = Shapes),
inv.zfflink = zetaffMlink(theta = Shapes, inverse = TRUE)), digits = 3)
## Example 3. Plot of 'zetaffMlink()' and its first two derivatives ##
## inverse = FALSE, deriv = 0, 1, 2 ##
Shapes <- seq(1, 20, by = 0.01)[-1]
y.shapes <- zetaffMlink(theta = Shapes, deriv = 0)
der.1 <- zetaffMlink(theta = Shapes, deriv = 1)
der.2 <- zetaffMlink(theta = Shapes, deriv = 2)
plot(y.shapes ~ Shapes,
col = "black", main = "log(mu), mu = E[Y], Y ~ Zeta(s).",
ylim = c(-5, 10), xlim = c(-0.1, 5), lty = 1, type = "l", lwd = 3)
abline(v = 1.0, col = "orange", lty = 2, lwd = 3)
abline(v = 0, h = 0, col = "gray50", lty = "dashed")
lines(Shapes, der.1, col = "blue", lty = 5)
lines(Shapes, der.2, col = "chocolate", lty = 4)
legend(2, 7, legend = c("zetaffMlink", "deriv = 1", "deriv = 2"),
col = c("black", "blue", "chocolate"), lty = c(1, 5, 4), lwd = c(3, 1, 1))