The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

ktweedie-vignette

Introduction

ktweedie is a package that fits nonparametric Tweedie compound Poisson gamma models in the reproducing kernel Hilbert space. The package is based on two algorithms, the ktweedie for kernel-based Tweedie model and the sktweedie for sparse kernel-based Tweedie model. The ktweedie supports a wide range of kernel functions implemented in the R package kernlab and the optimization algorithm is a Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm with bisection line search. The package includes cross-validation functions for one-dimensional tuning of the kernel regularization parameter \(\lambda\) and for two-dimensional joint tuning of one kernel parameter and \(\lambda\). The sktweedie uses variable weights to achieve variable selection. It is a meta-algorithm that alternatively updates the kernel parameters and a set of variable weights.

The ktweedie solves the problem \[ \min_{\boldsymbol{\alpha}}\left\{ -\sum_{i=1}^{n}\frac{1}{\phi}\left(\frac{y_{i}e^{(1-\rho)\mathbf{K}_{i}^{\top}\boldsymbol{\alpha}}}{1-\rho}-\frac{e^{(2-\rho)\mathbf{K}_{i}^{\top}\boldsymbol{\alpha}}}{2-\rho}\right)+\lambda\boldsymbol{\alpha}^{\top}\mathbf{K}\boldsymbol{\alpha}\right\} , \]where \(\rho\in(1,2)\) is the index parameter, \(\phi>0\) is the dispersion parameter, \(\mathbf{K}\) is an \(n\times n\) kernel matrix computed according to the user-specified kernel function \(K(\cdot ,\cdot)\), whose entries are \(K_{ij}=K(\mathbf{x}_i, \mathbf{x}_j)\) are calculated based on the \(p\)-dimensional predictors from subjects \(i,j=1,\ldots,n\). In the kernel-based Tweedie model, the mean parameter \(\mu_i\) for the \(i\)-th observation is modelled by \[\mu_i=\log(\mathbb{E}(Y_i|\mathbf{x_i}))=\sum_{j=1}^n \alpha_j K(\mathbf x_{i}, \mathbf x_j).\]

The sktweedie solves\[ \begin{aligned} &\min_{\boldsymbol{\alpha}, \mathbf{w}}\left\{ -\sum_{i=1}^{n}\frac{1}{\phi}\left(\frac{y_{i}e^{(1-\rho)\mathbf{K(w)}_{i}^{\top}\boldsymbol{\alpha}}}{1-\rho}-\frac{e^{(2-\rho)\mathbf{K(w)}_{i}^{\top}\boldsymbol{\alpha}}}{2-\rho}\right)+\lambda_1\boldsymbol{\alpha}^{\top}\mathbf{K(w)}\boldsymbol{\alpha} +\lambda_2 \mathbf{1}^\top \mathbf{w} \right \}\\ & \qquad \qquad \mathrm{s.t.\ \ \ } w_j\in [0,1],\ j=1,\ldots,p, \end{aligned}\]

where \(K(\mathbf{w})_{ij}=K(\mathbf{w \odot x}_i, \mathbf{w \odot x}_j)\) involves variable weights \(\mathbf w\).

Installation

From the CRAN.

install.packages("ktweedie")

From the Github.

devtools::install_github("ly129/ktweedie")

Quick Start

First we load the ktweedie package:

library(ktweedie)

The package includes a toy data for demonstration purpose. The \(30\times5\) predictor matrix x is generated from standard normal distribution and y is generated according to\[y\sim \mathrm{Tweedie}(\mu=\sin(x\beta), \rho=1.5,\phi=0.5),\]where \(\beta=(6, -4, 0, 0, 0)\). That said, only the first two predictors are associated with the response.

data(dat)
x <- dat$x
y <- dat$y

An input matrix x and an output vector y are now loaded. The ktd_estimate() function can be used to fit a basic ktweedie model. The regularization parameter lam1 can be a vector, which will be solved in a decreasing order with warm start.

fit.ktd <- ktd_estimate(x = x,
                        y = y,
                        kern = rbfdot(sigma = 0.1),
                        lam1 = c(0.01, 0.1, 1))
str(fit.ktd$estimates)
#> List of 3
#>  $ lambda 1   :List of 3
#>   ..$ fn         : num 110
#>   ..$ coefficient: num [1:30, 1] 0.5558 -0.062 -0.0381 0.0523 -0.0251 ...
#>   ..$ convergence: int 0
#>  $ lambda 0.1 :List of 3
#>   ..$ fn         : num 51
#>   ..$ coefficient: num [1:30, 1] 1.662 -0.235 -0.177 0.867 -0.143 ...
#>   ..$ convergence: int 0
#>  $ lambda 0.01:List of 3
#>   ..$ fn         : num 39.2
#>   ..$ coefficient: num [1:30, 1] 7.692 -0.49 -0.841 4.624 -0.696 ...
#>   ..$ convergence: int 0

fit.ktd$estimates stores the estimated coefficients and the final objective function value. The estimated kernel-based model coefficients for the \(l\)-th lam1 can be accessed by the index l: fit.ktd$estimates[[l]]$coefficient.

The function can also be used to fit the sktweedie model by setting the argument sparsity to TRUE, and specifying the regularization coefficient \(\lambda_2\) in the argument lam2.

fit.sktd <- ktd_estimate(x = x,
                         y = y,
                         kern = rbfdot(sigma = 0.1),
                         lam1 = 5,
                         sparsity = TRUE,
                         lam2 = 1)

And we can access the fitted coefficients in a similar manner to the fit.ktd. Additionally, the fitted variable weights \(\mathbf w\) can be accessed by

fit.sktd$estimates[[1]]$weight
#>           [,1]
#> [1,] 1.0000000
#> [2,] 0.4462078
#> [3,] 0.0000000
#> [4,] 0.0000000
#> [5,] 0.0000000

Variables with weights close to 0 can be viewed as noise variables.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.