The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

FrailtyCompRisk

library(FrailtyCompRisk)

Package FrailtyCompRisk

Overview

The package FrailtyCompRisk is originally designed for statisticians and epidemiologists in order to analyze time-to-event data where individuals are nested within centers (e.g., hospitals or clinics), and where multiple causes of failure may occur.

The package is build upon the random-effects model for the subdistribution hazard presented in “Analysing multicentre competing risks data with a mixed proportional hazards model for the subdistribution” [Katsahian S, Resche-Rigon M, Chevret S, Porcher R.].

The Model:

Consider \(N\) observations of a time-to-event variable \(T\), a cause of event \(\epsilon\), and a covariate \(Z\): \((T_i, \epsilon_i, Z_i)_{i=1,\dots,N}\). Let \(\epsilon = 1\) denote the cause of failure of interest.

The cumulative incidence function (CIF) for cause 1 is defined as: \[ F_1(t)=P(T \le t,\epsilon = 1) \] Fine and Gray proposed a proportional hazards model for the subdistribution hazard associated with \(F_1(t)\), defined as: \[ \alpha_1(t) = \frac{\frac{dF_1(t)}{dt}}{1 - F_1(t)} \] The subdistribution hazard for individual \(i\) is modeled as:

\[ \alpha_{1}(t | Z_i, u) = \alpha_{1,0}(t) e^{(Z_i^\top \beta)} \] where:

This model assumes that all individuals are independent. To account for clustering (e.g., patients within centers), a random effect is added:

Let \(k(i) \in {1, \dots, K}\) be the cluster (center) to which subject \(i\) belongs. Then the subdistribution hazard becomes:

\[ \alpha_{1,k}(t | Z_i, u) = \alpha_{1,0}(t) e^{(Z_i^\top \beta \ + \ u_{k(i)})} \] where \(u_k \sim \mathcal{N}(0, \theta)\) are independent Gaussian random effects for each cluster \(k\), with variance \(\theta\).

Method

The package uses a restricted maximum likelihood (REML) or maximum likelihood (ML) approach to estimate the parameters \(\beta\), \(\theta\), and \(u_k\). The estimation relies on a Newton-Raphson optimization routine applied to the (restricted) log-likelihood.

Features

The package provides several main functions, \(\\\) Reml_CompRisk_frailty() Estimates \(\beta\), \(\theta\), and \(u_k\) under the subdistribution hazard model with random effects. Also returns the \(p\)-value for testing \(H_0: \theta = 0\).

Reml_CompRisk_frailty() can take 5 arguments:

  1. data (mandatory), a data frame with the following columns:

    • times : Observed times to event or censoring.

    • status : Event type: 0 = censored, 1 = cause of interest, >1 = other competing risks.

    • clusters : Cluster membership (e.g., centers or subjects).

    • covariates : (optional) Columns from the 4th onward are used as covariates.

  2. cluster_censoring (optional): Logical. If TRUE, accounts for center-specific censoring using a frailty model on censoring times. Default is FALSE.

  3. max_iter (optional): Maximum number of iterations for the Newton-Raphson algorithm (default = 300).

  4. tol (optional): Convergence tolerance for the likelihood and frailty variance (default = 1e-6).

  5. threshold (optional): Lower bound for the frailty variance parameter \(\theta\). If the estimated value falls below this threshold, frailty is considered negligible (default = 1e-5).

\(\\\)

Ml_CompRisk() Estimates \(\beta\) for the subdistribution hazard model without random effects.

Ml_CompRisk() can take 3 arguments:

  1. data (mandatory), a data frame with the following columns:

    • times : Observed times to event or censoring.

    • status : Event type: 0 = censored, 1 = cause of interest, >1 = other competing risks.

    • covariates : (optional) Columns from the 3th onward are used as covariates.

  2. max_iter (optional): Maximum number of iterations for the Newton-Raphson algorithm (default = 100).

  3. tol (optional): Convergence tolerance for the likelihood and frailty variance (default = 1e-6).

\(\\\)

Reml_Cox_frailty() Estimates \(\beta\), \(\theta\), and \(u_k\) under a Cox model with random effects, and provides a \(p_{value}\) for testing \(\theta = 0\).

Reml_Cox_frailty() can take 4 arguments:

  1. data (mandatory), a data frame with the following columns:

    • times : Observed times to event or censoring.

    • status : Event type: 0 = censored, 1 = cause of interest

    • clusters : Cluster membership (e.g., centers or subjects).

    • covariates : (optional) Columns from the 4th onward are used as covariates.

  2. max_iter (optional): Maximum number of iterations for the Newton-Raphson algorithm (default = 300).

  3. tol (optional): Convergence tolerance for the likelihood and frailty variance (default = 1e-6).

  4. threshold (optional): Lower bound for the frailty variance parameter \(\theta\). If the estimated value falls below this threshold, frailty is considered negligible (default = 1e-5).

\(\\\)

Ml_Cox() Estimates \(\beta\) under a standard Cox proportional hazards model.

Ml_Cox() can take 3 arguments:

  1. data (mandatory), a data frame with the following columns:

    • times : Observed times to event or censoring.

    • status : Event type: 0 = censored, 1 = cause of interest

    • covariates : (optional) Columns from the 3th onward are used as covariates.

  2. max_iter (optional): Maximum number of iterations for the Newton-Raphson algorithm (default = 100).

  3. tol (optional): Convergence tolerance for the likelihood and frailty variance (default = 1e-6).

Parameters_estimation() Encapsulates the four previous methods into one unified function.

Parameters_estimation() can take 6 arguments:

  1. data (mandatory), a data frame with the following columns:

    • times : Observed times to event or censoring.

    • status : Event type: 0 = censored, 1 = cause of interest, >1 = other competing risks.

    • clusters : Cluster membership (e.g., centers or subjects).

    • covariates : (optional) Columns from the 4th onward are used as covariates.

  2. method (optional):

    • “CompRisk_frailty” (default): Competing risks model for cause 1 with shared frailty.
    • “CompRisk”: Competing risks model for cause 1 without frailty.
    • “Cox_frailty”: Cox proportional hazards model with shared frailty.
    • “Cox”: Standard Cox proportional hazards model.
  3. cluster_censoring (optional): Logical. If TRUE, accounts for center-specific censoring using a frailty model on censoring times. Default is FALSE.

  4. max_iter (optional): Maximum number of iterations for the Newton-Raphson algorithm (default = 100 or 300, depends on the method chosen).

  5. tol (optional): Convergence tolerance for the likelihood and frailty variance (default = 1e-6).

  6. threshold (optional): Lower bound for the frailty variance parameter \(\theta\). If the estimated value falls below this threshold, frailty is considered negligible (default = 1e-5).

\(\\\)

simulate_data() Simulates clustered time-to-event data with competing risks.

simulate_data() can take 8 arguments:

  1. G : a vector of group or cluster identifiers (length N, where N is the size sample). Each value indicates which cluster the individual belongs to.

  2. Z : a matrix of covariates (dimensions N x p), where p is the size of the vector describing an individual). Can be set to Matrix(0,0,N) if no covariates are used.

  3. prop : proportion of individuals susceptible to cause 1 (default: 0.6).

  4. beta : a numeric vector of regression coefficients, one per covariate (p).

  5. theta : variance of the shared frailty term for event times (cause 1).

  6. cens : logical, indicating whether censoring should be simulated (default: FALSE).

  7. pcens : target censoring proportion (default: 0.25).

  8. tau : variance of the shared frailty term for censoring times (default: 0). \(\\\)

check_data_format() Verifies input data structure and formatting.

check_data_format() can only take 1 argument, a data frame. \(\\\)

Examples

set.seed(123)

n_cov = 2
n_per_cluster = 15
n_cluster = 20
n = n_cluster * n_per_cluster

G = rep(1:n_cluster, each = n_per_cluster)
Z = matrix(rnorm(n*n_cov,0,1),ncol = n_cov)

df = simulate_data(G,Z,prop = 0.6,beta = c(1,1.2),theta = 0.6,cens = TRUE)

 #Estimate using REML with frailty

res <- Reml_CompRisk_frailty(df)

 #Print estimated coefficients

res$beta
#>        V1        V2 
#> 0.8818009 1.0918708
res$theta
#> [1] 0.6087043

Computational Complexity

The algorithm involves computing weighted matrices and solving penalized systems in each iteration. For \(N\) individuals, \(K\) clusters and \(p\) covariables describing each individual:

In total, the complexity of the algorithm is ~\(\mathcal{O}(N^2 + Np + NK + (p + K)^3)\) in the worst case.

However, the implementation uses sparse matrices (Matrix package) to significantly reduce practical computational cost.

References

The Matrix package

Katsahian S, Resche-Rigon M, Chevret S, Porcher R. (2006). Analysing multicentre competing risks data with a mixed proportional hazards model for the subdistribution. Statistics in Medicine, 25(26), 4267–4278.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.