The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Getting Started with ivgls

Overview

Estimating causal effects from high-dimensional, network-structured exposures is a core challenge in neuroscience, genomics, and environmental science. Standard penalized regression recovers associations, not causal effects: when unmeasured confounders jointly influence both the exposures and the outcome, ordinary regression coefficients are biased. Instrumental variables (IVs) break this bias by exploiting external perturbations — such as genetic variants in brain-imaging studies — that shift exposures without directly affecting the outcome.

The ivgls package combines IV identification with graph-structured regularization. When the exposures are nodes in a known network (e.g., brain regions linked by white-matter tracts, genes linked by a pathway graph), borrowing strength across connected nodes improves support recovery. In high-dimensional genetic applications, some candidate instruments may violate the exclusion restriction through horizontal pleiotropy; ivgls corrects for this via a sisVIVE-style alternating update.

The package provides three estimators:

Estimator Graph penalty Invalid-IV correction Requires glmgraph
iv_lasso No No No
ivgl Yes No Yes
ivgl_s Yes Yes Yes

iv_lasso is a pure-glmnet two-stage LASSO baseline. ivgl adds a graph-Laplacian penalty in the second stage via glmgraph. ivgl_s further estimates and removes direct instrument-to-outcome effects.

Installation

# Step 1: install glmgraph (not on CRAN; required for ivgl and ivgl_s)
devtools::install_github("cran/glmgraph")

# Step 2: install ivgls
install.packages("ivgls")

iv_lasso() and all graph construction, simulation, and evaluation utilities work without glmgraph. Calling ivgl() or ivgl_s() without it installed produces an informative error with the install command above.

Graph construction

library(ivgls)

A <- make_graph(p = 30, type = "chain")
L <- get_laplacian(A)

cat("Number of edges:", sum(A) / 2, "\n")
#> Number of edges: 29

Generating a causal coefficient vector

generate_beta() places \(s_0\) active nodes on the graph according to one of three signal patterns. The smooth pattern places active nodes in a spatially contiguous cluster with similarly-signed effects.

set.seed(1)

bobj <- generate_beta(A, s2 = 4, signal = 2, pattern = "smooth")

cat("Active nodes:", sort(bobj$active_set), "\n")
#> Active nodes: 23 24 25 26
cat("True effects at active nodes:\n")
#> True effects at active nodes:
print(round(bobj$beta_true[bobj$active_set], 3))
#> [1] 2.037 1.833 2.319 2.066

Simulating data

generate_data() draws instruments \(Z \in \mathbb{R}^{n \times q}\), constructs exposures with latent confounding, and introduces s_alpha invalid instruments whose direct effects on \(Y\) violate the exclusion restriction.

dat <- generate_data(
  n              = 100,
  p              = 30,
  q              = 100,
  s_alpha        = 5,
  alpha_strength = 3,
  beta_true      = bobj$beta_true
)

cat("Y:", length(dat$Y),
    "| X:", nrow(dat$X), "x", ncol(dat$X),
    "| Z:", nrow(dat$Z), "x", ncol(dat$Z), "\n")
#> Y: 100 | X: 100 x 30 | Z: 100 x 100
cat("Invalid instrument indices:", which(dat$alpha_true != 0), "\n")
#> Invalid instrument indices: 1 2 3 4 5

Fitting the estimators

IV-LASSO

beta_ivl <- iv_lasso(dat$Y, dat$X, dat$Z)

cat("IV-LASSO selected nodes:", which(abs(beta_ivl) > 1e-4), "\n")
#> IV-LASSO selected nodes: 2 3 4 5 6 8 10 11 12 13 14 15 16 18 19 20 21 22 23 24 25 26 27 28
cat("IV-LASSO MCC:",
    round(get_mcc(bobj$active_set, which(abs(beta_ivl) > 1e-4), p = 30), 3), "\n")
#> IV-LASSO MCC: 0.196

IVGL and IVGL-S

ivgl() and ivgl_s() require the glmgraph package. Install it with: devtools::install_github("cran/glmgraph").

if (requireNamespace("glmgraph", quietly = TRUE)) {

  beta_ivgl <- ivgl(dat$Y, dat$X, dat$Z, L)
  fit_s     <- ivgl_s(dat$Y, dat$X, dat$Z, L, max_iter = 10)

  cat("IVGL selected nodes:", which(abs(beta_ivgl) > 1e-4), "\n")
  cat("IVGL-S selected nodes:", which(abs(fit_s$beta) > 1e-4), "\n")

  cat("IVGL MCC:",
      round(get_mcc(bobj$active_set, which(abs(beta_ivgl) > 1e-4), p = 30), 3), "\n")
  cat("IVGL-S MCC:",
      round(get_mcc(bobj$active_set, which(abs(fit_s$beta) > 1e-4), p = 30), 3), "\n")

  cat("Detected invalid instruments:", which(abs(fit_s$alpha) > 1e-4), "\n")

} else {
  message(
    "Package 'glmgraph' is not installed. ",
    "ivgl() and ivgl_s() are unavailable.\n",
    "Install with: devtools::install_github(\"cran/glmgraph\")"
  )
}
#> IVGL selected nodes: 2 3 4 5 6 16 18 20 23 24 25 26 28 
#> IVGL-S selected nodes: 23 24 25 26 28 
#> IVGL MCC: 0.449 
#> IVGL-S MCC: 0.877 
#> Detected invalid instruments: 1 2 3 4 5 6 32 37 42 57 59 60 78 79 84 97

Performance evaluation

eval_support() returns MCC, TPR, FPR, and number of selected variables.

supp_ivl <- which(abs(beta_ivl) > 1e-4)
metrics  <- eval_support(bobj$active_set, supp_ivl, p = 30)
print(round(metrics, 3))
#>      MCC      TPR      FPR Selected 
#>    0.196    1.000    0.769   24.000

Graph misspecification

corrupt_graph() randomly swaps a fraction of edges, useful for sensitivity analysis when the graph is estimated rather than known exactly.

set.seed(2)
A_corrupted <- corrupt_graph(A, corruption_rate = 0.20)

cat("Original edges:", sum(A) / 2, "\n")
#> Original edges: 29
cat("Corrupted edges:", sum(A_corrupted) / 2, "\n")
#> Corrupted edges: 29

Simulation study

run_simulation() runs \(B\) independent replicates and returns a long data frame with one row per method per replicate.

if (requireNamespace("glmgraph", quietly = TRUE)) {

  res <- run_simulation(
    B              = 5,
    n              = 100,
    p              = 30,
    q              = 100,
    graph_type     = "chain",
    signal_pattern = "smooth",
    s2             = 4,
    signal         = 2,
    s_alpha        = 5,
    alpha_strength = 3
  )

  aggregate(cbind(MCC, MSE, TPR, FPR) ~ Method, data = res, FUN = mean)

} else {
  message(
    "Package 'glmgraph' is not installed. ",
    "run_simulation() is unavailable.\n",
    "Install with: devtools::install_github(\"cran/glmgraph\")"
  )
}
#>     Method       MCC         MSE TPR       FPR
#> 1     IVGL 0.5894756 0.082105051   1 0.2461538
#> 2   IVGL_S 0.7632763 0.003185131   1 0.1153846
#> 3 IV_LASSO 0.4268386 0.119422153   1 0.4692308

Application context

The methods in this package were developed for neuroimaging-genetics studies using data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Brain region volumes from T1-weighted MRI serve as the network-structured exposures; SNPs from genome-wide genotyping serve as candidate instruments; and MMSE score serves as the cognitive outcome. The DKT atlas defines a graph over 62 cortical and subcortical regions of interest. In that setting, ivgl_s() identifies a conservative set of eight causal ROIs robust to pleiotropic SNP effects, while iv_lasso() and ivgl() select broader sets that include confounded associations. See the companion paper for full details.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.