---
title: "Recovering a Structured Signal with Quadrupen"
subtitle: "From the Lasso to structured regularization via graph Laplacians"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{structured-signal-recovery}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4
)
```

## Motivation

Sparse penalties such as the Lasso treat all predictors symmetrically: coefficients are shrunk independently without any notion of which predictors are related. In many applications, however, the signal has a known structure — predictors are grouped, ordered, or connected through a graph — and penalties that exploit this structure can substantially improve estimation.

This vignette builds a progressively richer family of penalties on a single simulation, showing how incorporating structural knowledge step by step leads to better coefficient recovery.

## Setup

```{r setup, message = FALSE}
library(quadrupen)
library(Matrix)
library(ggplot2)
```

### Simulation

We simulate a linear model with $p = 95$ predictors and $n = 50$ observations. The true coefficient vector $\beta$ is **piecewise constant**: it is zero over three large segments (sizes 25, 25, 25) and takes values $+1$ and $-1$ over two smaller active blocks (sizes 10 each).

```{r simulation}
set.seed(42)
p    <- 95
n    <- 50
beta <- rep(c(0, 1, 0, -1, 0), c(25, 10, 25, 10, 25))

## Block-correlated predictors: high within-segment, zero across segments
rho <- 0.75
Soo <- toeplitz(rho^(0:24))      # Toeplitz correlation within zero segments
Sww <- matrix(rho, 10, 10)       # constant correlation within active segments
Sigma <- bdiag(Soo, Sww, Soo, Sww, Soo)
diag(Sigma) <- 1

X <- as.matrix(matrix(rnorm(p * n), n, p) %*% chol(Sigma))
y <- X %*% beta + rnorm(n, 0, 10)

## Segment labels (for plot legends)
segments      <- rep(1:5, c(25, 10, 25, 10, 25))
segment_names <- paste0("seg", 1:5)
seg_labels    <- segment_names[segments]
```

The five segments are clearly visible in the true signal:

```{r plot-beta, fig.height = 3}
data.frame(index = seq_along(beta), beta = beta, segment = seg_labels) |>
  ggplot(aes(x = index, y = beta, colour = segment)) +
  geom_step() + geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.4) +
  labs(x = "Predictor index", y = expression(beta), title = "True coefficient vector") +
  theme_bw()
```

### Community graph and its Laplacian

The structural prior we wish to encode is: *predictors within the same segment should have similar coefficients*. This is captured by a **community graph** whose edges connect all pairs of predictors belonging to the same segment (a union of disjoint cliques).

```{r graph}
## Adjacency matrix: clique within each segment, no edges across
Ioo <- matrix(1, 25, 25)
Iww <- matrix(1, 10, 10)
A   <- as.matrix(bdiag(Ioo, Iww, Ioo, Iww, Ioo))
diag(A) <- 0

## Graph Laplacian: L = D - A  (regularized for positive definiteness)
L2        <- -A
diag(L2)  <- colSums(A) + 1e-2
```

The Laplacian $L_2$ defines the quadratic form

$$\beta^\top L_2\, \beta = \sum_{(i,j)\in E} (\beta_i - \beta_j)^2,$$

which penalizes *differences between connected predictors*. Adding $\lambda_2 \beta^\top L_2 \beta / 2$ to the loss therefore encourages within-segment coefficients to be equal — a perfect match for the piecewise-constant signal. Note that $L_2$ is passed as the `struct` argument to `ridge()` and `elastic_net()`.

For the Fused Lasso, the graph structure is encoded differently: `struct` takes the **adjacency matrix** $A$ directly, since the fusion penalty is

$$\lambda_2 \sum_{(i,j)\in E} |\beta_i - \beta_j|,$$

an $\ell_1$ (not $\ell_2$) penalty on pairwise differences. The graph topology is the same, but the penalty form differs.

```{r graph-viz, fig.height = 4, fig.width = 5, fig.align = "center"}
if (requireNamespace("igraph", quietly = TRUE)) {
  g  <- igraph::graph_from_adjacency_matrix(A, mode = "undirected")
  cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#CC79A7")
  vertex_colors <- cols[segments]
  igraph::plot.igraph(
    g, vertex.color = vertex_colors, vertex.size = 3,
    vertex.label = NA, edge.width = 0.1,
    main = "Community graph (colour = segment)"
  )
  legend("topleft", legend = segment_names, fill = cols, bty = "n", cex = 0.8)
}
```

---

## Lasso: a baseline without structural knowledge

The Lasso applies an element-wise $\ell_1$ penalty and treats all predictors independently. With highly correlated predictors it tends to select one representative per block arbitrarily, producing an erratic path.

```{r lasso}
fit_lasso <- lasso(X, y, intercept = FALSE)
fit_lasso
```

```{r lasso-path}
fit_lasso$plot_path(labels = seg_labels)
```

---

## Fused Lasso: penalizing differences between consecutive predictors

The Fused Lasso adds an $\ell_1$ penalty on *differences between adjacent coefficients* on top of the standard $\ell_1$ penalty:

$$\min_\beta \frac{1}{2}\|y - X\beta\|^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \sum_{i=1}^{p-1} |\beta_{i+1} - \beta_i|.$$

By default, `fused_lasso()` uses a **chain graph** (consecutive pairs), which is appropriate for ordered or spatial signals.

```{r fused-chain}
fit_fused_chain <- fused_lasso(X, y, lambda2 = 5, intercept = FALSE)
fit_fused_chain
```

```{r fused-chain-path}
fit_fused_chain$plot_path(labels = seg_labels)
```

### Fused Lasso with the community graph

Replacing the chain by the community adjacency matrix $A$ fuses predictors *within the same segment* rather than consecutive ones. This directly targets the true block structure of $\beta$.

```{r fused-community}
fit_fused_comm <- fused_lasso(X, y, lambda2 = 5, struct = A, intercept = FALSE)
fit_fused_comm
```

```{r fused-community-path}
fit_fused_comm$plot_path(labels = seg_labels)
```

---

## Structured Ridge: shrinking toward block-constant solutions

The standard ridge regression shrinks all coefficients toward **zero**. The structured ridge replaces the identity by a positive semidefinite matrix $S$:

$$\min_\beta \frac{1}{2}\|y - X\beta\|^2 + \frac{\lambda_2}{2}\, \beta^\top S\, \beta.$$

With $S = L_2$ (the community Laplacian), the penalty becomes $\frac{\lambda_2}{2}\beta^\top L_2 \beta$, which shrinks *within-segment differences* rather than the coefficients themselves.

### Standard ridge

```{r ridge-std}
fit_ridge_std <- ridge(X, y, intercept = FALSE)
fit_ridge_std$plot_path(labels = seg_labels)
```

The path is a blur: all predictors are shrunk toward zero with no differentiation between segments.

### Structured ridge ($S = L_2$)

```{r ridge-struct}
fit_ridge_struct <- ridge(X, y, struct = L2, lambda_max = 1000, intercept = FALSE)
fit_ridge_struct$plot_path(labels = seg_labels)
```

The Laplacian prior forces coefficients within the same segment to track each other. The two active segments (2 and 4) now emerge as coherent bundles.

---

## Structured Elastic-net: sparsity + graph-Laplacian regularization

The Elastic-net combines an $\ell_1$ penalty with a quadratic penalty:

$$\min_\beta \frac{1}{2}\|y - X\beta\|^2 + \lambda_1\|\beta\|_1 + \frac{\lambda_2}{2}\, \beta^\top S\, \beta.$$

Setting $S = L_2$ yields the **structured Elastic-net**: the $\ell_1$ term drives entire segments to zero (sparsity at the segment level), while the Laplacian term forces the non-zero segments to be internally smooth.

```{r enet-struct}
fit_enet_struct <- elastic_net(X, y, lambda2 = 5, struct = L2, intercept = FALSE)
fit_enet_struct
```

```{r enet-struct-path}
fit_enet_struct$plot_path(labels = seg_labels)
```

The two active segments now enter the path as well-separated, near-constant bundles while the three zero segments stay at zero.

### Debiased structured Elastic-net

The $\ell_1$ penalty introduces shrinkage bias. Refitting the model on the selected support without penalization recovers near-unbiased estimates.

```{r enet-debias}
fit_enet_struct$debias <- TRUE
fit_enet_struct$plot_path(labels = seg_labels)
fit_enet_struct$debias <- FALSE
```

---

## Coefficient recovery: a side-by-side comparison

We extract the BIC-selected coefficient vector for each method and compare visually against the true $\beta$.

```{r recovery}
methods <- list(
  "Lasso"                  = fit_lasso,
  "Fused Lasso (chain)"    = fit_fused_chain,
  "Fused Lasso (community)"= fit_fused_comm,
  "Ridge (standard)"       = fit_ridge_std,
  "Ridge (structured)"     = fit_ridge_struct,
  "Elastic-net (structured)"= fit_enet_struct
)

extract_coef <- function(fit) {
  b <- fit$get_model("BIC")
  if ("Intercept" %in% names(b)) b <- b[-1]   # drop intercept
  b
}

df_coef <- do.call(rbind, lapply(names(methods), function(nm) {
  b <- extract_coef(methods[[nm]])
  data.frame(
    method  = factor(nm, levels = names(methods)),
    index   = seq_along(b),
    value   = as.numeric(b),
    segment = seg_labels
  )
}))

df_true <- data.frame(
  index   = seq_along(beta),
  beta    = beta,
  segment = seg_labels
)
```

```{r recovery-plot, fig.height = 9, fig.width = 7}
## Background shading for each segment
seg_bounds <- data.frame(
  xmin = c(1, 26, 36, 61, 71),
  xmax = c(25, 35, 60, 70, 95),
  fill = factor(1:5)
)

ggplot(df_coef, aes(x = index, y = value)) +
  geom_rect(data = seg_bounds,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf, fill = fill),
            alpha = 0.08, inherit.aes = FALSE) +
  scale_fill_manual(values = c("#E69F00","#56B4E9","#009E73","#F0E442","#CC79A7"),
                    guide = "none") +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.3) +
  geom_step(data = df_true, aes(y = beta), colour = "black", linewidth = 0.8,
            linetype = "dashed") +
  geom_point(aes(colour = segment), size = 0.8, alpha = 0.7) +
  scale_colour_manual(values = c("#E69F00","#56B4E9","#009E73","#F0E442","#CC79A7")) +
  facet_wrap(~ method, ncol = 2) +
  labs(x = "Predictor index", y = "Estimated coefficient",
       title = "Coefficient recovery by method",
       subtitle = "Dashed line: true β",
       colour = "Segment") +
  theme_bw() +
  theme(legend.position = "bottom")
```

The progression illustrates the benefit of structural priors:

- **Lasso**: selects a few predictors erratically within the active segments.
- **Fused Lasso (chain)**: smoother path, but the chain graph blurs transitions between segments rather than reinforcing block structure.
- **Fused Lasso (community)**: within-segment fusion correctly pools the active blocks, but the $\ell_1$ fusion penalty may still miss some predictors.
- **Ridge (standard)**: all coefficients shrunk toward zero; the block pattern is not recovered.
- **Ridge (structured)**: within-segment coefficients cluster together; the active segments are identifiable but not zeroed out.
- **Elastic-net (structured)**: sparsity zeros out the inactive segments while the Laplacian smooths the active ones — the closest recovery to the true signal.
