The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
The assoc_* functions fit regression models for each
exposure variable and return tidy result tables suitable for downstream
forest plots and publication tables.
| Function | Alias | Model | Effect measure |
|---|---|---|---|
assoc_coxph() |
assoc_cox() |
Cox proportional hazards | HR |
assoc_logistic() |
assoc_logit() |
Logistic regression | OR |
assoc_linear() |
assoc_lm() |
Linear regression | beta |
assoc_coxph_zph() |
assoc_zph() |
Schoenfeld residual PH test | chisq / p |
assoc_subgroup() |
assoc_sub() |
Stratified analysis + LRT interaction | HR / OR / beta |
assoc_trend() |
assoc_tr() |
Dose-response trend | HR / OR / beta + p_trend |
assoc_competing() |
assoc_fg() |
Fine-Gray competing risks | SHR |
assoc_lag() |
— | Cox lag sensitivity analysis | HR |
Prerequisite: the analysis dataset should already contain derived case status, follow-up time, and covariates produced by the
derive_*functions. Seevignette("derive")andvignette("derive-survival").
library(ukbflow)
# ops_toy(scenario = "association") returns a pre-derived analysis-ready table:
# covariates already as factors, bmi_cat / tdi_cat binned, and two outcomes
# (dm_* and htn_*) with status / date / timing / followup columns in place.
dt <- ops_toy(scenario = "association")
dt <- dt[dm_timing != 1L] # incident analysis: exclude prevalent DM casesAll main functions automatically produce up to three adjustment levels without requiring manual formula construction:
| Model | Covariates | When included |
|---|---|---|
| Unadjusted | None (crude) | Always (when base = TRUE) |
| Age and sex adjusted | Age (field 21022) + sex (field 31), auto-detected | When both columns are found |
| Fully adjusted | User-supplied covariates |
When covariates is non-NULL |
Age and sex columns are located automatically by scanning the
dataset’s column names for standard UKB naming patterns
(p21022_* for age at recruitment, p31_* for
sex). As a fallback, any column starting with "age" or
named "sex" is also recognised. No prerequisite pipeline
step is required — the detection works on any data.frame whose columns
follow UKB or ukbflow naming conventions. Set base = FALSE
to skip the first two models and run only the Fully adjusted model.
assoc_coxph() is the primary function for time-to-event
outcomes. It accepts logical (TRUE/FALSE) or
integer (0/1) event indicators and returns one
row per exposure x term x model combination.
# Crude + age-sex adjusted (automatic); p21022 and p31 auto-detected
res <- assoc_coxph(
data = dt,
outcome_col = "dm_status",
time_col = "dm_followup_years",
exposure_col = c("p20116_i0", "bmi_cat")
)# Add a Fully adjusted model
res <- assoc_coxph(
data = dt,
outcome_col = "dm_status",
time_col = "dm_followup_years",
exposure_col = "p20116_i0",
covariates = c("bmi_cat", "tdi_cat", "p1558_i0",
paste0("p22009_a", 1:10))
)Output columns:
| Column | Description |
|---|---|
exposure |
Exposure variable name |
term |
Coefficient name from coxph |
model |
Ordered factor: Unadjusted < Age and sex adjusted < Fully adjusted |
n |
Participants in model (after NA removal) |
n_events |
Events in model |
person_years |
Total person-years (rounded) |
HR |
Hazard ratio |
n,n_events, andperson_yearsall reflect the model-specific complete-case analysis set — participants with any missing value across outcome, time, exposure, or covariates are dropped before fitting. These counts therefore differ between models (e.g. Fully adjusted typically has fewer participants than Unadjusted). |CI_lower,CI_upper| Confidence interval bounds | |p_value| Wald test p-value | |HR_label| Formatted string, e.g."1.43 (1.28-1.60)"|
assoc_logistic() is for binary outcomes without a time
dimension (e.g. case-control or cross-sectional designs).
res <- assoc_logistic(
data = dt,
outcome_col = "dm_status",
exposure_col = c("p20116_i0", "bmi_cat"),
covariates = c("tdi_cat", "p1558_i0", paste0("p22009_a", 1:10))
)For sparse data or small samples, use profile likelihood confidence intervals:
res <- assoc_logistic(
data = dt,
outcome_col = "dm_status",
exposure_col = "grs_bmi",
ci_method = "profile" # slower but more accurate for sparse data
)Output is identical in structure to assoc_coxph() but
with OR and OR_label in place of
HR / HR_label, and n_cases
instead of n_events / person_years.
assoc_linear() is for continuous outcomes
(e.g. biomarker levels, BMI). The standard error of beta is included to
support downstream meta-analysis.
# Continuous outcome (BMI); smoking and GRS as exposures
res <- assoc_linear(
data = dt,
outcome_col = "p21001_i0",
exposure_col = c("p20116_i0", "grs_bmi"),
covariates = c("tdi_cat", "p1558_i0", paste0("p22009_a", 1:10))
)Common mistake: passing a binary (0/1) or logical column as
outcome_colis permitted (linear probability model) but triggers a warning. Useassoc_logistic()for binary outcomes — linear regression on a 0/1 outcome produces unbounded predictions and is rarely appropriate in epidemiological analyses.
assoc_coxph_zph() re-fits the same Cox models and tests
the PH assumption via Schoenfeld residuals (cox.zph()). Use
it alongside assoc_coxph() to validate model
assumptions.
zph <- assoc_coxph_zph(
data = dt,
outcome_col = "dm_status",
time_col = "dm_followup_years",
exposure_col = c("p20116_i0", "bmi_cat"),
covariates = c("tdi_cat", "p1558_i0")
)
# Identify any violations
zph[ph_satisfied == FALSE]Output columns include chisq, df,
p_value, ph_satisfied (logical), and global
test statistics (global_chisq, global_df,
global_p).
When the PH assumption is violated, consider adding
strata() to assoc_coxph() or modelling
time-varying effects.
assoc_subgroup() stratifies the dataset by a grouping
variable and fits the specified model within each subgroup. A likelihood
ratio test (LRT) for the exposure x subgroup interaction is computed on
the full dataset and appended as p_interaction.
# Subgroup by sex; p31 is automatically excluded from within-stratum models
res <- assoc_subgroup(
data = dt,
outcome_col = "dm_status",
time_col = "dm_followup_years",
exposure_col = "p20116_i0",
by = "p31",
method = "coxph",
covariates = c("p21022", "bmi_cat", "tdi_cat")
)Output columns include subgroup,
subgroup_level, and p_interaction in addition
to the standard effect estimate columns.
Note: the
byvariable is automatically excluded from the subgroup models. Do not include it incovariates— a collinearity warning is issued if you do. Subgroup analysis is most interpretable whenbyhas a small number of levels (e.g. sex, smoking status); continuous variables are technically permitted but per-unique-value subgrouping is rarely meaningful in practice — use a pre-categorised version (e.g. viaderive_cut()) instead.
assoc_trend() fits categorical and trend models
simultaneously for each ordered-factor exposure, returning per-category
estimates alongside a p-value for linear trend.
# assoc_trend() requires an ordered factor; make bmi_cat ordered in-place
dt[, bmi_cat := factor(bmi_cat, levels = levels(bmi_cat), ordered = TRUE)]res <- assoc_trend(
data = dt,
outcome_col = "dm_status",
time_col = "dm_followup_years",
exposure_col = "bmi_cat",
method = "coxph",
covariates = c("p21022", "p31", "tdi_cat", "p20116_i0")
)Supply custom scores reflecting approximate median BMI per category:
res <- assoc_trend(
data = dt,
outcome_col = "dm_status",
time_col = "dm_followup_years",
exposure_col = "bmi_cat",
method = "coxph",
covariates = c("p21022", "p31", "tdi_cat", "p20116_i0"),
scores = c(17, 22, 27, 35) # approximate median BMI per category
)The output contains a reference row (HR = 1.00 (ref))
followed by non-reference rows, with HR_per_score,
HR_per_score_label, and p_trend appended as
shared columns. These trend columns carry the same value across every
row within the same exposure × model combination — including the
reference row — so the full result table remains self-contained and easy
to filter or export.
assoc_competing() fits a Fine-Gray subdistribution
hazard model via survival::finegray() +
inverse-probability-of-censoring-weighted Cox regression. Use this when
a competing event (e.g. death) can preclude the outcome of interest.
Two input modes are supported:
Mode A — a single column encodes all event types:
# Construct a combined event column: 0 = censored, 1 = DM, 2 = HTN (competing)
dt_cr <- dt[htn_timing != 1L] # also exclude prevalent HTN
dt_cr[, event_type := data.table::fcase(
dm_timing == 2L, 1L,
htn_timing == 2L, 2L,
default = 0L
)]
res <- assoc_competing(
data = dt_cr,
outcome_col = "event_type", # 0 = censored, 1 = DM, 2 = HTN
time_col = "dm_followup_years",
exposure_col = "p20116_i0",
event_val = 1L,
compete_val = 2L,
covariates = c("bmi_cat", "tdi_cat")
)Mode B — separate 0/1 columns for primary and competing events:
res <- assoc_competing(
data = dt_cr,
outcome_col = "dm_status", # primary event
time_col = "dm_followup_years",
exposure_col = c("p20116_i0", "bmi_cat"),
compete_col = "htn_status", # competing event
covariates = c("tdi_cat", "p1558_i0")
)Output uses SHR (subdistribution hazard ratio) and
SHR_label in place of HR, and adds n_compete
(number of competing events in the analysis set).
All functions return a data.table that feeds directly
into plot_forest():
res <- assoc_coxph(
data = dt,
outcome_col = "dm_status",
time_col = "dm_followup_years",
exposure_col = "p20116_i0",
covariates = c("bmi_cat", "tdi_cat", "p1558_i0")
)
# Pass directly to plot_forest()
plot_forest(as.data.frame(res))
# Filter to a single model
res[model == "Fully adjusted"]
# Export
data.table::fwrite(res, "assoc_results.csv")assoc_lag() re-runs the same Cox models at one or more
lag periods to assess whether associations are driven by early events
(reverse causation or detection bias). For each lag, participants whose
follow-up time is less than lag_years are excluded;
follow-up is kept on its original scale.
res <- assoc_lag(
data = dt,
outcome_col = "dm_status",
time_col = "dm_followup_years",
exposure_col = "p20116_i0",
lag_years = c(0, 1, 2), # 0 = full cohort reference
covariates = c("bmi_cat", "tdi_cat", "p1558_i0")
)Setting lag_years = 0 (or including 0 in
the vector) runs the model on the full unfiltered cohort, providing a
reference row against which lagged results can be compared.
Output adds two columns to the standard assoc_coxph()
result:
| Column | Description |
|---|---|
lag_years |
Lag period applied (numeric, same units as
time_col) |
n_excluded |
Participants excluded because follow-up <
lag_years |
?assoc_coxph, ?assoc_logistic,
?assoc_linear?assoc_coxph_zph, ?assoc_subgroup,
?assoc_trend, ?assoc_competing,
?assoc_lagvignette("derive-survival") — follow-up time and event
derivationvignette("derive") — disease phenotype derivationThese binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.