ukbflow provides a streamlined, RAP-native R workflow for UK Biobank analysis, from phenotype extraction and disease derivation to association analysis and publication-quality figures.
UK Biobank Data Policy (2024+): Individual-level data must remain within the RAP environment. Only summary-level outputs may be downloaded locally. All ukbflow functions are designed with this constraint in mind.

    library(ukbflow)

    # Simulate UKB-style data locally (on RAP: replace with extract_batch() + job_wait())
    data <- ops_toy(n = 5000, seed = 2026) |>
      derive_missing()

    # Derive lung cancer outcome (ICD-10 C34) and follow-up time
    data <- data |>
      derive_icd10(name = "lung", icd10 = "C34",
                   source = c("cancer_registry", "hes")) |>
      derive_followup(name = "lung",
                      event_col = "lung_icd10_date",
                      baseline_col = "p53_i0",
                      censor_date = as.Date("2022-10-31"),
                      death_col = "p40000_i0")

    # Define exposure: ever vs. never smoker
    data[, smoking_ever := factor(
      ifelse(p20116_i0 == "Never", "Never", "Ever"),
      levels = c("Never", "Ever")
    )]

    # Cox regression: smoking → lung cancer (3-model adjustment)
    res <- assoc_coxph(data,
                       outcome_col = "lung_icd10",
                       time_col = "lung_followup_years",
                       exposure_col = "smoking_ever",
                       covariates = c("p21022", "p31", "p22189"))

    # Forest plot
    res_df <- as.data.frame(res)
    plot_forest(
      data = res_df,
      est = res_df$HR,
      lower = res_df$CI_lower,
      upper = res_df$CI_upper,
      ci_column = 2L
    )

    # Recommended
    pak::pkg_install("evanbio/ukbflow")
    # or
    remotes::install_github("evanbio/ukbflow")

Requirements: R ≥ 4.1 · dxpy (dx-toolkit, required for RAP interaction)

    pip install dxpy

| Layer | Key Functions | Description |
|---|---|---|
| Connection | auth_login, auth_select_project | Authenticate to RAP via dx-toolkit |
| Data Access | fetch_metadata, extract_batch, job_wait | Retrieve phenotype data from UKB dataset on RAP |
| Data Processing | decode_names, decode_values, derive_icd10, derive_followup, derive_case | Harmonize multi-source records; derive analysis-ready cohort |
| Association Analysis | assoc_coxph, assoc_logistic, assoc_subgroup | Three-model adjustment; subgroup & trend analysis |
| Genomic Scoring | grs_bgen2pgen, grs_score, grs_standardize | Distributed plink2 scoring on RAP worker nodes |
| Visualization | plot_forest, plot_tableone | Publication-ready figures & tables |
| Utilities | ops_setup, ops_toy, ops_na, ops_snapshot, ops_withdraw | Environment check, synthetic data, pipeline diagnostics, and cohort management |
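
The "three-model adjustment" listed for the Association Analysis layer is not spelled out in this README. A common epidemiological convention, assumed here, is an unadjusted model, an age- and sex-adjusted model, and a fully adjusted model. The base-R sketch below illustrates that idea with logistic regression on simulated data. It does not call ukbflow, and the helper `fit_three_models()`, the `toy` data frame, and its column names are hypothetical.

```r
# Sketch only: three nested logistic models on simulated data (base R, no ukbflow).
set.seed(1)
n <- 2000
toy <- data.frame(
  age    = rnorm(n, mean = 57, sd = 8),
  sex    = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  depriv = rnorm(n),              # stand-in for a deprivation index
  smoker = rbinom(n, 1, 0.4)      # binary exposure, coded 0/1
)
# Simulate a binary outcome that depends on exposure, age and deprivation
lp <- -3 + 0.7 * toy$smoker + 0.03 * (toy$age - 57) + 0.2 * toy$depriv
toy$case <- rbinom(n, 1, plogis(lp))

# Hypothetical helper: fit the same exposure under three covariate sets
fit_three_models <- function(data, outcome, exposure, base_covs, full_covs) {
  specs <- list(
    unadjusted     = character(0),
    age_sex        = base_covs,
    fully_adjusted = c(base_covs, full_covs)
  )
  rows <- lapply(names(specs), function(m) {
    f   <- reformulate(c(exposure, specs[[m]]), response = outcome)
    fit <- glm(f, data = data, family = binomial())
    est <- coef(summary(fit))[exposure, ]
    z   <- qnorm(0.975)           # Wald 95% interval on the log-odds scale
    data.frame(
      model    = m,
      OR       = exp(est[["Estimate"]]),
      CI_lower = exp(est[["Estimate"]] - z * est[["Std. Error"]]),
      CI_upper = exp(est[["Estimate"]] + z * est[["Std. Error"]]),
      p_value  = est[["Pr(>|z|)"]]
    )
  })
  do.call(rbind, rows)
}

res <- fit_three_models(toy, "case", "smoker",
                        base_covs = c("age", "sex"), full_covs = "depriv")
print(res)
```

Returning one row per model keeps the output in a shape that can be passed to a forest plot, with an estimate column and lower and upper bounds.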

- auth_login(), auth_status(), auth_logout(), auth_list_projects(), auth_select_project(): RAP authentication

- fetch_ls(), fetch_tree(), fetch_url(), fetch_file(): RAP file system
- fetch_metadata(), fetch_field(): UKB metadata shortcuts

- extract_ls(), extract_pheno(), extract_batch(): phenotype extraction

- decode_values(): integer codes → human-readable labels
- decode_names(): field IDs → snake_case column names

- job_status(): query job status by ID
- job_wait(): block until job completes (with timeout)
- job_path(): get output path of a completed job
- job_result(): retrieve job result object
- job_ls(): list recent jobs

- derive_missing(): handle “Do not know” / “Prefer not to answer”
- derive_covariate(): type conversion + summary
- derive_cut(): bin continuous variables
- derive_selfreport(): self-reported disease status + date
- derive_hes(): HES inpatient ICD-10
- derive_first_occurrence(): First Occurrence fields
- derive_cancer_registry(): cancer registry
- derive_death_registry(): death registry
- derive_icd10(): combine sources (wrapper)
- derive_case(): merge self-report + ICD-10
- derive_timing(): prevalent vs. incident classification
- derive_age(): age at event
- derive_followup(): follow-up end date and duration

- assoc_coxph() / assoc_cox(): Cox proportional hazards (HR)
- assoc_logistic() / assoc_logit(): logistic regression (OR)
- assoc_linear() / assoc_lm(): linear regression (β)
- assoc_coxph_zph(): proportional hazards assumption test
- assoc_subgroup(): stratified analysis + interaction LRT
- assoc_trend(): dose-response trend + p_trend
- assoc_competing(): Fine-Gray competing risks (SHR)
- assoc_lag(): lagged exposure sensitivity analysis

- plot_forest(): forest plot (PNG / PDF / JPG / TIFF, 300 dpi)
- plot_tableone(): Table 1 (DOCX / HTML / PDF / PNG)

- ops_setup(): environment health check (dx CLI, RAP auth, R packages)
- ops_toy(): generate synthetic UKB-like data for development and testing
- ops_na(): summarise missing values (NA and "") across all columns
- ops_snapshot(): record pipeline checkpoints and track dataset changes
- ops_snapshot_cols(): retrieve column list from a saved snapshot
- ops_snapshot_diff(): compare columns between two snapshots
- ops_snapshot_remove(): remove columns added after a given snapshot
- ops_set_safe_cols(): define protected columns that ops_snapshot_remove will not drop
- ops_withdraw(): exclude UKB withdrawn participants from a cohort

- grs_check(): validate SNP weights file
- grs_bgen2pgen(): convert BGEN → PGEN on RAP (submits cloud jobs)
- grs_score(): score GRS across chromosomes with plink2
- grs_standardize() / grs_zscore(): Z-score standardisation
- grs_validate(): OR/HR per SD, high vs low, trend, AUC/C-index

Full vignettes and function reference:
https://evanbio.github.io/ukbflow/
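
To make the "follow-up end date and duration" entry concrete, here is a minimal base-R sketch of the usual way follow-up time is derived, assuming follow-up ends at the earliest of the event date, the death date, and an administrative censoring date. This describes the general technique and is an assumption, not a statement of how derive_followup() is implemented. The helper `followup_years()` and the toy dates are hypothetical.

```r
# Sketch only: generic follow-up derivation in base R (no ukbflow).
followup_years <- function(baseline, event, death, censor_date) {
  # Earliest of event, death and administrative censoring; missing dates are ignored
  end <- pmin(event, death, censor_date, na.rm = TRUE)
  data.frame(
    end_date = end,
    # Duration in years, using 365.25 days per year
    years    = as.numeric(difftime(end, baseline, units = "days")) / 365.25,
    # 1 if the event was observed on or before the end of follow-up, else 0
    status   = as.integer(!is.na(event) & event <= end)
  )
}

# Three toy participants: one event, one death without event, one censored
baseline <- as.Date(c("2008-01-01", "2009-06-15", "2010-03-01"))
event    <- as.Date(c("2015-01-01", NA, NA))
death    <- as.Date(c(NA, "2019-06-15", NA))

fu <- followup_years(baseline, event, death,
                     censor_date = as.Date("2022-10-31"))
print(fu)
```

Participants whose event predates baseline would get a negative duration under this sketch. Such prevalent cases are normally classified and excluded before a time-to-event analysis.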
Bug reports, feature requests, and pull requests are welcome. See CONTRIBUTING.md.
MIT License © 2026 Yibin Zhou
Made with ❤️ by Yibin Zhou