ukbflow

RAP-Native R Workflow for UK Biobank Analysis


Overview

ukbflow provides a streamlined R workflow for UK Biobank analysis that runs natively on the Research Analysis Platform (RAP), covering phenotype extraction, disease derivation, association analysis, and publication-quality figures.

UK Biobank Data Policy (2024+): Individual-level data must remain within the RAP environment. Only summary-level outputs may be downloaded locally. All ukbflow functions are designed with this constraint in mind.

library(ukbflow)

# Simulate UKB-style data locally (on RAP: replace with extract_batch() + job_wait())
data <- ops_toy(n = 5000, seed = 2026) |>
  derive_missing()

# Derive lung cancer outcome (ICD-10 C34) and follow-up time
data <- data |>
  derive_icd10(name = "lung", icd10 = "C34",
               source = c("cancer_registry", "hes")) |>
  derive_followup(name         = "lung",
                  event_col    = "lung_icd10_date",
                  baseline_col = "p53_i0",
                  censor_date  = as.Date("2022-10-31"),
                  death_col    = "p40000_i0")

# Define exposure: ever vs. never smoker
data[, smoking_ever := factor(
  ifelse(p20116_i0 == "Never", "Never", "Ever"),
  levels = c("Never", "Ever")
)]

# Cox regression: smoking → lung cancer (3-model adjustment)
res <- assoc_coxph(data,
  outcome_col  = "lung_icd10",
  time_col     = "lung_followup_years",
  exposure_col = "smoking_ever",
  covariates   = c("p21022", "p31", "p22189"))

# Forest plot
res_df <- as.data.frame(res)
plot_forest(
  data      = res_df,
  est       = res_df$HR,
  lower     = res_df$CI_lower,
  upper     = res_df$CI_upper,
  ci_column = 2L
)
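
To make the outcome and follow-up step above more concrete, here is a minimal base R sketch of one conventional way to define both quantities. It is not ukbflow's implementation: the toy data frame, its column names, the prefix-matching rule, and the 365.25-day year are all assumptions made for illustration.

```r
# Toy cohort (hypothetical columns, not UKB field names)
toy <- data.frame(
  id        = 1:3,
  baseline  = as.Date(c("2008-01-15", "2009-06-01", "2010-03-20")),
  icd10     = c("C341", "I251", NA),   # first recorded diagnosis code
  diag_date = as.Date(c("2015-01-15", "2012-02-02", NA)),
  death     = as.Date(c(NA, "2020-06-01", NA))
)
censor_date <- as.Date("2022-10-31")

# Assumed rule: a code counts as lung cancer if it starts with "C34"
is_case <- !is.na(toy$icd10) & startsWith(toy$icd10, "C34")

# Keep the diagnosis date only for matching codes
event_date <- toy$diag_date
event_date[!is_case] <- NA

# Follow-up ends at the earliest of event, death, or administrative censoring
# (dates are compared as day counts to keep the arithmetic explicit)
end_num <- pmin(as.numeric(event_date),
                as.numeric(toy$death),
                as.numeric(censor_date),
                na.rm = TRUE)

# An event counts only if it is the reason follow-up ended
status <- as.integer(!is.na(event_date) & as.numeric(event_date) == end_num)

# Follow-up time in years (assumed 365.25 days per year)
followup_years <- (end_num - as.numeric(toy$baseline)) / 365.25

data.frame(id = toy$id, status = status, followup_years = round(followup_years, 2))
```

In this sketch the second participant is censored at death and the third at the administrative censoring date, so only the first contributes an event.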

Installation

# Recommended
pak::pkg_install("evanbio/ukbflow")

# or
remotes::install_github("evanbio/ukbflow")

Requirements: R ≥ 4.1 · dxpy (dx-toolkit, required for RAP interaction)

pip install dxpy
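
After installing dxpy, it can be useful to confirm from within R that the dx command-line tool is visible on the PATH. The following base R check is a small sketch, independent of ukbflow; the package's own ops_setup helper may well cover this, but its behaviour is not assumed here.

```r
# Sys.which() returns an empty string when the executable is not on the PATH
dx_path <- Sys.which("dx")

if (nzchar(dx_path)) {
  message("dx found at: ", dx_path)
} else {
  message("dx not found; install dx-toolkit with `pip install dxpy`, ",
          "then restart R so the updated PATH is picked up")
}
```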

Core Features

| Layer | Key Functions | Description |
|---|---|---|
| Connection | auth_login, auth_select_project | Authenticate to RAP via dx-toolkit |
| Data Access | fetch_metadata, extract_batch, job_wait | Retrieve phenotype data from UKB dataset on RAP |
| Data Processing | decode_names, decode_values, derive_icd10, derive_followup, derive_case | Harmonize multi-source records; derive analysis-ready cohort |
| Association Analysis | assoc_coxph, assoc_logistic, assoc_subgroup | Three-model adjustment; subgroup & trend analysis |
| Genomic Scoring | grs_bgen2pgen, grs_score, grs_standardize | Distributed plink2 scoring on RAP worker nodes |
| Visualization | plot_forest, plot_tableone | Publication-ready figures & tables |
| Utilities | ops_setup, ops_toy, ops_na, ops_snapshot, ops_withdraw | Environment check, synthetic data, pipeline diagnostics, and cohort management |
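
For readers unfamiliar with the "three-model adjustment" pattern, the sketch below shows the general idea using the survival package directly: the same exposure is fitted under progressively larger covariate sets and the hazard ratios are collected side by side. This is not how assoc_coxph is implemented; the simulated data, the variable names, and the particular choice of unadjusted, age-and-sex, and fully adjusted models are assumptions for illustration only.

```r
library(survival)

# Simulated cohort (all values arbitrary; variable names are hypothetical)
set.seed(1)
n <- 2000
toy <- data.frame(
  age    = rnorm(n, 57, 8),
  sex    = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  tdi    = rnorm(n),  # stand-in for a deprivation index
  smoker = factor(sample(c("Never", "Ever"), n, replace = TRUE),
                  levels = c("Never", "Ever"))
)

# Event times from an exponential model; follow-up is capped at 15 years
rate       <- 0.01 * exp(0.7 * (toy$smoker == "Ever") + 0.03 * (toy$age - 57))
event_time <- rexp(n, rate)
toy$time   <- pmin(event_time, 15)
toy$status <- as.integer(event_time <= 15)

# Assumed three-model scheme: unadjusted, age + sex, fully adjusted
covariate_sets <- list(
  unadjusted = character(0),
  age_sex    = c("age", "sex"),
  full       = c("age", "sex", "tdi")
)

# Fit one Cox model and return the exposure hazard ratio with its 95% CI
fit_one <- function(covs) {
  rhs <- paste(c("smoker", covs), collapse = " + ")
  fit <- coxph(as.formula(paste("Surv(time, status) ~", rhs)), data = toy)
  ci  <- exp(confint(fit))["smokerEver", ]
  data.frame(HR       = unname(exp(coef(fit))["smokerEver"]),
             CI_lower = ci[[1]],
             CI_upper = ci[[2]])
}

res <- do.call(rbind, lapply(covariate_sets, fit_one))
res$model <- names(covariate_sets)
res
```

Comparing the rows shows how much the exposure estimate moves as covariates are added, which is the usual reason for reporting all three models together.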

Function Reference

Auth & Fetch
Extract & Decode
Job Monitoring
Derive: Phenotypes
Derive: Survival
Association Analysis
Visualisation
Utilities & Diagnostics
GRS Pipeline

Documentation

Full vignettes and function reference:

https://evanbio.github.io/ukbflow/


Contributing

Bug reports, feature requests, and pull requests are welcome. See CONTRIBUTING.md.


License

MIT License © 2026 Yibin Zhou


Made with ❤️ by Yibin Zhou

