The `grs_*` functions provide an end-to-end pipeline for computing and validating polygenic/genetic risk scores within the UK Biobank Research Analysis Platform (RAP). Because individual-level genotype data cannot be downloaded locally, all computationally intensive steps are executed as cloud jobs via the DNAnexus Swiss Army Knife app.
| Function | Where it runs | Purpose |
|---|---|---|
| `grs_check()` | Local or RAP | Validate and export a SNP weights file |
| `grs_bgen2pgen()` | Submits RAP jobs | Convert UKB imputed BGEN files to PGEN |
| `grs_score()` | Submits RAP jobs | Score genotypes across chromosomes with plink2 |
| `grs_standardize()` | Local or RAP | Z-score standardise GRS columns |
| `grs_zscore()` | Local or RAP | Alias for `grs_standardize()` |
| `grs_validate()` | Local or RAP | Assess predictive performance (OR/HR, AUC, C-index) |
Typical pipeline:

```
grs_check() -> grs_bgen2pgen() -> grs_score() -> grs_standardize() -> grs_validate()
```
Prerequisite: you must be authenticated to UKB RAP and have a project selected. See `vignette("auth")`.
## grs_check()

Before uploading to RAP, validate your SNP weights file. The function reads the file, checks the required columns, flags problems, and writes a plink2-compatible, space-delimited output.
Required columns:
| Column | Description |
|---|---|
| `snp` | SNP identifier (expected `rs` + digits format) |
| `effect_allele` | Effect allele: one of A / T / C / G |
| `beta` | Effect size (log-OR or beta); must be numeric |
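As an illustration, a minimal weights file matching this layout could be built like so. The column names are the requirement; the rsIDs and beta values below are invented for the example:

```r
# Hypothetical weights -- SNP IDs and betas are made up for illustration
weights <- data.frame(
  snp           = c("rs123", "rs456", "rs789"),
  effect_allele = c("A", "G", "T"),
  beta          = c(0.12, -0.08, 0.30)
)

# Write the CSV that grs_check() will read and validate
write.csv(weights, "weights.csv", row.names = FALSE)
```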
```r
library(ukbflow)

# Local: weights file on your machine
w <- grs_check("weights.csv", dest = "weights_clean.txt")
#> Read 'weights.csv': 312 rows, 5 columns.
#> ✔ No NA values.
#> ✔ No duplicate SNPs.
#> ✔ All SNP IDs match rs[0-9]+ format.
#> ✔ All effect alleles are A/T/C/G.
#> Beta summary:
#>   Range      : -0.412 to 0.589
#>   Mean |beta|: 0.183
#>   Positive   : 187 (59.9%)
#>   Negative   : 125 (40.1%)
#>   Zero       : 0
#> ✔ Weights file passed checks: 312 SNPs ready for UKB RAP.
#> ✔ Saved: 'weights_clean.txt'

# On RAP (RStudio) -- use /mnt/project/ paths directly
w <- grs_check(
  file = "/mnt/project/weights/weights.csv",
  dest = "/mnt/project/weights/weights_clean.txt"
)
```

The validated weights are also returned invisibly for inspection.
## grs_bgen2pgen()

UKB imputed genotype data is stored in BGEN format on RAP. `grs_bgen2pgen()` submits one Swiss Army Knife job per chromosome to convert BGEN files to the faster PGEN format, with a MAF filter applied via plink2.

This function is called from your local machine or RAP RStudio; the heavy lifting runs entirely in the cloud.
```r
# Test on the smallest chromosome first
ids <- grs_bgen2pgen(chr = 22, dest = "/pgen/", priority = "high")
#> Uploading 'grs_bgen2pgen_std.R' to RAP ...
#> ✔ Uploaded: '/grs_bgen2pgen_std.R'
#> Submitting 1 job(s) -- mem2_ssd1_v2_x4 / priority: high
#> ✔ chr22 -> 'job-Gxxxxxxxxxxxxxxxxxxxxxxxx'
#> ✔ 1/1 job(s) submitted. Monitor with job_ls().

# Full run: small chromosomes on standard, large on upgraded instance
ids_small <- grs_bgen2pgen(chr = 17:22, dest = "/pgen/")
ids_large <- grs_bgen2pgen(chr = 1:16, dest = "/pgen/", instance = "large")

# Monitor progress (job_wait() takes one job ID at a time)
job_ls()
for (id in c(ids_small, ids_large)) job_wait(id)
```

Instance types:

| Preset | DNAnexus instance | Cores | RAM | SSD | Recommended for |
|---|---|---|---|---|---|
| `"standard"` | `mem2_ssd1_v2_x4` | 4 | 12 GB | 200 GB | chr 15–22 |
| `"large"` | `mem2_ssd2_v2_x8` | 8 | 28 GB | 640 GB | chr 1–16 |
Key arguments:

| Argument | Default | Description |
|---|---|---|
| `chr` | `1:22` | Chromosomes to process |
| `dest` | — | RAP output path for PGEN files (required) |
| `maf` | `0.01` | MAF filter passed to plink2 `--maf` |
| `instance` | `"standard"` | Instance type preset |
| `priority` | `"low"` | Job priority (`"low"` or `"high"`) |
Storage warning: chromosomes 1–16 may exceed the 200 GB SSD on `"standard"` instances. Use `instance = "large"` for these.
## grs_score()

Once the PGEN files are ready, `grs_score()` uploads your weights file(s) and submits one plink2 scoring job per GRS. Each job scores all 22 chromosomes and saves a single CSV to RAP.
```r
ids <- grs_score(
  file = c(
    grs_a = "weights/grs_a_weights.txt",
    grs_b = "weights/grs_b_weights.txt"
  ),
  pgen_dir = "/mnt/project/pgen",
  dest = "/grs/",
  priority = "high"
)
#> ── Uploading 2 weight file(s) to RAP ────────────────────────────────────────
#> Uploading 'grs_a_weights.txt' ...
#> ✔ Uploaded: '/grs_a_weights.txt'
#> Uploading 'grs_b_weights.txt' ...
#> ✔ Uploaded: '/grs_b_weights.txt'
#> Submitting 2 job(s) -- mem2_ssd1_v2_x4 / priority: high
#> ✔ grs_a -> 'job-Gxxxxxxxxxxxxxxxxxxxxxxxx'
#> ✔ grs_b -> 'job-Gyyyyyyyyyyyyyyyyyyyyyyyy'
#> ✔ 2/2 job(s) submitted. Monitor with job_ls().

job_wait(ids)
```

When running from RAP RStudio with weights already at the project root, the upload step is skipped automatically:

```r
# On RAP: weights already at /mnt/project/grs_a_weights.txt
ids <- grs_score(
  file = c(grs_a = "/mnt/project/grs_a_weights.txt"),
  pgen_dir = "/mnt/project/pgen",
  dest = "/grs/"
)
#> ℹ grs_a_weights.txt already at RAP root, skipping upload.
```

Important: the `maf` argument must match the value used in `grs_bgen2pgen()` so that the correct PGEN files are located.

Output per job: `/grs/<GRS_name>_scores.csv` with columns `IID` and the GRS score named `GRS_<name>`.
## grs_standardize()

After downloading the score CSVs from RAP (via `fetch_file()`) and merging them into your analysis dataset, Z-score standardise each GRS column so that effect estimates are interpretable as per-SD associations.
```r
# Auto-detect all columns containing "grs" (case-insensitive)
cohort <- grs_standardize(cohort)
#> Auto-detected 2 GRS column(s): 'GRS_a', 'GRS_b'
#> ✔ GRS_a -> GRS_a_z [mean=0.0000, sd=1.0000]
#> ✔ GRS_b -> GRS_b_z [mean=0.0000, sd=1.0000]
```

`grs_zscore()` is an alias and produces an identical result.

The original columns are preserved; `_z` columns are inserted immediately after their source column.
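The transformation itself is a plain Z-score. Outside the package you could reproduce a `_z` column with base R (illustrative only; `grs_standardize()` additionally handles column detection and placement):

```r
# Toy GRS values -- invented for illustration
grs <- c(1.2, 0.4, -0.3, 2.1, 0.7)

# Z-score: subtract the mean, divide by the standard deviation
grs_z <- (grs - mean(grs)) / sd(grs)

round(mean(grs_z), 10)  # 0
round(sd(grs_z), 10)    # 1
```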
## grs_validate()

`grs_validate()` runs four complementary validation analyses for each GRS:
| Analysis | What it measures |
|---|---|
| Per SD | OR (logistic) or HR (Cox) per 1-SD increase in GRS |
| High vs Low | OR / HR comparing top 20% vs bottom 20% |
| Trend test | P-trend across Q1–Q4 quartiles |
| Discrimination | AUC (logistic) or C-index (Cox) with 95% CI |
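Conceptually, the "per SD" estimate for a binary outcome is the exponentiated coefficient of the standardised GRS in a logistic model. A minimal base-R sketch on simulated data (not the package's internal code; variable names are illustrative):

```r
set.seed(42)
n <- 1000
grs_z   <- rnorm(n)                                 # standardised GRS (mean 0, SD 1)
outcome <- rbinom(n, 1, plogis(-1 + 0.3 * grs_z))   # simulated binary outcome

fit        <- glm(outcome ~ grs_z, family = binomial())
or_per_sd  <- exp(coef(fit)[["grs_z"]])             # OR per 1-SD increase in GRS
ci         <- exp(confint.default(fit)["grs_z", ])  # Wald 95% CI on the OR scale
```

`grs_validate()` goes further: it also fits the high-vs-low group contrast, the quartile trend test, and the discrimination metrics, optionally adjusting for covariates.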
```r
res <- grs_validate(
  data = cohort,
  grs_cols = c("GRS_a_z", "GRS_b_z"),
  outcome_col = "outcome"
)
#> ── Creating GRS groups ───────────────────────────────────────────────────────
#> ── Effect per SD (OR) ───────────────────────────────────────────────────────
#> ── High vs Low ──────────────────────────────────────────────────────────────
#> ── Trend test ───────────────────────────────────────────────────────────────
#> ── AUC ──────────────────────────────────────────────────────────────────────
#> ✔ Validation complete.

res$per_sd
res$discrimination
```

Supplying `time_col` (and optionally `covariates`) switches to Cox models:

```r
res <- grs_validate(
  data = cohort,
  grs_cols = c("GRS_a_z", "GRS_b_z"),
  outcome_col = "outcome",
  time_col = "followup_years",
  covariates = c("age", "sex", paste0("pc", 1:10))
)

res$per_sd          # HR per SD x model
res$high_vs_low     # HR: top 20% vs bottom 20%
res$trend           # p-trend across Q1-Q4
res$discrimination  # C-index with 95% CI
```

Return value: a named list with four `data.table` elements.
| Element | Columns (logistic) | Columns (Cox) |
|---|---|---|
| `per_sd` | `exposure`, `term`, `model`, `OR`, `CI_lower`, `CI_upper`, `p_value` | same with `HR` |
| `high_vs_low` | same as `per_sd` | same with `HR` |
| `trend` | `exposure`, `model`, `p_trend` | same |
| `discrimination` | `GRS`, `AUC`, `CI_lower`, `CI_upper` | `GRS`, `C_index`, `CI_lower`, `CI_upper` |
AUC calculation requires the pROC package. Install with `install.packages("pROC")` if needed.
## End-to-end example

```r
library(ukbflow)

# 1. Validate weights (local)
grs_check("weights.csv", dest = "weights_clean.txt")

# 2. Convert BGEN -> PGEN on RAP (submit jobs)
ids_std <- grs_bgen2pgen(chr = 17:22, dest = "/pgen/", maf = 0.01)
ids_lrg <- grs_bgen2pgen(chr = 1:16, dest = "/pgen/", maf = 0.01, instance = "large")
for (id in c(ids_std, ids_lrg)) job_wait(id)

# 3. Score GRS on RAP (submit jobs)
score_ids <- grs_score(
  file = c(grs_a = "weights_clean.txt"),
  pgen_dir = "/mnt/project/pgen",
  maf = 0.01,  # must match grs_bgen2pgen()
  dest = "/grs/"
)
job_wait(score_ids)

# 4. Download score CSV from RAP
fetch_file("/grs/GRS_a_scores.csv", dest = "GRS_a_scores.csv")

# 5. Merge into analysis dataset and standardise
# cohort: your analysis data.table with IID and phenotype columns
scores <- data.table::fread("GRS_a_scores.csv")  # columns: IID, GRS_a
cohort <- scores[cohort, on = "IID"]             # right-join: keep all cohort rows
cohort <- grs_standardize(cohort, grs_cols = "GRS_a")

# 6. Validate
res <- grs_validate(
  data = cohort,
  grs_cols = "GRS_a_z",
  outcome_col = "outcome",
  time_col = "followup_years",
  covariates = c("age", "sex", paste0("pc", 1:10))
)
res$per_sd
res$discrimination
```

## See also

- `?grs_check`, `?grs_bgen2pgen`, `?grs_score`, `?grs_standardize`, `?grs_validate`
- `vignette("auth")` – RAP authentication and project selection
- `vignette("job")` – monitoring and managing RAP jobs
- `vignette("assoc")` – association analysis functions used inside `grs_validate()`
inside grs_validate()These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.