
Decoding UKB Column Names and Values

Overview

Raw UKB phenotype data contains encoded column names and values that need to be converted before analysis.

Source            Column names        Column values
extract_pheno()   participant.p31     Raw integer codes; needs decode_values()
extract_batch()   p31, p53_i0         Usually already decoded; decode_values() typically not needed

Both outputs need decode_names() to convert field-ID column names into human-readable snake_case names.

Call order matters: when using extract_pheno() output, always run decode_values() before decode_names(), because value decoding relies on the numeric field ID still being present in the column name.
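Here is a minimal base-R sketch of why the order matters. It assumes, based on the name formats in this article, that the field ID is the run of digits after "p". The helper field_id_of() is hypothetical and not part of the package. It only shows that the ID can be read from raw names but not from decoded ones:

```r
# Hypothetical helper: recover the numeric UKB field ID from a column name.
# Handles both "participant.p31" (extract_pheno) and "p53_i0" (extract_batch).
# Returns NA when the name no longer carries a field ID.
field_id_of <- function(col_names) {
  pattern <- "^(participant\\.)?p([0-9]+)(_i[0-9]+)?(_a[0-9]+)?$"
  ids <- rep(NA_integer_, length(col_names))
  hit <- grepl(pattern, col_names)
  ids[hit] <- as.integer(sub(pattern, "\\2", col_names[hit]))
  ids
}

field_id_of(c("participant.p31", "p53_i0", "p20116_i0"))
#> [1]    31    53 20116

# After renaming to snake_case, the ID is gone, so a value decoder
# that relies on it has nothing to look up.
field_id_of(c("sex", "date_of_attending_assessment_centre_i0"))
#> [1] NA NA
```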


Step 1: Decode Values

decode_values() converts raw integer codes to human-readable labels for categorical fields that have UKB encoding mappings. Continuous, date, text, and already-decoded fields are left unchanged.

df <- decode_values(df)
#> ✔ Decoded 3 categorical columns; 2 non-categorical columns unchanged.

decode_values() requires two metadata files from the UKB Showcase. Download them once with:

fetch_metadata(dest_dir = "data/metadata")

Then point decode_values() to the same directory (the default metadata_dir already matches the location used by fetch_metadata()):

df <- decode_values(df, metadata_dir = "data/metadata")
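Because the files only need to be downloaded once, a script can skip the download when the directory already has content. This is a minimal sketch. It assumes that a non-empty metadata directory means the download already happened and does not check which files are present. ensure_metadata() is a hypothetical wrapper, not a package function:

```r
# Hypothetical wrapper: download the Showcase metadata only when the
# directory is missing or empty, so repeated runs reuse the local copy.
# `fetch` defaults to calling fetch_metadata(); it is a parameter only so
# the skip logic can be exercised without network access.
ensure_metadata <- function(dir = "data/metadata",
                            fetch = function(d) fetch_metadata(dest_dir = d)) {
  if (!dir.exists(dir) || length(list.files(dir)) == 0) {
    fetch(dir)
  }
  invisible(dir)
}

# Usage (requires the package that provides fetch_metadata() and decode_values()):
# ensure_metadata("data/metadata")
# df <- decode_values(df, metadata_dir = "data/metadata")
```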

What gets decoded

Column      Raw value    Decoded value
p31         0 / 1        "Female" / "Male"
p54         11012        "Leeds"
p20116_i0   0 / 1 / 2    "Never" / "Previous" / "Current"

Codes absent from the encoding table, as well as the UKB missing codes -1, -3 and -7, are returned as NA.
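Conceptually, value decoding is a lookup from code to label. The sketch below is illustrative only. smoking_coding is a hypothetical, hand-written table, since the real mappings come from the Showcase metadata files. decode_codes() is not the package's implementation. The toy table deliberately omits -3, so that code falls through to NA as described above:

```r
# Hypothetical encoding table for field 20116 (smoking status), written by
# hand for illustration; the real table comes from the Showcase metadata.
smoking_coding <- data.frame(
  code    = c(0L, 1L, 2L),
  meaning = c("Never", "Previous", "Current"),
  stringsAsFactors = FALSE
)

# Map raw integer codes to labels with match(). Any code not found in the
# table (here -3, which the toy table leaves out) becomes NA.
decode_codes <- function(raw, coding) {
  coding$meaning[match(raw, coding$code)]
}

decode_codes(c(0L, 2L, -3L, 1L), smoking_coding)
```

The call returns "Never", "Current", NA and "Previous". Using match() keeps the lookup vectorised, so a whole column is decoded in one pass.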


Step 2: Decode Names

decode_names() renames columns from field ID format to snake_case labels using the approved UKB field dictionary available to your project.

df <- decode_names(df)
#> ✔ Renamed 5 columns.

Name conversion examples

Raw name              Decoded name
participant.eid       eid
participant.p31       sex
participant.p21022    age_at_recruitment
participant.p53_i0    date_of_attending_assessment_centre_i0
p31                   sex
p53_i0                date_of_attending_assessment_centre_i0

Both extract_pheno() format (participant.p31) and extract_batch() format (p31) are handled automatically.
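The renaming rule can be sketched in base R. The sketch strips the participant. prefix, looks up the field title by ID, converts the title to snake_case, and keeps the instance suffix. field_dict, to_snake() and decode_name_sketch() are hypothetical, because decode_names() uses the real UKB dictionary. This sketch also handles only the _iN instance suffixes shown in the table:

```r
# Hypothetical field dictionary: field ID -> Showcase title.
# Only covers the fields used in the examples above.
field_dict <- data.frame(
  field_id = c(31L, 21022L, 53L),
  title    = c("Sex", "Age at recruitment", "Date of attending assessment centre"),
  stringsAsFactors = FALSE
)

# Turn a free-text title into snake_case.
to_snake <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_|_$", "", x)
}

# Drop the "participant." prefix, look up the title for each field ID,
# and append any instance suffix such as "_i0". Names without a field ID
# (e.g. "eid") or with an unknown ID are returned without the prefix.
decode_name_sketch <- function(col_names, dict) {
  base <- sub("^participant\\.", "", col_names)
  pattern <- "^p([0-9]+)(_i[0-9]+)?$"
  out <- base
  hit <- grepl(pattern, base)
  ids <- as.integer(sub(pattern, "\\1", base[hit]))
  suffix <- sub(pattern, "\\2", base[hit])
  titles <- dict$title[match(ids, dict$field_id)]
  known <- !is.na(titles)
  renamed <- base[hit]
  renamed[known] <- paste0(to_snake(titles[known]), suffix[known])
  out[hit] <- renamed
  out
}

decode_name_sketch(
  c("participant.eid", "participant.p31", "participant.p21022",
    "participant.p53_i0", "p31", "p53_i0"),
  field_dict
)
```

This reproduces the six decoded names in the table above.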

Long names

Some UKB field titles are verbose. Names longer than max_nchar characters (default: 60) are flagged with a warning. Lower the threshold to flag more names:

df <- decode_names(df, max_nchar = 30)
#> ! 1 column name longer than 30 characters - consider renaming manually:
#> • date_of_attending_assessment_centre_i0
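The max_nchar check amounts to comparing name lengths with nchar(). The hypothetical helper long_names() below lists the same kind of names the warning reports. It assumes the threshold is strict, as "exceeding" suggests:

```r
# Hypothetical helper: return the column names longer than max_nchar.
long_names <- function(col_names, max_nchar = 60) {
  col_names[nchar(col_names) > max_nchar]
}

long_names(c("eid", "sex", "age_at_recruitment",
             "date_of_attending_assessment_centre_i0"), max_nchar = 30)
#> [1] "date_of_attending_assessment_centre_i0"
```

On real data, long_names(names(df), max_nchar = 30) gives a character vector that can be reused when renaming.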

Rename such columns manually to something concise:

names(df)[names(df) == "date_of_attending_assessment_centre_i0"] <- "date_baseline"
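When several columns need shorter names, a named vector of old = new pairs keeps the renames in one place. rename_cols() is a hypothetical helper, and the toy data frame stands in for real decoded output:

```r
# Hypothetical helper: rename columns from a named vector c(old = "new").
# Old names that are not in the data frame are silently ignored.
rename_cols <- function(df, mapping) {
  hit <- names(df) %in% names(mapping)
  names(df)[hit] <- unname(mapping[names(df)[hit]])
  df
}

# Toy data standing in for decode_names() output.
toy <- data.frame(
  eid = 1:2,
  date_of_attending_assessment_centre_i0 = as.Date(c("2008-01-01", "2009-06-15"))
)
toy <- rename_cols(toy, c(date_of_attending_assessment_centre_i0 = "date_baseline"))
names(toy)
```

Ignoring unmatched old names lets the same mapping be reused across extracts that contain different subsets of fields.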

Getting Help
