
Introduction to the UKB.COVID19 Package

Longfei Wang

2024-07-22

library(here)
#> here() starts at /tmp/RtmpinvnHJ/Rbuild6e352bf5a66b/UKB.COVID19
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✔ ggplot2 3.3.5     ✔ purrr   1.0.2
#> ✔ tibble  3.1.3     ✔ dplyr   1.0.7
#> ✔ tidyr   1.1.3     ✔ stringr 1.4.0
#> ✔ readr   2.0.0     ✔ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
library(questionr)
library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last
#> The following object is masked from 'package:purrr':
#> 
#>     transpose

Introduction

UKB.COVID19 is an R package designed to process and analyze COVID-19 data from the UK Biobank (UKBB). It provides tools to summarize COVID-19 test results, perform association tests between COVID-19 outcomes and potential risk factors, and generate input files for genome-wide association studies (GWAS).

Installation

To install the UKB.COVID19 package, you can use the following commands:

# Install from CRAN
# install.packages("UKB.COVID19")

# Load the package
library(UKB.COVID19)

Data Preparation

Before using the package, make sure you have access to the UKBB COVID-19 data and have downloaded the relevant datasets. Because access to UKBB data is restricted, the use cases here are illustrated with simulated data. These files are located in the package under UKB.COVID19/inst/extdata/ and can be retrieved with the covid_example function.

ukb.tab.file <- covid_example("sim_ukb.tab.gz")
ukb.tab <- fread(ukb.tab.file)
head(ukb.tab)
#>    f.eid f.31.0.0 f.34.0.0 f.189.0.0 f.22000.0.0 f.22001.0.0 f.22006.0.0
#> 1:     1        1     1946   5.43719          68           1           1
#> 2:    10        1     1942   1.40624          24           1           1
#> 3:   100        1     1941   4.63744          42           1           1
#> 4:  1000        1     1952  -2.77915          39           1          NA
#> 5:   101        0     1964  -0.37771          67           0          NA
#> 6:   102        1     1950  -3.96542          87           1           1
#>    f.22009.0.1 f.22009.0.2 f.22009.0.3 f.22009.0.4 f.22009.0.5 f.22009.0.6
#> 1:    -13.9392     4.85521    -4.42933     4.76797    -5.27259   -0.269446
#> 2:    -12.1606     5.62572     2.45384     1.40648     1.50922    0.107995
#> 3:    -10.5560     3.36640    -2.12175     1.03620    -3.17226    0.127207
#> 4:     -9.9151    -2.97417     3.31737     3.63140    -2.30812   -1.860270
#> 5:    398.7770    74.77010    -7.22445     8.38355    -2.87911   -1.455690
#> 6:    -12.3537     2.15466    -7.18555     1.01765    -6.70759    1.308840
#>    f.22009.0.7 f.22009.0.8 f.22009.0.9 f.22009.0.10 f.22009.0.11 f.22009.0.12
#> 1:  -0.0337125   -1.944710   -2.022410     4.264140      3.39439    -0.358951
#> 2:   2.6235900    0.635440    2.732790    -0.953065     -2.35587     2.528090
#> 3:   0.1696250   -1.246840   -0.162799     1.322950      4.31400    -1.037140
#> 4:   1.7189500   -0.517634   -1.981150     1.171880     -3.98168     1.648390
#> 5:  -3.5668300   -0.135274   -1.162200     1.668700     -3.70156    -4.097650
#> 6:   0.7505630   -1.550410    3.116950    -0.228453     -2.84932     3.451730
#>    f.22009.0.13 f.22009.0.14 f.22009.0.15 f.22009.0.16 f.22009.0.17
#> 1:   -0.7642600     0.309769   -0.6725160    -7.448340    3.0069300
#> 2:   -3.3455400     0.215570   -0.9274520     1.873160    1.1482700
#> 3:   -1.5557500    -0.224190   -0.0482866    -1.543140   -2.0899200
#> 4:   -0.0366804    -2.211790    0.4448880    -0.741873   -3.5272100
#> 5:   -2.7702100     0.916237   -1.1581300    -0.435394    3.5632800
#> 6:    2.2152100     1.830150   -0.1968530     4.681340   -0.0273429
#>    f.22009.0.18 f.22009.0.19 f.22009.0.20 f.22019.0.0 f.22021.0.0 f.22027.0.0
#> 1:    -2.417060   -2.6010700   -0.0485761          NA           1          NA
#> 2:     1.792240   -0.0429964   -4.3764600          NA           0          NA
#> 3:    -2.102530    1.4131500    0.5887800          NA           0          NA
#> 4:     2.813790    1.6533800    2.8949100          NA           1          NA
#> 5:    -0.658508   -0.3008150   -1.0726500          NA           0          NA
#> 6:     4.608150    0.6033540   -0.6772270          NA           0          NA
#>    f.22028.0.0 f.22029.0.0 f.22030.0.0 f.21001.0.0 f.21001.1.0 f.21001.2.0
#> 1:           1           1           1     39.0947          NA          NA
#> 2:           1           1           1     26.9972          NA          NA
#> 3:           1           1           1     39.1328          NA          NA
#> 4:           1           1           1     26.1228          NA          NA
#> 5:           1           1           1     33.1313          NA          NA
#> 6:           1           1           1     26.4349          NA          NA
#>    f.21001.3.0 f.21000.0.0 f.21000.1.0 f.21000.2.0 f.20161.0.0 f.20161.1.0
#> 1:          NA        1001          NA          NA          NA          NA
#> 2:          NA        1001          NA          NA          NA          NA
#> 3:          NA        1001          NA          NA          NA          NA
#> 4:          NA        1001          NA          NA          NA          NA
#> 5:          NA        4001          NA          NA          NA          NA
#> 6:          NA        1001          NA          NA          NA          NA
#>    f.20161.2.0 f.20161.3.0
#> 1:          NA          NA
#> 2:          NA          NA
#> 3:          NA          NA
#> 4:          NA          NA
#> 5:          NA          NA
#> 6:          NA          NA

Main Functions

1. Generating a Covariate Table with COVID-19 Risk Factors

The risk_factor function generates a covariate table of risk factors from the UKBB main tab data. By default it returns sex, age at the 2020 birthday, socioeconomic status (SES), self-reported ethnicity, the most recently reported BMI, the most recently reported pack-years of smoking, whether the participant resides in aged care (based on hospital admissions data and COVID-19 test data), and blood type. Users can also request additional fields of interest by their UK Biobank field codes and give the selected fields more intuitive names.

Note: the ukb.tab file must include the fields f.eid, f.31.0.0, f.34.0.0 and f.189.0.0, plus the columns whose names start with f.21001., f.21000. and f.20161. (for example f.21001.0.0).

covar <- risk_factor(ukb.data=covid_example("sim_ukb.tab.gz"),
                     ABO.data=covid_example("sim_covid19_misc.txt.gz"),
                     hesin.file=covid_example("sim_hesin.txt.gz"),
                     res.eng=covid_example("sim_result_england.txt.gz"))

head(covar)
#>   ID sex age     bmi ethnic other.ppl black asian mixed white      SES  smoke
#> 1  1   1  74 39.0947   1001         0     0     0     0     1  5.43719  0.000
#> 2  2   1  58 25.3177   1001         0     0     0     0     1 -2.10787  0.000
#> 3  3   0  51 32.2349   1002         0     0     0     0     1  7.36321 25.625
#> 4  4   0  56 21.7955   1001         0     0     0     0     1 -5.62047  0.000
#> 5  5   0  72 22.6203   1001         0     0     0     0     1 -1.09293  0.000
#> 6  6   1  67 25.9823   1001         0     0     0     0     1 -3.90245  0.000
#>   blood_group O AB B A inAgedCare
#> 1          AO 0  0 0 1          0
#> 2          AO 0  0 0 1          0
#> 3          AO 0  0 0 1          0
#> 4          AO 0  0 0 1          0
#> 5          AO 0  0 0 1          0
#> 6          OO 1  0 0 0          0
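As a rough illustration of the required inputs and two of the derived columns described above, the following sketch checks a toy table for the required fields and then derives age in 2020 and the most recent BMI. It is not the package's implementation: the toy values, the helper latest_value, and the assumption that instance columns are ordered from oldest to newest are all illustrative.

```r
# Toy stand-in for a ukb.tab table (values are made up for illustration)
ukb <- data.frame(
  f.eid       = c(1, 2, 3),
  f.31.0.0    = c(1, 1, 0),
  f.34.0.0    = c(1946, 1962, 1969),      # year of birth
  f.189.0.0   = c(5.43719, -2.10787, 7.36321),
  f.21001.0.0 = c(39.0947, 26.1, NA),     # BMI, first instance
  f.21001.1.0 = c(NA, 25.3177, NA),       # BMI, later instance
  f.21000.0.0 = c(1001, 1001, 1002),
  f.20161.0.0 = c(NA, NA, 25.625)
)

# Fields that must be present by exact name, and by prefix
required <- c("f.eid", "f.31.0.0", "f.34.0.0", "f.189.0.0")
prefixes <- c("f.21001.", "f.21000.", "f.20161.")

missing.exact  <- setdiff(required, names(ukb))
has.prefix     <- vapply(prefixes,
                         function(p) any(startsWith(names(ukb), p)),
                         logical(1))
missing.prefix <- prefixes[!has.prefix]
stopifnot(length(missing.exact) == 0, length(missing.prefix) == 0)

# Age at the 2020 birthday from year of birth
age <- 2020 - ukb$f.34.0.0

# Hypothetical helper: last non-missing value in a row
# (assumes columns are ordered from oldest to newest instance)
latest_value <- function(row) {
  v <- row[!is.na(row)]
  if (length(v) > 0) v[length(v)] else NA_real_
}
bmi.cols <- grep("^f\\.21001\\.", names(ukb), value = TRUE)
bmi <- unname(apply(ukb[bmi.cols], 1, latest_value))

data.frame(ID = ukb$f.eid, age = age, bmi = bmi)
```

The second participant has two BMI measurements, so the later one is used, while the third has none and stays NA.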

2. Summarizing COVID-19 Test Results

The makePhenotypes function summarizes COVID-19 test results, death register data, and hospital inpatient data. It generates phenotypes for COVID-19 susceptibility, severity, and mortality.

COVID-19 susceptibility

susceptibility <- makePhenotypes(ukb.data=covid_example("sim_ukb.tab.gz"),
                        res.eng=covid_example("sim_result_england.txt.gz"),
                        death.file=covid_example("sim_death.txt.gz"),
                        death.cause.file=covid_example("sim_death_cause.txt.gz"),
                        hesin.file=covid_example("sim_hesin.txt.gz"),
                        hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"),
                        hesin_oper.file=covid_example("sim_hesin_oper.txt.gz"),
                        hesin_critical.file=covid_example("sim_hesin_critical.txt.gz"),
                        code.file=covid_example("coding240.txt.gz"),
                        pheno.type = "susceptibility")
#> [1] "965 participants got tested until 2021-04-05."
#> [1] "218 participants got positive test results until 2021-04-05."
#> [1] "There are 21 deaths with COVID-19. 20 of them primary death cause is COVID-19."
#> [1] "50 patients admitted to hospital were diagnosed as COVID-19 until 2021-04-05."
#> [1] "32 patients' primary diagnosis is COVID-19."
#> [1] "1 patients in hospitalization with COVID-19 diagnosis but show negative in the result file. Modified their test results."
#> [1] "There are 219 COVID-19 patients identified. 32 individuals are admitted to hospital. 3 had been in ICU. 1 had been in advanced ICU."
head(susceptibility)
#>   ID pos.neg pos.ppl
#> 1  1       1       1
#> 2  2       0       0
#> 3  3       0       0
#> 4  4       0       0
#> 5  5       0       0
#> 6  6       0       0

COVID-19 severity

severity <- makePhenotypes(ukb.data=covid_example("sim_ukb.tab.gz"),
                        res.eng=covid_example("sim_result_england.txt.gz"),
                        death.file=covid_example("sim_death.txt.gz"),
                        death.cause.file=covid_example("sim_death_cause.txt.gz"),
                        hesin.file=covid_example("sim_hesin.txt.gz"),
                        hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"),
                        hesin_oper.file=covid_example("sim_hesin_oper.txt.gz"),
                        hesin_critical.file=covid_example("sim_hesin_critical.txt.gz"),
                        code.file=covid_example("coding240.txt.gz"),
                        pheno.type = "severity")
#> [1] "726 participants got tested until 2020-12-31."
#> [1] "150 participants got positive test results until 2020-12-31."
#> [1] "There are 10 deaths with COVID-19. 9 of them primary death cause is COVID-19."
#> [1] "50 patients admitted to hospital were diagnosed as COVID-19 until 2020-12-31."
#> [1] "32 patients' primary diagnosis is COVID-19."
#> [1] "1 patients in hospitalization with COVID-19 diagnosis but show negative in the result file. Modified their test results."
#> [1] "There are 151 COVID-19 patients identified. 32 individuals are admitted to hospital. 3 had been in ICU. 1 had been in advanced ICU."

COVID-19 mortality

mortality <- makePhenotypes(ukb.data=covid_example("sim_ukb.tab.gz"),
                        res.eng=covid_example("sim_result_england.txt.gz"),
                        death.file=covid_example("sim_death.txt.gz"),
                        death.cause.file=covid_example("sim_death_cause.txt.gz"),
                        hesin.file=covid_example("sim_hesin.txt.gz"),
                        hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"),
                        hesin_oper.file=covid_example("sim_hesin_oper.txt.gz"),
                        hesin_critical.file=covid_example("sim_hesin_critical.txt.gz"),
                        code.file=covid_example("coding240.txt.gz"),
                        pheno.type = "mortality")
#> [1] "939 participants got tested until 2021-03-19."
#> [1] "217 participants got positive test results until 2021-03-19."
#> [1] "There are 21 deaths with COVID-19. 20 of them primary death cause is COVID-19."
#> [1] "50 patients admitted to hospital were diagnosed as COVID-19 until 2021-03-19."
#> [1] "32 patients' primary diagnosis is COVID-19."
#> [1] "1 patients in hospitalization with COVID-19 diagnosis but show negative in the result file. Modified their test results."
#> [1] "There are 218 COVID-19 patients identified. 32 individuals are admitted to hospital. 3 had been in ICU. 1 had been in advanced ICU."

3. Performing Association Tests

The log_cov function performs association tests using logistic regression. The following example tests the association between COVID-19 susceptibility and three risk factors: sex, age and BMI.

log_cov(pheno=susceptibility, covariates=covar, phe.name="pos.neg", cov.name=c("sex", "age", "bmi"))
#> Waiting for profiling to be done...
#>                Estimate        OR     2.5 %    97.5 %           p
#> (Intercept) -0.16475743 0.8480994 0.1954585 3.6381032 0.824991899
#> sex1         0.04207813 1.0429760 0.7644672 1.4215535 0.790121307
#> age         -0.03080456 0.9696651 0.9519878 0.9876397 0.001009957
#> bmi          0.03625193 1.0369170 1.0076088 1.0667564 0.012568486
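To make the columns of this table concrete, here is a minimal base R sketch that fits a logistic regression on simulated data and assembles the same kind of summary (estimate, odds ratio, 95% interval, p-value). It is an illustration only, not the code inside log_cov: the simulated data and effect sizes are invented, and it uses Wald intervals from confint.default rather than the profile-likelihood intervals suggested by the "Waiting for profiling to be done..." message.

```r
set.seed(1)
n <- 500

# Simulated covariates (hypothetical values)
sim <- data.frame(
  sex = factor(rbinom(n, 1, 0.5)),
  age = round(runif(n, 50, 80)),
  bmi = rnorm(n, 27, 4)
)

# Simulated binary outcome with invented effects of age and BMI
lp <- -0.5 - 0.03 * (sim$age - 65) + 0.04 * (sim$bmi - 27)
sim$pos.neg <- rbinom(n, 1, plogis(lp))

# Logistic regression of the outcome on the three covariates
fit <- glm(pos.neg ~ sex + age + bmi, data = sim, family = binomial)

est <- coef(summary(fit))    # estimates, standard errors, z, p
ci  <- confint.default(fit)  # Wald intervals on the log-odds scale

# Odds ratios are the exponentiated coefficients
res <- data.frame(
  Estimate = est[, "Estimate"],
  OR       = exp(est[, "Estimate"]),
  `2.5 %`  = exp(ci[, 1]),
  `97.5 %` = exp(ci[, 2]),
  p        = est[, "Pr(>|z|)"],
  row.names   = rownames(est),
  check.names = FALSE
)
print(res)
```

An odds ratio below 1 with an interval that excludes 1, as for age in the package output above, indicates lower odds of the outcome per unit increase in the covariate.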

4. Generating a Comorbidity Summary File

The comorbidity_summary function scans all hospitalisation records within a given time period and generates a comorbidity summary table. The following example generates a table that includes all primary and secondary diagnoses in the hospital inpatient data after 16 March 2020.

comorb <- comorbidity_summary(ukb.data=covid_example("sim_ukb.tab.gz"),
                              hesin.file=covid_example("sim_hesin.txt.gz"), 
                              hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"), 
                              ICD10.file=covid_example("ICD10.coding19.txt.gz"),
                              primary = FALSE,
                              Date.start = "16/03/2020")
comorb[1:6,1:10]
#>     ID A00-A09 A15-A19 A20-A28 A30-A49 A50-A64 A65-A69 A70-A74 A75-A79 A80-A89
#> 1    1       1       0       0       1       0       0       0       0       0
#> 2   10       0       0       0       0       0       0       0       0       0
#> 3  100       0       0       0       0       0       0       0       0       0
#> 4 1000       0       0       0       0       0       0       0       0       0
#> 5  101       0       0       0       0       0       0       0       0       0
#> 6  102       0       0       0       0       0       0       0       0       0
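The shape of this table can be reproduced with a small base R sketch: filter diagnosis records by start date, map each ICD-10 code to its block, and mark which blocks each participant has. This is an assumption-laden illustration rather than the package's code; the column names eid, diag_icd10 and epistart, the flattened single-table layout, and the three-block lookup are all hypothetical.

```r
# Toy diagnosis records (hypothetical layout and values)
diag <- data.frame(
  eid        = c(1, 1, 2, 3),
  diag_icd10 = c("A049", "A419", "A150", "A090"),
  epistart   = c("20/03/2020", "01/05/2020", "01/02/2020", "17/03/2020"),
  stringsAsFactors = FALSE
)

# Small lookup of ICD-10 blocks (a real coding file has many more)
blocks <- data.frame(
  block = c("A00-A09", "A15-A19", "A30-A49"),
  from  = c("A00", "A15", "A30"),
  to    = c("A09", "A19", "A49"),
  stringsAsFactors = FALSE
)

# Keep records on or after the start date (dd/mm/yyyy)
date.start <- as.Date("16/03/2020", format = "%d/%m/%Y")
keep <- diag[as.Date(diag$epistart, format = "%d/%m/%Y") >= date.start, ]

# Map the three-character category of each code to its block
code3 <- substr(keep$diag_icd10, 1, 3)
idx <- vapply(code3,
              function(cd) which(cd >= blocks$from & cd <= blocks$to)[1],
              integer(1))
keep$block <- factor(blocks$block[idx], levels = blocks$block)

# One row per participant, one 0/1 column per block
ids <- sort(unique(diag$eid))
tab <- table(factor(keep$eid, levels = ids), keep$block)
mat <- (unclass(tab) > 0) * 1L
comorb.sketch <- data.frame(ID = ids, mat, check.names = FALSE)
comorb.sketch
```

Participant 2 has only a record dated before the start date, so every block is 0 for that row.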

5. Performing Association Tests between COVID-19 Phenotype and Comorbidities

The comorbidity_asso function performs association tests between comorbidity categories and a selected phenotype using logistic regression models. The following example tests the association between COVID-19 susceptibility and all comorbidity categories. The result shows NAs for categories where the logistic regression produced fitted probabilities numerically equal to 0 or 1.

comorb.asso <- comorbidity_asso(pheno=susceptibility,
                                covariates=covar,
                                cormorbidity=comorb,
                                population="white",
                                cov.name=c("sex","age","bmi","SES","smoke","inAgedCare"),
                                phe.name="pos.neg",
                                ICD10.file=covid_example("ICD10.coding19.txt.gz"))
#> Waiting for profiling to be done...
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
#> collapsing to unique 'x' values
head(comorb.asso, 4)
#>                                               ICD10  Estimate       OR     2.5%
#> A00-A09      A00-A09 Intestinal infectious diseases 0.4722864 1.603657 0.756784
#> A15-A19                        A15-A19 Tuberculosis        NA       NA       NA
#> A20-A28 A20-A28 Certain zoonotic bacterial diseases        NA       NA       NA
#> A30-A49            A30-A49 Other bacterial diseases 1.2246077 3.402831 1.633209
#>            97.5%           p
#> A00-A09 3.240022 0.199664372
#> A15-A19       NA          NA
#> A20-A28       NA          NA
#> A30-A49 6.978689 0.000873076
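The NA rows arise when a comorbidity category separates cases from controls so cleanly that glm cannot produce a stable estimate. One possible way to turn that warning into an NA is sketched below in base R. The helper fit_or_na and the toy data are hypothetical, and the package may handle this differently.

```r
# Hypothetical helper: return the slope estimate, or NA if glm
# warns that fitted probabilities were numerically 0 or 1
fit_or_na <- function(formula, data) {
  separated <- FALSE
  fit <- withCallingHandlers(
    glm(formula, data = data, family = binomial),
    warning = function(w) {
      msg <- conditionMessage(w)
      if (grepl("fitted probabilities numerically 0 or 1", msg)) {
        separated <<- TRUE
      }
      # Silence glm.fit warnings once they have been inspected
      if (grepl("^glm\\.fit:", msg)) invokeRestart("muffleWarning")
    }
  )
  if (separated) return(NA_real_)
  unname(coef(fit)[2])
}

x <- 1:20

# Perfectly separated outcome: every x above 10 is a case
d.separated <- data.frame(x = x, y = as.integer(x > 10))

# Overlapping outcome: cases and controls are mixed along x
d.mixed <- data.frame(
  x = x,
  y = c(0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1)
)

est.separated <- fit_or_na(y ~ x, d.separated)  # NA
est.mixed     <- fit_or_na(y ~ x, d.mixed)      # finite estimate
```

Reporting NA instead of the diverging estimate avoids listing meaningless odds ratios for rare categories.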

6. Sample and Variant Quality Control

The sampleQC and variantQC functions perform quality control for sample and variant data, respectively.

# Example usage for sample QC
sampleQC(ukb.data=covid_example("sim_ukb.tab.gz"), 
         withdrawnFile=covid_example("sim_withdrawn.csv.gz"), 
         ancestry="all", 
         software="SAIGE", 
         outDir=covid_example("results"))
#> [1] "Reading in Withdrawn IDs from /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/sim_withdrawn.csv.gz"
#> [1] "Reading in Sample QC info from /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/sim_ukb.tab.gz"
#> [1] "Outputting lists to /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/results"

# Example usage for variant QC
variantQC(snpQcFile=covid_example("sim_ukb_snp_qc.txt.gz"), 
          mfiDir=covid_example("alleleFreqs"), 
          mafFilt=0.001, 
          infoFilt=0.5, 
          outDir=covid_example("results"))
#> [1] "Reading in SNP MAF INFO data from /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/alleleFreqs/ukb_mfi_chr*_v3.txt"
#> [1] "writing list of variants to include under different filters to /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/results"
#> [1] "writing list of variants to use for genetic relatedness matrix (grm) to /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/results"
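The two thresholds in the variantQC call can be pictured as a simple row filter on minor allele frequency (MAF) and imputation INFO score. The sketch below uses an invented table with hypothetical column names, and it assumes a variant is kept when both values are at or above their thresholds; the package's exact comparison rule is not stated here.

```r
# Toy variant table (hypothetical column names and values)
mfi <- data.frame(
  rsid = c("rs1", "rs2", "rs3", "rs4"),
  maf  = c(0.2, 0.0005, 0.01, 0.3),
  info = c(0.99, 0.9, 0.4, 0.8),
  stringsAsFactors = FALSE
)

# Thresholds matching the example call above
maf.filt  <- 0.001
info.filt <- 0.5

# Keep variants that pass both filters (assumed rule: >= threshold)
keep <- mfi[mfi$maf >= maf.filt & mfi$info >= info.filt, ]
keep$rsid
```

Here rs2 fails the MAF filter and rs3 fails the INFO filter, leaving rs1 and rs4.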

7. Preparing Files for GWAS

The makeGWASFiles function generates phenotype files suitable for GWAS using tools like PLINK and SAIGE.

# Example usage
makeGWASFiles(ukb.data=covid_example("sim_ukb.tab.gz"), 
              pheno=susceptibility, 
              covariates=covar, 
              phe.name="pos.ppl", 
              cov.name=NULL, 
              includeSampsFile=NULL, 
              software="SAIGE", 
              outDir=covid_example("results"), 
              prefix="pos.ppl")
#> [1] "No covriates specified - all included in covariate dataframe will be included in outfile"
#> [1] "outputting phenotype file: /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/results/pos.ppl.txt"
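Conceptually, a phenotype file of this kind is the phenotype column joined to the covariates by participant ID and written as a delimited text file. The base R sketch below shows that idea on toy data. It is not the makeGWASFiles implementation, and the exact columns and layout the package writes for SAIGE or PLINK are not reproduced here.

```r
# Toy phenotype and covariate tables (values are made up)
pheno <- data.frame(ID = 1:4, pos.ppl = c(1, 0, 0, 1))
covar <- data.frame(
  ID  = c(1, 2, 3, 5),
  sex = c(1, 1, 0, 0),
  age = c(74, 58, 51, 72)
)

# Keep participants present in both tables
gwas <- merge(pheno, covar, by = "ID")

# Write a tab-delimited file to a temporary directory
out.file <- file.path(tempdir(), "pos.ppl.txt")
write.table(gwas, out.file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the layout
back <- read.delim(out.file)
back
```

Participants 4 and 5 appear in only one of the two tables, so the merged file contains IDs 1 to 3.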

Example Workflow

The following example workflow analyses risk factors for COVID-19 susceptibility by combining the main functions of the UKB.COVID19 package:

# Load the package
library(UKB.COVID19)

# Summarize COVID-19 risk factors
covar <- risk_factor(ukb.data=covid_example("sim_ukb.tab.gz"), 
                         ABO.data=covid_example("sim_covid19_misc.txt.gz"),
                         hesin.file=covid_example("sim_hesin.txt.gz"),
                         res.eng=covid_example("sim_result_england.txt.gz"))

# Summarize COVID-19 test results
susceptibility <- makePhenotypes(ukb.data=covid_example("sim_ukb.tab.gz"),
                        res.eng=covid_example("sim_result_england.txt.gz"),
                        death.file=covid_example("sim_death.txt.gz"),
                        death.cause.file=covid_example("sim_death_cause.txt.gz"),
                        hesin.file=covid_example("sim_hesin.txt.gz"),
                        hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"),
                        hesin_oper.file=covid_example("sim_hesin_oper.txt.gz"),
                        hesin_critical.file=covid_example("sim_hesin_critical.txt.gz"),
                        code.file=covid_example("coding240.txt.gz"),
                        pheno.type = "susceptibility")
#> [1] "965 participants got tested until 2021-04-05."
#> [1] "218 participants got positive test results until 2021-04-05."
#> [1] "There are 21 deaths with COVID-19. 20 of them primary death cause is COVID-19."
#> [1] "50 patients admitted to hospital were diagnosed as COVID-19 until 2021-04-05."
#> [1] "32 patients' primary diagnosis is COVID-19."
#> [1] "1 patients in hospitalization with COVID-19 diagnosis but show negative in the result file. Modified their test results."
#> [1] "There are 219 COVID-19 patients identified. 32 individuals are admitted to hospital. 3 had been in ICU. 1 had been in advanced ICU."

# Perform association tests
log_cov(pheno=susceptibility, covariates=covar, phe.name="pos.neg", cov.name=c("sex", "age", "bmi"))
#> Waiting for profiling to be done...
#>                Estimate        OR     2.5 %    97.5 %           p
#> (Intercept) -0.16475743 0.8480994 0.1954585 3.6381032 0.824991899
#> sex1         0.04207813 1.0429760 0.7644672 1.4215535 0.790121307
#> age         -0.03080456 0.9696651 0.9519878 0.9876397 0.001009957
#> bmi          0.03625193 1.0369170 1.0076088 1.0667564 0.012568486

# Generate comorbidity table
comorb <- comorbidity_summary(ukb.data=covid_example("sim_ukb.tab.gz"),
                              hesin.file=covid_example("sim_hesin.txt.gz"), 
                              hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"), 
                              ICD10.file=covid_example("ICD10.coding19.txt.gz"),
                              primary = FALSE,
                              Date.start = "16/03/2020")

# Perform association tests between COVID-19 phenotype and comorbidities
comorb.asso <- comorbidity_asso(pheno=susceptibility,
                                covariates=covar,
                                cormorbidity=comorb,
                                population="white",
                                cov.name=c("sex","age","bmi","SES","smoke","inAgedCare"),
                                phe.name="pos.neg",
                                ICD10.file=covid_example("ICD10.coding19.txt.gz"))
#> Waiting for profiling to be done...
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
#> collapsing to unique 'x' values
#> Waiting for profiling to be done...
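
The warning `glm.fit: fitted probabilities numerically 0 or 1 occurred` is raised by base R when a logistic model predicts some outcomes almost perfectly. One common cause is (quasi-)complete separation, which is plausible here because many comorbidity codes are rare in the small simulated data set, although this vignette does not state the cause. The sketch below reproduces the warning on made-up data. The vectors `x` and `y` are hypothetical and unrelated to the package's data.

```r
# Made-up data with complete separation: y is 0 for x <= 5 and 1 for x > 5
x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# Fit the model and collect the warning messages instead of printing them
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

msgs        # warnings raised during fitting
coef(fit)   # very large coefficients are typical under separation
```

Estimates from models that trigger this warning usually come with very wide confidence intervals, so the corresponding rows of the association table deserve extra caution.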

# Perform sample quality control
sampleQC(ukb.data=covid_example("sim_ukb.tab.gz"), 
         withdrawnFile=covid_example("sim_withdrawn.csv.gz"), 
         ancestry="all", 
         software="SAIGE", 
         outDir=covid_example("results"))
#> [1] "Reading in Withdrawn IDs from /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/sim_withdrawn.csv.gz"
#> [1] "Reading in Sample QC info from /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/sim_ukb.tab.gz"
#> [1] "Outputting lists to /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/results"

# Perform variant quality control
variantQC(snpQcFile=covid_example("sim_ukb_snp_qc.txt.gz"), 
          mfiDir=covid_example("alleleFreqs"), 
          mafFilt=0.001, 
          infoFilt=0.5, 
          outDir=covid_example("results"))
#> [1] "Reading in SNP MAF INFO data from /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/alleleFreqs/ukb_mfi_chr*_v3.txt"
#> [1] "writing list of variants to include under different filters to /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/results"
#> [1] "writing list of variants to use for genetic relatedness matrix (grm) to /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/results"
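
The `mafFilt` and `infoFilt` arguments set thresholds on minor allele frequency and imputation INFO score. A minimal sketch of that kind of filter on a made-up variant table is shown below. The column names `rsid`, `maf` and `info`, and the choice of `>=` rather than `>` at the thresholds, are assumptions for illustration and may differ from what `variantQC` does internally.

```r
# Made-up variant table (not real UKBB allele frequency data)
mfi <- data.frame(rsid = c("rs1", "rs2", "rs3", "rs4"),
                  maf  = c(0.2, 0.0005, 0.01, 0.3),
                  info = c(0.9, 0.8, 0.4, 0.99),
                  stringsAsFactors = FALSE)

maf.filt  <- 0.001   # same value as mafFilt above
info.filt <- 0.5     # same value as infoFilt above

# Keep variants that pass both thresholds
keep <- mfi[mfi$maf >= maf.filt & mfi$info >= info.filt, ]
keep$rsid
```

Here `rs2` is dropped for its low allele frequency and `rs3` for its low INFO score, leaving `rs1` and `rs4`.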

# Prepare files for GWAS
makeGWASFiles(ukb.data=covid_example("sim_ukb.tab.gz"), 
              pheno=susceptibility, 
              covariates=covar, 
              phe.name="pos.ppl", 
              cov.name=NULL, 
              includeSampsFile=NULL, 
              software="SAIGE", 
              outDir=covid_example("results"), 
              prefix="pos.ppl")
#> [1] "No covriates specified - all included in covariate dataframe will be included in outfile"
#> [1] "outputting phenotype file: /tmp/RtmpinvnHJ/Rinst6e353668d3c3/UKB.COVID19/extdata/results/pos.ppl.txt"

Conclusion

The UKB.COVID19 package provides comprehensive tools for analyzing COVID-19 data from the UK Biobank. By following this vignette, users can efficiently summarize data, perform association tests, and prepare files for genetic analysis. For more detailed information, refer to the package documentation and the GitHub repository.
