The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Multibias example: Evans

library(multibias)

We are interested in quantifying the effect of smoking (SMK) on coronary heart disease (CHD) using data from the Evans County Heart Study. Let’s inspect the dataframe we have for 609 participants aged 40 and older.

head(evans)
#>   ID CHD AGE CHL SMK ECG DBP SBP HPT
#> 1 21   0  56 270   0   0  80 138   0
#> 2 31   0  43 159   1   0  74 128   0
#> 3 51   1  56 201   1   1 112 164   1
#> 4 71   0  64 179   1   0 100 200   1
#> 5 74   0  49 243   1   0  82 145   0
#> 6 91   0  46 252   1   0  88 142   0

This is clearly not the most robust data, but, for purposes of demonstrating multibias, let’s proceed by pretending that our data was missing information on AGE. Let’s inspect the resulting effect estimate, adjusted for hypertension (HPT).

biased_model <- glm(CHD ~ SMK + HPT,
  family = binomial(link = "logit"),
  data = evans
)
or <- round(exp(coef(biased_model)[2]), 2)
or_ci_low <- round(
  exp(coef(biased_model)[2] - 1.96 * summary(biased_model)$coef[2, 2]), 2
)
or_ci_high <- round(
  exp(coef(biased_model)[2] + 1.96 * summary(biased_model)$coef[2, 2]), 2
)

print(paste0("Biased Odds Ratio: ", or))
#> [1] "Biased Odds Ratio: 1.99"
print(paste0("95% CI: (", or_ci_low, ", ", or_ci_high, ")"))
#> [1] "95% CI: (1.12, 3.55)"

This estimate is around what we’d expect. In the IJE article The association between tobacco smoking and coronary heart disease the author notes: “Cigarette smokers have about twice as much coronary heart disease as non-smokers, whether measured by deaths, prevalence, or the incidence of new events.” However, we also know from studies like this BMJ meta-analysis that the effect estimate can vary greatly depending on the degree of cigarette consumption.

Can we anticipate whether this odds ratio without age-adjustment is biased towards or away from the null? Let’s consider the association of the uncontrolled confounder with the exposure and outcome.

cor(evans$SMK, evans$AGE)
#> [1] -0.1391298
cor(evans$CHD, evans$AGE)
#> [1] 0.1393077

In our data, AGE has a negative association with SMK (older people are less likely to be smokers) and a positive association with CHD (older people are more likely to have CHD). These opposite associations must be biasing the odds ratio towards the null, creating a distortion where those who are less likely to smoke are more likely to experience the outcome.

We’ll treat AGE as a binary indicator of over (1) or under (0) age 60. To adjust for the uncontrolled confounding from AGE, let’s refer to the appropriate bias model for a binary uncontrolled confounder: logit(P(U=1)) = α0 + α1X + α2Y + α2+jCj.

To derive the necessary bias parameters, let’s make the following assumption:

To convert these relationships as parameters in the model, we’ll log-transform them from odds ratios. For the model intercept, we can use the following reasoning: what is the probability that a non-smoker (X=0) without CHD (Y=0) and HPT (C=0) is over age 60 in this population? We’ll assume this is a 25% probability. We’ll use the inverse logit function qlogis() from the stats package to convert this from a probability to the intercept coefficient of the logistic regression model.

u_0 <- qlogis(0.25)
u_x <- log(0.5)
u_y <- log(2.5)
u_c <- log(2)

Now let’s plug these bias parameters into adjust_uc() along with our data_observed object to obtain our bias-adjusted odds ratio.

df_obs <- data_observed(
  data = evans,
  exposure = "SMK",
  outcome = "CHD",
  confounders = "HPT"
)

set.seed(1234)
adjust_uc(
  df_obs,
  u_model_coefs = c(u_0, u_x, u_y, u_c)
)
#> $estimate
#> [1] 2.279489
#> 
#> $ci
#> [1] 1.261951 4.117491

We get an odds ratio of 2.3 (95% CI: 1.3, 4.1). This matches our expectation that the bias adjustment would pull the odds ratio away from the null. How does this result compare to the result we would get if age wasn’t missing in the data and was incorporated in the outcome regression?

full_model <- glm(CHD ~ SMK + HPT + AGE,
  family = binomial(link = "logit"),
  data = evans
)
or <- round(exp(coef(full_model)[2]), 2)
or_ci_low <- round(
  exp(coef(biased_model)[2] - 1.96 * summary(full_model)$coef[2, 2]), 2
)
or_ci_high <- round(
  exp(coef(biased_model)[2] + 1.96 * summary(full_model)$coef[2, 2]), 2
)

print(paste0("Odds Ratio: ", or))
#> [1] "Odds Ratio: 2.31"
print(paste0("95% CI: (", or_ci_low, ", ", or_ci_high, ")"))
#> [1] "95% CI: (1.1, 3.6)"

It turns out that our bias-adjusted odds ratio using multibias is close to this complete-data odds ratio of 2.3.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.