The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
Once risks is released on CRAN,
install.packages("risks")
will be available. Currently, the
development version of risks
can be installed from GitHub using:
We use a cohort of women with breast cancer, as used by Spiegelman
and Hertzmark (Am J
Epidemiol 2005) and Greenland (Am J Epidemiol
2004). The the categorical exposure is stage
, the
binary outcome is death
, and the binary confounder is
receptor
.
library(risks) # provides riskratio(), riskdiff(), postestimation functions
library(dplyr) # For data handling
library(broom) # For tidy() model summaries
data(breastcancer) # Load sample data
breastcancer %>% # Display the sample data
group_by(receptor, stage) %>%
summarize(deaths = sum(death), total = n(), risk = deaths/total)
#> # A tibble: 6 × 5
#> # Groups: receptor [2]
#> receptor stage deaths total risk
#> <chr> <fct> <dbl> <int> <dbl>
#> 1 High Stage I 5 55 0.0909
#> 2 High Stage II 17 74 0.230
#> 3 High Stage III 9 15 0.6
#> 4 Low Stage I 2 12 0.167
#> 5 Low Stage II 9 22 0.409
#> 6 Low Stage III 12 14 0.857
The risk of death is higher among women with higher-stage and hormone
receptor-low cancers, which also tend to be of higher stage. Using
risks
models to obtain (possibly multivariable-adjusted)
risk ratios or risk differences is similar to the standard code for
logistic models in R. As customary in R, log(RR) is returned; see below
for how to obtain exponentiated values. No options for model
family
or link
need to be supplied:
fit_rr <- riskratio(formula = death ~ stage + receptor, data = breastcancer)
summary(fit_rr)
#>
#> Risk ratio model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"),
#> data = breastcancer, start = "(no starting values)")
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#>
#>
#> Coefficients: (3 not defined because of singularities)
#> Estimate Std. Error z value Pr(>|z|)
#> stageStage I 0.0000 0.0000 NaN NaN
#> stageStage II 0.8989 0.3875 2.320 0.0203 *
#> stageStage III 1.8087 0.3783 4.781 1.75e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 228.15 on 191 degrees of freedom
#> Residual deviance: 185.88 on 188 degrees of freedom
#> AIC: 193.88
#>
#> Number of Fisher Scoring iterations: 4
#>
#> Confidence intervals for coefficients: (delta method)
#> 2.5 % 97.5 %
#> stageStage I 0.0000000 0.000000
#> stageStage II 0.1395299 1.658324
#> stageStage III 1.0671711 2.550242
To obtain risk differences, use riskdiff
, which has the
same syntax:
fit_rd <- riskdiff(formula = death ~ stage + receptor, data = breastcancer)
summary(fit_rd)
#>
#> Risk difference model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"),
#> data = breastcancer, start = "(no starting values)")
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#>
#>
#> Coefficients: (3 not defined because of singularities)
#> Estimate Std. Error z value Pr(>|z|)
#> stageStage I 0.00000 0.00000 NaN NaN
#> stageStage II 0.16303 0.05964 2.734 0.00626 **
#> stageStage III 0.57097 0.09962 5.732 9.95e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 228.15 on 191 degrees of freedom
#> Residual deviance: 185.88 on 188 degrees of freedom
#> AIC: 193.88
#>
#> Number of Fisher Scoring iterations: 4
#>
#> Confidence intervals for coefficients: (delta method)
#> 2.5 % 97.5 %
#> stageStage I 0.00000000 0.0000000
#> stageStage II 0.04614515 0.2799187
#> stageStage III 0.37571719 0.7662158
For example, the risk of death was 57 percentage points higher in women with stage III breast cancer compared to stage I (95% confidence interval, 38 to 77 percentage points), adjusting for hormone receptor status.
The model summary in riskratio()
and
riskdiff()
includes to two additions compared to a regular
glm()
model:
summary()
, the type of risk
regression model is displayed (in the example,
“Risk ratio model, fitted as binomial model...
”).The risks package provides an interface to
broom::tidy()
, which returns a data frame of all
coefficients (risk differences in this example), their standard errors,
z statistics, and confidence intervals.
tidy(fit_rd)
#> # A tibble: 3 × 8
#> term estimate std.error statistic p.value conf.low conf.high model
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 stageStage I 0 0 NaN NaN 0 0 marg…
#> 2 stageStage II 0.163 0.0596 2.73 6.26e-3 0.0461 0.280 marg…
#> 3 stageStage III 0.571 0.0996 5.73 9.95e-9 0.376 0.766 marg…
In accordance with glm()
standards, coefficients for
relative risks are shown on the logarithmic scale. Exponentiated
coefficients for risk ratios are easily obtained via
tidy(..., exponentiate = TRUE)
:
tidy(fit_rr, exponentiate = TRUE)
#> # A tibble: 3 × 8
#> term estimate std.error statistic p.value conf.low conf.high model
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 stageStage I 1 0 NaN NaN 1 1 marg…
#> 2 stageStage II 2.46 0.387 2.32 2.03e-2 1.15 5.25 marg…
#> 3 stageStage III 6.10 0.378 4.78 1.75e-6 2.91 12.8 marg…
For example, the risk of death was 6 times higher in women with stage III breast cancer compared to stage I (95% confidence interval, 3 to 13 times), adjusting for hormone receptor status.
Typical R functions that build on regression models can further
process fitted risks
models. In addition to
tidy()
, examples are:
coef(fit)
or coefficients(fit)
return
model coefficients (i.e., RDs or log(RR)) as a numeric
vectorconfint(fit, level = 0.9)
returns 90%
confidence intervals.predict(fit, type = "response")
or
fitted.values(fit)
return predicted probabilities of the
binary outcome.residuals(fit)
returns model residuals.These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.