
Introduction to sumExtras

library(sumExtras)
library(gtsummary)
library(dplyr)

use_jama_theme()

All examples in this vignette use the JAMA compact theme via use_jama_theme(). See vignette("themes") to set this up.

The extras() Function

If you’ve worked with {gtsummary} before, you’re familiar with the typical workflow of building summary tables: creating a base table with tbl_summary(), then progressively adding features like overall columns, p-values, and formatting tweaks. While {gtsummary}’s modular approach provides flexibility, the same sequence of functions appears repeatedly in analysis scripts.

extras() consolidates the most common {gtsummary} formatting steps into one call: bold labels, a clean header, an overall column, p-values, and missing value cleanup.

Standard {gtsummary}

theme_gtsummary_journal("jama")
theme_gtsummary_compact()

trial |>
  tbl_summary(by = trt) |>
  add_overall() |>
  add_p() |>
  bold_labels() |>
  bold_p() |>
  modify_header(label = "")

With {sumExtras}

use_jama_theme()

trial |>
  tbl_summary(by = trt) |>
  extras()

Table produced by extras()

Customizing Output

You can control which features are applied:

# Without p-values
trial |>
  tbl_summary(by = trt) |>
  extras(pval = FALSE)

                        Overall            Drug A             Drug B
                        N = 200¹           N = 98¹            N = 102¹
Age                     47 (38, 57)        46 (37, 60)        48 (39, 56)
    Unknown             11                 7                  4
Marker Level (ng/mL)    0.64 (0.22, 1.41)  0.84 (0.23, 1.60)  0.52 (0.18, 1.21)
    Unknown             10                 6                  4
T Stage
    T1                  53 (27%)           28 (29%)           25 (25%)
    T2                  54 (27%)           25 (26%)           29 (28%)
    T3                  43 (22%)           22 (22%)           21 (21%)
    T4                  50 (25%)           23 (23%)           27 (26%)
Grade
    I                   68 (34%)           35 (36%)           33 (32%)
    II                  68 (34%)           32 (33%)           36 (35%)
    III                 64 (32%)           31 (32%)           33 (32%)
Tumor Response          61 (32%)           28 (29%)           33 (34%)
    Unknown             7                  3                  4
Patient Died            112 (56%)          52 (53%)           60 (59%)
Months to Death/Censor  22.4 (15.9, 24.0)  23.5 (17.4, 24.0)  21.2 (14.5, 24.0)
¹ Median (Q1, Q3); n (%)

# Overall column last instead of first
trial |>
  tbl_summary(by = trt) |>
  extras(last = TRUE)

                        Drug A             Drug B             Overall            p-value²
                        N = 98¹            N = 102¹           N = 200¹
Age                     46 (37, 60)        48 (39, 56)        47 (38, 57)        0.718
    Unknown             7                  4                  11
Marker Level (ng/mL)    0.84 (0.23, 1.60)  0.52 (0.18, 1.21)  0.64 (0.22, 1.41)  0.085
    Unknown             6                  4                  10
T Stage                                                                          0.866
    T1                  28 (29%)           25 (25%)           53 (27%)
    T2                  25 (26%)           29 (28%)           54 (27%)
    T3                  22 (22%)           21 (21%)           43 (22%)
    T4                  23 (23%)           27 (26%)           50 (25%)
Grade                                                                            0.871
    I                   35 (36%)           33 (32%)           68 (34%)
    II                  32 (33%)           36 (35%)           68 (34%)
    III                 31 (32%)           33 (32%)           64 (32%)
Tumor Response          28 (29%)           33 (34%)           61 (32%)           0.530
    Unknown             3                  4                  7
Patient Died            52 (53%)           60 (59%)           112 (56%)          0.412
Months to Death/Censor  23.5 (17.4, 24.0)  21.2 (14.5, 24.0)  22.4 (15.9, 24.0)  0.145
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

# Custom header text
trial |>
  tbl_summary(by = trt) |>
  extras(header = "Variable")

Variable                Overall            Drug A             Drug B             p-value²
                        N = 200¹           N = 98¹            N = 102¹
Age                     47 (38, 57)        46 (37, 60)        48 (39, 56)        0.718
    Unknown             11                 7                  4
Marker Level (ng/mL)    0.64 (0.22, 1.41)  0.84 (0.23, 1.60)  0.52 (0.18, 1.21)  0.085
    Unknown             10                 6                  4
T Stage                                                                          0.866
    T1                  53 (27%)           28 (29%)           25 (25%)
    T2                  54 (27%)           25 (26%)           29 (28%)
    T3                  43 (22%)           22 (22%)           21 (21%)
    T4                  50 (25%)           23 (23%)           27 (26%)
Grade                                                                            0.871
    I                   68 (34%)           35 (36%)           33 (32%)
    II                  68 (34%)           32 (33%)           36 (35%)
    III                 64 (32%)           31 (32%)           33 (32%)
Tumor Response          61 (32%)           28 (29%)           33 (34%)           0.530
    Unknown             7                  3                  4
Patient Died            112 (56%)          52 (53%)           60 (59%)           0.412
Months to Death/Censor  22.4 (15.9, 24.0)  23.5 (17.4, 24.0)  21.2 (14.5, 24.0)  0.145
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

Or pass arguments as a list for reuse across tables:

my_args <- list(pval = TRUE, overall = TRUE, last = TRUE)

trial |>
  select(age, grade, stage, trt) |>
  tbl_summary(by = trt) |>
  extras(.args = my_args)

                        Drug A             Drug B             Overall            p-value²
                        N = 98¹            N = 102¹           N = 200¹
Age                     46 (37, 60)        48 (39, 56)        47 (38, 57)        0.718
    Unknown             7                  4                  11
Grade                                                                            0.871
    I                   35 (36%)           33 (32%)           68 (34%)
    II                  32 (33%)           36 (35%)           68 (34%)
    III                 31 (32%)           33 (32%)           64 (32%)
T Stage                                                                          0.866
    T1                  28 (29%)           25 (25%)           53 (27%)
    T2                  25 (26%)           29 (28%)           54 (27%)
    T3                  22 (22%)           21 (21%)           43 (22%)
    T4                  23 (23%)           27 (26%)           50 (25%)
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

On non-stratified tables, extras() skips add_overall() and add_p() and applies only the formatting that makes sense. It works the same way with tbl_regression(): bold labels, bold significant p-values (from the model), a clean header, and missing value cleanup are applied automatically, while irrelevant options are silently ignored, so the pipeline keeps running.
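
As a minimal sketch of the non-stratified case, the call below omits by =, so per the description above extras() is expected to apply only the formatting steps. The variable selection and the object name tbl_unstratified are illustrative choices, not taken from the package documentation.

```r
library(sumExtras)
library(gtsummary)

# No `by =` argument: there are no groups to compare, so extras() is
# expected to skip add_overall() and add_p() and only apply formatting.
tbl_unstratified <- trial |>
  tbl_summary(include = c(age, grade, response)) |>
  extras()

tbl_unstratified
```

The same call with by = trt added would bring back the overall column and p-values, which makes it easy to compare the two behaviors side by side.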

# Regression tables work too
glm(response ~ age + grade, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE) |>
  extras()

            OR      95% CI        p-value
Age         1.02    1.00, 1.04    0.10
Grade
    I
    II      0.85    0.39, 1.85    0.7
    III     1.01    0.47, 2.16    >0.9
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

For merged tables, call extras() on each sub-table before merging. All formatting (bold labels, p-values, missing symbols) carries through tbl_merge(), so there’s no need to call extras() again after:

t1 <- trial |>
  tbl_summary(by = trt, include = c(age, grade)) |>
  extras()

t2 <- trial |>
  tbl_summary(by = trt, include = c(marker, stage)) |>
  extras()

tbl_merge(list(t1, t2), tab_spanner = c("**Set A**", "**Set B**"))

Cleaning Missing Values

clean_table() standardizes missing or zero-count representations ("0 (NA%)", "NA (NA)", "NA, NA", etc.) to "---". It runs automatically inside extras(), but you can also use it on its own. The symbol parameter controls the replacement text (default "---"). You can also pass symbol through extras().

demo_trial <- trial |>
  mutate(
    age = if_else(trt == "Drug B", 0, age),
    marker = if_else(trt == "Drug A", NA, marker)
  ) |>
  select(trt, age, marker)

Without cleaning

demo_trial |>
  tbl_summary(by = trt)

Characteristic    Drug A             Drug B
                  N = 98¹            N = 102¹
age               46 (37, 60)        0 (0, 0)
    Unknown       7                  0
marker            NA (NA, NA)        0.52 (0.18, 1.21)
    Unknown       98                 4
¹ Median (Q1, Q3)

With clean_table()

demo_trial |>
  tbl_summary(by = trt) |>
  clean_table()

Characteristic    Drug A             Drug B
                  N = 98¹            N = 102¹
age               46 (37, 60)        ---
    Unknown       7                  0
marker            ---                0.52 (0.18, 1.21)
    Unknown       98                 4
¹ Median (Q1, Q3)
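
The symbol parameter described above can be sketched as follows. This is a hedged illustration: the replacement text "n/a" and the object names demo, tbl_clean, and tbl_extras are arbitrary choices, and pval = FALSE is set only to keep the example focused on the replacement symbol.

```r
library(sumExtras)
library(gtsummary)
library(dplyr)

# Rebuild a small demo data set so this block is self-contained:
# marker is set to missing for every patient on Drug A.
demo <- trial |>
  mutate(marker = if_else(trt == "Drug A", NA_real_, marker)) |>
  select(trt, age, marker)

# Standalone use: replace empty summaries with a custom symbol.
tbl_clean <- demo |>
  tbl_summary(by = trt) |>
  clean_table(symbol = "n/a")

# The same symbol passed through extras().
tbl_extras <- demo |>
  tbl_summary(by = trt) |>
  extras(pval = FALSE, symbol = "n/a")

tbl_clean
```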

Automatic Labeling

add_auto_labels() applies human-readable variable labels from a dictionary. Manual labels set in tbl_summary() always take priority.

dictionary <- tibble::tribble(
  ~variable,    ~description,
  "trt",        "Chemotherapy Treatment",
  "age",        "Age at Enrollment (years)",
  "marker",     "Marker Level (ng/mL)",
  "stage",      "T Stage",
  "grade",      "Tumor Grade"
)

trial |>
  tbl_summary(by = trt, include = c(age, grade, marker)) |>
  add_auto_labels(dictionary = dictionary) |>
  extras()

                        Overall            Drug A             Drug B             p-value²
                        N = 200¹           N = 98¹            N = 102¹
Age                     47 (38, 57)        46 (37, 60)        48 (39, 56)        0.718
    Unknown             11                 7                  4
Grade                                                                            0.871
    I                   68 (34%)           35 (36%)           33 (32%)
    II                  68 (34%)           32 (33%)           36 (35%)
    III                 64 (32%)           31 (32%)           33 (32%)
Marker Level (ng/mL)    0.64 (0.22, 1.41)  0.84 (0.23, 1.60)  0.52 (0.18, 1.21)  0.085
    Unknown             10                 6                  4
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test
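
To illustrate the priority rule stated above, the sketch below sets one label manually in tbl_summary() and then applies a dictionary that also covers the same variable. The label text "Patient Age", the two-row dictionary, and the object name tbl_labeled are hypothetical; if the rule holds as described, age should keep the manual label.

```r
library(sumExtras)
library(gtsummary)

# A small dictionary in the same two-column layout used above.
dictionary <- tibble::tribble(
  ~variable, ~description,
  "age",     "Age at Enrollment (years)",
  "grade",   "Tumor Grade"
)

tbl_labeled <- trial |>
  tbl_summary(
    by = trt,
    include = c(age, grade),
    label = list(age ~ "Patient Age")  # manual label, expected to take priority
  ) |>
  add_auto_labels(dictionary = dictionary) |>
  extras()

tbl_labeled
```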

For more on label priority, pre-labeled data, and auto-discovery, see vignette("labeling").

Pipeline Order

When combining with group headers and styling, order matters:

tbl_summary(by = ...) |>
  extras() |> # always first
  add_variable_group_header() |> # after extras()
  add_group_styling() |> # format group headers
  add_group_colors() # must be last (converts to gt)

add_variable_group_header() must come after extras(), and add_group_colors() must be last since it converts the table to gt.

