Customizable Table Building with Tangram Pipe

Overview

The goal of this package is to iteratively build a customizable data table, one row at a time. This package will allow a user to input a data object, specify the rows and columns to use for the summary table, and select the type of data to use for each individual row. Missing data, overall statistics, and comparison tests can be calculated using this package as well.

Installation

install.packages("tangram.pipe")

Getting Started

Loading supplementary packages

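suppressPackageStartupMessages(require(tangram.pipe))
suppressPackageStartupMessages(require(knitr))
suppressPackageStartupMessages(require(kableExtra))
# dplyr supplies select() and filter(), which are used later in this vignette
suppressPackageStartupMessages(require(dplyr))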

Initializing the table

The first step is to initialize the table. Here, the user supplies the dataset to be analyzed and the name of the variable that defines the columns. The user also decides whether to account for missing data, calculate overall statistics across all columns, or conduct comparison tests across the columns for each row. The values given for missing, overall, and comparison become the defaults for every row subsequently added to the table; however, any individual row can override them.

This vignette uses the built-in iris dataset, a well-known dataset containing flower measurements for three species of iris. Since most of the data in iris is numerical, we add two made-up variables (flower color and stem size) to demonstrate the table-building functions for non-numeric data, and we set a few entries to NA so that the handling of missing data can be shown. These additions exist purely for demonstration of this package.

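# Two made-up categorical columns, drawn at random (values change between runs unless a seed is set)
iris$color <- sample(c("Blue", "Purple"), size=150, replace=TRUE)
# "Medium" is listed twice so that it is drawn about twice as often
iris$Stem.Size <- sample(c("Small", "Medium", "Medium", "Large"), size=150, replace=TRUE)
# Introduce missing values: the species of row 149, and every other variable of row 150
iris[149,5] <- NA
iris[150,c(1:4, 6:7)] <- NA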
head(iris) %>% 
  kable(escape=F, align="cl") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species color Stem.Size
5.1 3.5 1.4 0.2 setosa Blue Medium
4.9 3.0 1.4 0.2 setosa Blue Medium
4.7 3.2 1.3 0.2 setosa Purple Medium
4.6 3.1 1.5 0.2 setosa Purple Small
5.0 3.6 1.4 0.2 setosa Blue Large
5.4 3.9 1.7 0.4 setosa Purple Large

For this example, the variable ‘Species’ is chosen as the column variable; missing and comparison are set to FALSE, and overall to TRUE, to generate a simple example.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE)

Using this function creates a list object that stores the user preferences for building the table going forward; in addition to the five elements listed here, the number of rows is also saved to the list. Subsequent entries to the list will store information for the rows, which will ultimately be compiled to create the final table after all row information has been added.

Adding Rows

Numeric Rows

To start off, we will add a numeric row to the table. The function num_row reads in data that is numeric in form, and by default calculates the five-number summary (minimum, first quartile, median, third quartile, maximum), as well as the mean and standard deviation of the numeric variable within each column. Since we specified overall=TRUE in the initialization step, an Overall summary column will be included as well. The default summary function is num_default, but the user may write their own function to calculate different summary statistics from what is shown here. Currently, num_default is the only built-in summary function for this data type.

Let’s start by calculating summary statistics for the Sepal Length in the iris dataset. Since it makes more sense to display the variable name as “Sepal Length” rather than the R-generated “Sepal.Length”, we will use the rowlabels argument to make this change for the table. Note that if you have a dataframe with labelled variables as columns, leaving rowlabels blank will automatically input the variable’s label as the rowlabel. To output the final object, we use the functions tbl_out and print to display the table.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  num_row(row_var="Sepal.Length", rowlabels="Sepal Length") %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
Variable Measure setosa versicolor virginica Overall
Sepal Length min 4.30 4.90 4.90 4.30
Q1 4.80 5.60 6.30 5.10
median 5.00 5.90 6.50 5.80
Q3 5.20 6.30 6.95 6.40
max 5.80 7.00 7.90 7.90
mean 5.01 5.94 6.61 5.84
SD 0.35 0.52 0.64 0.83
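
As a rough cross-check of what the seven statistics mean, the sketch below recomputes them in base R from the unmodified datasets::iris. It is not the package's implementation of num_default: the helper name seven_stats is made up for illustration, and using quantile() with its default interpolation is an assumption about how the quartiles are defined.

```r
# Hypothetical helper: the seven statistics shown in the table, for one numeric vector
seven_stats <- function(x) {
  x <- x[!is.na(x)]                                   # drop missing values first
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  c(min = q[1], Q1 = q[2], median = q[3], Q3 = q[4], max = q[5],
    mean = mean(x), SD = sd(x))
}

dat <- datasets::iris                                 # pristine copy, unaffected by edits to `iris`
# One column of statistics per species
by_species <- sapply(split(dat$Sepal.Length, dat$Species), seven_stats)
round(by_species, 2)
```

For setosa this reproduces the setosa column above. The virginica and overall figures can differ slightly, because the vignette's copy of iris has two altered rows.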

By default, each row function reports statistics to two decimal places. We can use the digits argument to request more or fewer decimal places in the reported table.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  num_row(row_var="Sepal.Length", rowlabels="Sepal Length", digits=4) %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>%
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
Variable Measure setosa versicolor virginica Overall
Sepal Length min 4.3000 4.9000 4.9000 4.3000
Q1 4.8000 5.6000 6.3000 5.1000
median 5.0000 5.9000 6.5000 5.8000
Q3 5.2000 6.3000 6.9500 6.4000
max 5.8000 7.0000 7.9000 7.9000
mean 5.0060 5.9360 6.6104 5.8405
SD 0.3525 0.5162 0.6386 0.8331
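
The output above suggests that digits fixes the number of decimal places rather than the number of significant digits. The difference between the two can be seen in base R. This sketch does not call tangram.pipe, and the two values are copied from the Overall column.

```r
x <- c(mean = 5.8405, SD = 0.8331)

# Fixed number of decimal places: every value keeps two digits after the point
dec <- formatC(x, format = "f", digits = 2)
dec

# Two significant digits: the number of decimals depends on the magnitude
sig <- signif(x, 2)
sig
```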

There is a small amount of missing data within our modified iris dataset. Currently, num_row filters out the missing data and only considers complete cases of the row and column variables. To see how much data is missing for sepal length, we specify missing=TRUE.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  num_row(row_var="Sepal.Length", rowlabels="Sepal Length", missing=TRUE) %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
Variable Measure setosa versicolor virginica Overall
Sepal Length min 4.30 4.90 4.90 4.30
Q1 4.80 5.60 6.30 5.10
median 5.00 5.90 6.50 5.80
Q3 5.20 6.30 6.95 6.40
max 5.80 7.00 7.90 7.90
mean 5.01 5.94 6.61 5.84
SD 0.35 0.52 0.64 0.83
Missing 0 0 1 1

The table above tells us that the dataset is missing a sepal length measurement for one of the virginica flowers. Note that the function cannot locate instances of missingness in the column variable, such as the flower in row 149 whose species we set to NA.
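
A base-R sketch of the same bookkeeping, assuming missingness is counted per level of the column variable. It rebuilds the two altered rows on a fresh copy of datasets::iris and does not call the package.

```r
dat <- datasets::iris
dat[149, "Species"] <- NA        # species unknown for this flower
dat[150, "Sepal.Length"] <- NA   # measurement missing for this flower

# Missing sepal lengths per species; rows whose species is NA fall out of the grouping
missing_by_species <- tapply(is.na(dat$Sepal.Length), dat$Species, sum)
missing_by_species

# Missingness in the column variable itself has to be checked separately
n_species_missing <- sum(is.na(dat$Species))
n_species_missing
```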

Finally, suppose we want to look at the differences in means across all species. The function num_diff, passed to the comparison argument, will calculate the difference in mean sepal length between each column and a reference category, which is the first column in the table. Here, versicolor and virginica are each compared to setosa. The function also provides a 95% confidence interval to accompany the mean difference. Currently, num_diff is the only built-in comparison function for num_row.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  num_row(row_var="Sepal.Length", rowlabels="Sepal Length", comparison=num_diff) %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered")) %>%
  column_spec(c(7:9), width_min = "1.5in")
Variable Measure setosa versicolor virginica Overall Test setosa vs. versicolor setosa vs. virginica Compare: All Groups
Sepal Length min 4.30 4.90 4.90 4.30 Difference in Means -0.93 (-1.11, -0.75) -1.60 (-1.81, -1.40) p ≤ 0.001
Q1 4.80 5.60 6.30 5.10
median 5.00 5.90 6.50 5.80
Q3 5.20 6.30 6.95 6.40
max 5.80 7.00 7.90 7.90
mean 5.01 5.94 6.61 5.84
SD 0.35 0.52 0.64 0.83
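
For readers who want to see where such a figure can come from, here is a base-R sketch of a difference in means with a 95% confidence interval for setosa versus versicolor. Using Welch's t-test (the default of t.test) is an assumption made for this sketch, not a statement about how num_diff is implemented.

```r
dat <- datasets::iris
setosa     <- dat$Sepal.Length[dat$Species == "setosa"]
versicolor <- dat$Sepal.Length[dat$Species == "versicolor"]

# Welch two-sample t-test (unequal variances); only its confidence interval is used here
tt <- t.test(setosa, versicolor, conf.level = 0.95)

diff_means <- mean(setosa) - mean(versicolor)   # reference column minus comparison column
ci <- tt$conf.int
sprintf("%.2f (%.2f, %.2f)", diff_means, ci[1], ci[2])
```

Under these assumptions the result agrees with the setosa vs. versicolor cell above to two decimal places.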

Categorical Rows

Now, we will look at adding categorical variables. The function cat_row reads in data that is categorical in form, and by default calculates the number of instances for each row category within each column category, as well as the column-wise proportions. The default summary function is cat_default, but the user may write their own function to calculate different summary statistics from what is shown here. Currently, cat_default is the only built-in summary function for this data type.

We will demonstrate this function by looking at Stem.Size in the iris dataset. Note that cat_row and num_row have nearly identical arguments, but cat_row allows you to choose the number of spaces to indent category names using the indent argument. The default setting is 5 spaces.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  cat_row("Stem.Size", rowlabels="Stem Size") %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
Variable Measure setosa versicolor virginica Overall
Stem Size Col. Prop. (N)
     Large 0.26 (13) 0.22 (11) 0.21 (10) 0.23 (34)
     Medium 0.46 (23) 0.52 (26) 0.58 (28) 0.52 (77)
     Small 0.28 (14) 0.26 (13) 0.21 (10) 0.25 (37)
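
The column-wise proportions can be illustrated in base R with table() and prop.table(). The data frame toy below is invented for this sketch (the stem sizes in the vignette are random), and the code is not the package's cat_default.

```r
# Invented example data: 4 flowers in group "a", 5 in group "b"
toy <- data.frame(
  group = c("a", "a", "a", "a", "b", "b", "b", "b", "b"),
  size  = c("Small", "Small", "Large", "Medium",
            "Large", "Large", "Medium", "Medium", "Medium")
)

counts <- table(toy$size, toy$group)     # rows: category, columns: group
props  <- prop.table(counts, margin = 2) # divide each column by its own total

# Combine both in the "proportion (N)" layout used by the table
cells <- matrix(sprintf("%.2f (%d)", as.vector(props), as.vector(counts)),
                nrow = nrow(counts), dimnames = dimnames(counts))
cells
```

Because the proportions are taken within each column, every column sums to 1.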

Setting missing=TRUE will reveal the proportion of each species that does not have a corresponding entry for stem size. When missing data is accounted for, the missingness is recorded as the proportion of each column that is designated as missing data, and the other proportions in that column are computed against the larger denominator.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  cat_row("Stem.Size", rowlabels="Stem Size", missing=TRUE) %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
Variable Measure setosa versicolor virginica Overall
Stem Size Col. Prop. (N)
     Large 0.26 (13) 0.22 (11) 0.20 (10) 0.23 (34)
     Medium 0.46 (23) 0.52 (26) 0.57 (28) 0.52 (77)
     Small 0.28 (14) 0.26 (13) 0.20 (10) 0.25 (37)
     Missing 0.00 (0) 0.00 (0) 0.02 (1) 0.01 (1)
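
The virginica proportions shift from 0.21 and 0.58 to 0.20 and 0.57 because the missing flower is now part of the denominator. The arithmetic can be checked directly, with the counts copied from the two tables:

```r
counts    <- c(Large = 10, Medium = 28, Small = 10)  # virginica counts from the tables
n_missing <- 1

# Missing values excluded: denominator is the 48 observed stem sizes
without_missing <- round(counts / sum(counts), 2)
without_missing

# Missing values included as their own category: denominator becomes 49
with_missing <- round(c(counts, Missing = n_missing) / (sum(counts) + n_missing), 2)
with_missing
```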

Finally, let’s look at a comparison test for a categorical row. The default comparison function is cat_comp_default, which will calculate the relative entropy between each column and the reference category, as well as conduct a Chi-Square Goodness of Fit test on the data present. Currently, cat_comp_default is the only built-in comparison function for categorical data, but a user may write their own function to use instead.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  cat_row("Stem.Size", rowlabels="Stem Size", comparison=cat_comp_default) %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered")) %>%
  column_spec(c(2:6), width_min = "1.1in") %>%
  column_spec(7, width_min = "1.5in")
Variable Measure setosa versicolor virginica Overall Test setosa vs. versicolor setosa vs. virginica Compare: All Groups
Stem Size Col. Prop. (N) Relative Entropy 0.01 0.03 p = 0.82
     Large 0.26 (13) 0.22 (11) 0.21 (10) 0.23 (34)
     Medium 0.46 (23) 0.52 (26) 0.58 (28) 0.52 (77)
     Small 0.28 (14) 0.26 (13) 0.21 (10) 0.25 (37)
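
Relative entropy (Kullback-Leibler divergence) measures how far one distribution of categories is from another. The sketch below computes it in base R from the counts in the table. The helper name rel_entropy, the natural logarithm, and the direction (reference column first) are assumptions made for illustration, and the formula as written requires that no category has a zero count.

```r
# Hypothetical helper: relative entropy of p with respect to q (both sum to 1, no zeros)
rel_entropy <- function(p, q) sum(p * log(p / q))

# Stem size counts per species, copied from the table (Large, Medium, Small)
setosa     <- c(13, 23, 14)
versicolor <- c(11, 26, 13)
virginica  <- c(10, 28, 10)

to_prop <- function(n) n / sum(n)   # counts to column proportions

re_versicolor <- rel_entropy(to_prop(setosa), to_prop(versicolor))
re_virginica  <- rel_entropy(to_prop(setosa), to_prop(virginica))
round(c(versicolor = re_versicolor, virginica = re_virginica), 2)
```

Under these assumptions the two values round to 0.01 and 0.03, in line with the table. Values near zero indicate that the stem size distributions are very similar across species, as expected for randomly generated data.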

Binary Rows

The final type of data we will examine here is binary data; this is when a variable can only take on two possible values. In a table, it can be helpful to only include one of the options if the second entry can be deduced from looking at the first. This is done using the binary_row function. Summary statistics are calculated using binary_default, and are the same as those calculated using a categorical variable. Note that a user may use cat_row to process binary data if they wish to see both row entries included in the table.

We will now demonstrate the use of binary_row on the color variable in iris. In the dataset, the only colors are blue and purple, so reporting one of them is enough.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", rowlabels="Color") %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
Variable Measure setosa versicolor virginica Overall
Color Col. Prop. (N)
     Blue 0.52 (26) 0.48 (24) 0.52 (25) 0.51 (75)

The binary_row function includes all of the same arguments as the previous row functions, but additionally includes a new argument, reference. This allows a user to choose which group will appear on the table. By default, the alphabetically first row group will appear on the table, which is why ‘Blue’ appeared above. If we want to see the statistics for purple flowers, we can run the following code.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", rowlabels="Color", reference="Purple") %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
Variable Measure setosa versicolor virginica Overall
Color Col. Prop. (N)
     Purple 0.48 (24) 0.52 (26) 0.48 (23) 0.50 (73)
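
The two tables carry the same information, since within each column the two proportions add up to one. A quick base-R check with the counts copied from the tables:

```r
blue   <- c(setosa = 26, versicolor = 24, virginica = 25)
purple <- c(setosa = 24, versicolor = 26, virginica = 23)
n      <- blue + purple            # flowers with a recorded color, per species

prop_blue   <- blue / n
prop_purple <- purple / n

round(prop_blue, 2)
round(prop_purple, 2)
prop_blue + prop_purple            # one in every column
```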

Finally, let’s look at the comparison functions available for binary data. If comparison=TRUE is set during initialization, binary_row uses binary_diff by default, which calculates the difference in proportions between each column and the reference column, along with 95% confidence intervals. Because our table was initialized with comparison=FALSE, we pass binary_diff explicitly.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", rowlabels="Color", comparison=binary_diff) %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered")) %>%
  column_spec(c(2:6), width_min = "1.1in") %>%
  column_spec(7, width_min = "1.75in") %>%
  column_spec(c(8:9), width_min = "1.5in")
Variable Measure setosa versicolor virginica Overall Test setosa vs. versicolor setosa vs. virginica Compare: All Groups
Color Col. Prop. (N) Difference in Proportions 0.04 (-0.18, 0.26) 0.01 (-0.20, 0.22) p = 0.90
     Blue 0.52 (26) 0.48 (24) 0.52 (25) 0.51 (75)
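
As an illustration of a difference in proportions with a 95% confidence interval, the sketch below applies base R's prop.test to the setosa and versicolor counts from the table. Whether binary_diff uses the same interval (prop.test applies a continuity correction by default) is an assumption, not something this vignette states.

```r
# Blue flowers out of all flowers with a recorded color: setosa 26/50, versicolor 24/50
pt <- prop.test(x = c(26, 24), n = c(50, 50), conf.level = 0.95)

est <- unname(pt$estimate[1] - pt$estimate[2])   # setosa minus versicolor
ci  <- as.numeric(pt$conf.int)
sprintf("%.2f (%.2f, %.2f)", est, ci[1], ci[2])
```

With these inputs the sketch agrees with the setosa vs. versicolor cell above to two decimal places. The interval comfortably contains zero, as one would expect for randomly assigned colors.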

The package has two additional options for comparison tests using binary data. Odds ratios can be calculated using binary_or, and risk ratios can be calculated with binary_rr. Note that if comparison=TRUE is initialized in tbl_start and a user wants to use an odds ratio or risk ratio here, comparison must be set to either of those two options in this row addition, since excluding the argument will lead to binary_diff being called by default.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", rowlabels="Color", comparison=binary_or) %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered")) %>%
  column_spec(c(2:7), width_min = "1.1in") %>%
  column_spec(c(8:9), width_min = "1.25in")
Variable Measure setosa versicolor virginica Overall Test setosa vs. versicolor setosa vs. virginica Compare: All Groups
Color Col. Prop. (N) Odds Ratio 1.17 (0.54, 2.57) 1.00 (0.45, 2.20) p = 0.90
     Blue 0.52 (26) 0.48 (24) 0.52 (25) 0.51 (75)

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", rowlabels="Color", comparison=binary_rr) %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered")) %>%
  column_spec(c(2:7), width_min = "1.1in") %>%
  column_spec(c(8:9), width_min = "1.25in")
Variable Measure setosa versicolor virginica Overall Test setosa vs. versicolor setosa vs. virginica Compare: All Groups
Color Col. Prop. (N) Risk Ratio 1.08 (0.55, 2.14) 1.00 (0.50, 1.97) p = 0.90
     Blue 0.52 (26) 0.48 (24) 0.52 (25) 0.51 (75)
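
The point estimates in the two tables can be recovered by hand from the setosa and versicolor counts. The sketch below is a textbook calculation in base R rather than the package's code. The Wald interval on the log scale is an assumption that happens to match the odds ratio row to two decimals, and no interval is attempted for the risk ratio.

```r
blue   <- c(setosa = 26, versicolor = 24)
purple <- c(setosa = 24, versicolor = 26)

# Odds ratio: odds of "Blue" in setosa relative to versicolor
odds       <- blue / purple
odds_ratio <- odds[["setosa"]] / odds[["versicolor"]]

# Wald 95% interval: standard error of log(OR) is sqrt of the summed reciprocal cell counts
se_log_or <- sqrt(sum(1 / c(blue, purple)))
or_ci     <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

# Risk ratio: proportion of "Blue" in setosa relative to versicolor
risk       <- blue / (blue + purple)
risk_ratio <- risk[["setosa"]] / risk[["versicolor"]]

round(c(OR = odds_ratio, lower = or_ci[1], upper = or_ci[2], RR = risk_ratio), 2)
```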

Adding an empty row

The empty_row function will add a blank row to the final table. This is useful if a user wants to include blank space between some of the table’s rows. The user only needs to supply the list object in order to create the blank row, so empty_row must be called before tbl_out converts the list into the final table. An optional argument is a header to include, should the user want to create a label for the rows that follow in the table.

tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>%
  empty_row()

Creating a Finished Product

The following code generates a finished table for the iris dataset, as an example of the customized reports that tangram.pipe can produce. It includes all four numeric variables (sepal length, sepal width, petal length, petal width), as well as stem size and color. The table itself is assembled by tbl_out, and print displays the result. The annotations describing what is distinctive about each row are created by passing text to the header argument of empty_row().

tbl1 <- tbl_start(iris, "Species", missing=FALSE, overall=TRUE, comparison=TRUE) %>%
  num_row("Sepal.Length", rowlabels="Sepal Length") %>%
  empty_row('<i>No rowlabel, 3 decimal places</i>') %>%
  num_row("Sepal.Width", digits=3) %>%
  empty_row("<i>No comparison test used</i>") %>%
  num_row("Petal.Length", rowlabels="Petal Length", comparison=FALSE) %>%
  empty_row("<i>Missing data considered</i>") %>%
  num_row("Petal.Width", rowlabels="Petal Width", missing=TRUE) %>%
  cat_row("Stem.Size", rowlabels="Stem Size", missing=TRUE) %>%
  empty_row("<i>No rowlabels, indent 3 spaces, odds ratio as test</i>") %>%
  binary_row("color", comparison=binary_or, indent=3) %>%
  tbl_out() %>% 
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered")) %>%
  column_spec(c(2:6), width_min = ".8in") %>%
  column_spec(c(7:9), width_min = "1.5in")
Variable Measure setosa versicolor virginica Overall Test setosa vs. versicolor setosa vs. virginica Compare: All Groups
Sepal Length min 4.30 4.90 4.90 4.30 Difference in Means -0.93 (-1.11, -0.75) -1.60 (-1.81, -1.40) p ≤ 0.001
Q1 4.80 5.60 6.30 5.10
median 5.00 5.90 6.50 5.80
Q3 5.20 6.30 6.95 6.40
max 5.80 7.00 7.90 7.90
mean 5.01 5.94 6.61 5.84
SD 0.35 0.52 0.64 0.83
No rowlabel, 3 decimal places
Sepal.Width min 2.300 2.000 2.200 2.000 Difference in Means 0.658 (0.520, 0.796) 0.463 (0.322, 0.605) p ≤ 0.001
Q1 3.200 2.525 2.800 2.800
median 3.400 2.800 3.000 3.000
Q3 3.675 3.000 3.125 3.300
max 4.400 3.400 3.800 4.400
mean 3.428 2.770 2.965 3.055
SD 0.379 0.314 0.323 0.438
No comparison test used
Petal Length min 1.00 3.00 4.50 1.00
Q1 1.40 4.00 5.10 1.58
median 1.50 4.35 5.60 4.30
Q3 1.58 4.60 5.90 5.10
max 1.90 5.10 6.90 6.90
mean 1.46 4.26 5.56 3.74
SD 0.17 0.47 0.56 1.77
Missing data considered
Petal Width min 0.10 1.00 1.40 0.10 Difference in Means -1.08 (-1.14, -1.02) -1.78 (-1.86, -1.69) p ≤ 0.001
Q1 0.20 1.20 1.80 0.30
median 0.20 1.30 2.00 1.30
Q3 0.30 1.50 2.30 1.80
max 0.60 1.80 2.50 2.50
mean 0.25 1.33 2.02 1.19
SD 0.11 0.20 0.28 0.76
Missing 0 0 1 1
Stem Size Col. Prop. (N) Relative Entropy 0.01 0.03 p = 0.82
     Large 0.26 (13) 0.22 (11) 0.20 (10) 0.23 (34)
     Medium 0.46 (23) 0.52 (26) 0.57 (28) 0.52 (77)
     Small 0.28 (14) 0.26 (13) 0.20 (10) 0.25 (37)
     Missing 0.00 (0) 0.00 (0) 0.02 (1) 0.01 (1)
No rowlabels, indent 3 spaces, odds ratio as test
color Col. Prop. (N) Odds Ratio 1.17 (0.54, 2.57) 1.00 (0.45, 2.20) p = 0.90
   Blue 0.52 (26) 0.48 (24) 0.52 (25) 0.51 (75)

Additional Features

Single Column of Data

The package can handle cases where a user only wants a single summary column of data. In the iris dataset, if we set the column variable to be NULL in tbl_start, we can obtain just one summary column for the dataset without breaking the table up by columns. Note that comparison functions will not run here, even if the comparison argument is set to TRUE.

tbl1 <- tbl_start(iris, NULL, missing=FALSE, overall=TRUE, comparison=FALSE) %>%
  num_row("Sepal.Length", rowlabels="Sepal Length") %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
Variable Measure Overall
Sepal Length min 4.30
Q1 5.10
median 5.80
Q3 6.40
max 7.90
mean 5.84
SD 0.83

Changing datasets within a table

This package allows for an individual row to use a different dataset from the one initialized in tbl_start. Use the newdata argument to specify the new dataset to use, then define the rows and columns for the new data. Note that if a new row is added after the row with the differing dataset, the new row will automatically return to using the initialized dataset from tbl_start unless the user specifies otherwise in newdata.

For this example, we will split the iris dataset so that the sepal and petal variables are in separate datasets, and show that the newdata argument can allow the information from both datasets to be combined in one table.

sepaldat <- iris %>% select(-c(Petal.Length, Petal.Width))
petaldat <- iris %>% select(-c(Sepal.Length, Sepal.Width))
tbl1 <- tbl_start(sepaldat, "Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>%
  num_row("Sepal.Length", rowlabels="Sepal Length") %>%
  num_row("Sepal.Width", rowlabels="Sepal Width") %>%
  empty_row(header="Switch to Petal Dataset") %>% 
  num_row(row_var="Petal.Length", col_var="Species", newdata=petaldat) %>%
  num_row(row_var="Petal.Width", col_var="Species", newdata=petaldat) %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
Variable Measure setosa versicolor virginica Overall
Sepal Length min 4.30 4.90 4.90 4.30
Q1 4.80 5.60 6.30 5.10
median 5.00 5.90 6.50 5.80
Q3 5.20 6.30 6.95 6.40
max 5.80 7.00 7.90 7.90
mean 5.01 5.94 6.61 5.84
SD 0.35 0.52 0.64 0.83
Sepal Width min 2.30 2.00 2.20 2.00
Q1 3.20 2.52 2.80 2.80
median 3.40 2.80 3.00 3.00
Q3 3.68 3.00 3.12 3.30
max 4.40 3.40 3.80 4.40
mean 3.43 2.77 2.96 3.06
SD 0.38 0.31 0.32 0.44
Switch to Petal Dataset
Petal Length min 1.00 3.00 4.50 1.00
Q1 1.40 4.00 5.10 1.58
median 1.50 4.35 5.60 4.30
Q3 1.58 4.60 5.90 5.10
max 1.90 5.10 6.90 6.90
mean 1.46 4.26 5.56 3.74
SD 0.17 0.47 0.56 1.77
Petal Width min 0.10 1.00 1.40 0.10
Q1 0.20 1.20 1.80 0.30
median 0.20 1.30 2.00 1.30
Q3 0.30 1.50 2.30 1.80
max 0.60 1.80 2.50 2.50
mean 0.25 1.33 2.02 1.19
SD 0.11 0.20 0.28 0.76

Notice that in this example, the column variable for sepaldat was the same as that for petaldat. If the columns used had differed between the datasets, all columns would be included in the table, but only columns corresponding to the data used in the rows would have values filled in.

A common usage of the newdata argument is to build a table that combines summary statistics for subsets of the data. Suppose we want to display the sepal measures for the entire dataset, then show the same measurements for two subsets determined by the petal length. Here, we divide the dataset into two subsets: petal length <= 4.3 and petal length > 4.3.

petal.small <- iris %>% filter(Petal.Length <= 4.3)
petal.large <- iris %>% filter(Petal.Length > 4.3)
tbl1 <- tbl_start(iris, "Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>%
  empty_row(header="All Data") %>%
  num_row("Sepal.Length", rowlabels="     Sepal Length") %>%
  num_row("Sepal.Width", rowlabels="     Sepal Width") %>%
  empty_row(header="Petal Length less than 4.3") %>%
  num_row("Sepal.Length", rowlabels="     Sepal Length", col_var="Species", newdata=petal.small) %>%
  num_row("Sepal.Width", rowlabels="     Sepal Width", col_var="Species", newdata=petal.small) %>%
  empty_row(header="Petal Length greater than 4.3") %>%
  num_row("Sepal.Length", rowlabels="     Sepal Length", col_var="Species", newdata=petal.large) %>%
  num_row("Sepal.Width", rowlabels="     Sepal Width", col_var="Species", newdata=petal.large) %>%
  tbl_out() %>%
  print()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
Variable                       Measure  setosa  versicolor  virginica  Overall
All Data
     Sepal Length              min      4.30    4.90        4.90       4.30
                               Q1       4.80    5.60        6.30       5.10
                               median   5.00    5.90        6.50       5.80
                               Q3       5.20    6.30        6.95       6.40
                               max      5.80    7.00        7.90       7.90
                               mean     5.01    5.94        6.61       5.84
                               SD       0.35    0.52        0.64       0.83
     Sepal Width               min      2.30    2.00        2.20       2.00
                               Q1       3.20    2.52        2.80       2.80
                               median   3.40    2.80        3.00       3.00
                               Q3       3.68    3.00        3.12       3.30
                               max      4.40    3.40        3.80       4.40
                               mean     3.43    2.77        2.96       3.06
                               SD       0.38    0.31        0.32       0.44
Petal Length less than 4.3
     Sepal Length              min      4.30    4.90                   4.30
                               Q1       4.80    5.50                   4.90
                               median   5.00    5.60                   5.10
                               Q3       5.20    5.80                   5.55
                               max      5.80    6.40                   6.40
                               mean     5.01    5.62                   5.21
                               SD       0.35    0.37                   0.46
     Sepal Width               min      2.30    2.00                   2.00
                               Q1       3.20    2.40                   2.85
                               median   3.40    2.70                   3.20
                               Q3       3.68    2.90                   3.50
                               max      4.40    3.00                   4.40
                               mean     3.43    2.63                   3.16
                               SD       0.38    0.27                   0.51
Petal Length greater than 4.3
     Sepal Length              min              5.40        4.90       4.90
                               Q1               6.00        6.30       6.10
                               median           6.30        6.50       6.40
                               Q3               6.60        6.95       6.80
                               max              7.00        7.90       7.90
                               mean             6.26        6.61       6.49
                               SD               0.44        0.64       0.60
     Sepal Width               min              2.20        2.20       2.20
                               Q1               2.80        2.80       2.80
                               median           3.00        3.00       3.00
                               Q3               3.10        3.12       3.10
                               max              3.40        3.80       3.80
                               mean             2.91        2.96       2.95
                               SD               0.29        0.32       0.31
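
The subset sections of the table above have no values for virginica (petal length at most 4.3) or for setosa (petal length above 4.3), because those subsets contain no flowers of that species. A base-R check on the unmodified datasets::iris, using plain indexing in place of dplyr's filter, makes this visible:

```r
dat <- datasets::iris

small <- dat[dat$Petal.Length <= 4.3, ]
large <- dat[dat$Petal.Length >  4.3, ]

# Species is a factor, so table() also reports species with zero flowers
small_counts <- table(small$Species)
large_counts <- table(large$Species)
small_counts
large_counts

# The two subsets partition the data when no petal length is missing
nrow(small) + nrow(large) == nrow(dat)
```

In the vignette's modified iris, the row with a missing petal length belongs to neither subset, since filter drops rows for which the condition is NA.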