The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
Heatmaps are a useful way to visualize the results of pairwise
significance testing. The work here starts with the package which was
designed for this purpose (factorplot?) and is similar in
spirit (though with a different purpose than Villacorta and Sáez (2015)). The
pwpm() function in the package produces a similar result,
but in matrix format rather than a heatmap visualization (Lenth 2025).
The mean idea here is that either the upper- or lower-triangle of a
matrix can be used to visualize statistics relating to the pair of
estimates identified by the row and column labels. With
factorplot(), the significance and direction of the
difference is encoded in color and the numerical values of the
difference (and optionally its standard error) are printed in each cell.
More in the spirit of Villacorta and Sáez
(2015), we could also encode the differences in color and convey
significance in a different way (e.g., with a text annotation within the
cell). The diagonal and alternate triangle could also be used to convey
information much as in the pairwise p-value matrix produce by
pwpm(). We demonstrate with a couple of examples.
First, we use the chickwts data predicting the weight of
chicks (weight) with feed type (feed). To make
the display easiest to read, we re-order the feed type factor by the
average weight, which will make the intervals decreasing in their
average. The first step is to estimate the model:
data(chickwts)
chickwts$feed <- reorder(chickwts$feed,
chickwts$weight,
FUN=mean)
chick_mod <- lm(weight~ feed, data=chickwts)First, we use the pwpm() function to show the matrix
visualization. The predicted values are on the diagonal, unadjusted
p-values are in the upper-triangle entries and differences between pairs
of estimates are in the lower-triangle. This would tell a reader
everything they might need to know about the pairwise differences, but
doesn’t scale especially well.
library(emmeans)
chick_emm <- emmeans(chick_mod, ~ feed)
pwpm(chick_emm,
by = NULL,
adjust = "none",
type = "response")
#> horsebean linseed soybean meatmeal casein sunflower
#> horsebean [160] 0.0152 0.0003 <.0001 <.0001 <.0001
#> linseed -58.55 [219] 0.2041 0.0135 <.0001 <.0001
#> soybean -86.23 -27.68 [246] 0.1726 0.0007 0.0003
#> meatmeal -116.71 -58.16 -30.48 [277] 0.0456 0.0264
#> casein -163.38 -104.83 -77.15 -46.67 [324] 0.8125
#> sunflower -168.72 -110.17 -82.49 -52.01 -5.33 [329]
#>
#> Row and column labels: feed
#> Upper triangle: P values
#> Diagonal: [Estimates] (emmean) type = "response"
#> Lower triangle: Comparisons (estimate) earlier vs. laterNext, we look at the default output from the
factorplot() function and its plotting method.
library(factorplot)
chick_fp <- factorplot(chick_mod, factor.var="feed", order="size")
plot(chick_fp)
Heatmap of Pairwise Differences in Chick Weights by Feed Type using
factorplot().
The default output from factorplot() is not amenable for
custom plotting. The output from the plot method is a
ggplot, so it can be adjusted to some degree with different
theme_ and scale_ functions, but there are
limits to the customizability with those functions. The function
fp_to_df() in the factorplot package makes a
data frame that can be plotted. The main argument aside from the input
object of class factorplot is the type
argument which can be "upper_tri", the default, which
indicates that all information should be provided so as to populate the
upper-triangle of the display. If changed to "bot_tri" then
the lower-triangle will contain the p-values and the upper-triangle the
differences with estimates themselves on the main diagonal. We show both
options below. First, the upper-triangle orientation.
chick_df1 <- fp_to_df(chick_fp)
head(chick_df1)
#> row column difference p_value estimate se
#> 1 casein casein NA NA 163.3833 NA
#> 2 casein horsebean -163.38333 2.067997e-09 NA 23.48549
#> 3 casein linseed -104.83333 1.493344e-05 NA 22.39254
#> 4 casein meatmeal -46.67424 4.556672e-02 NA 22.89580
#> 5 casein soybean -77.15476 6.654079e-04 NA 21.57799
#> 8 horsebean horsebean NA NA 0.0000 NAWe can then plot this with ggplot2 using the
row and column variables to identify the
matrix display. In this case, we will plot encode the difference with
color and use a flag in the cell to denote statistical significance.
ggplot(chick_df1, aes(x=row, y=column)) +
geom_tile(aes(fill = difference), color="black", linewidth=.25) +
scale_fill_viridis_c(option = "D", na.value = "transparent") +
geom_text(aes(label = ifelse(p_value < .05, "*", "")),
color = "white", size = 6) +
geom_text(data = filter(chick_df1, !is.na(estimate)), aes(label = sprintf("%.2f", estimate))) +
theme_classic() +
theme(axis.text.x = element_text(angle=45, hjust=1),
legend.position = "top") +
labs(x="", y="", fill="Difference")Custom Heatmap of Pairwise Differences in Chick Weights by Feed Type Encoding Difference with Color. White asterisks indicate statistical significance at the 0.05 level (two-tailed).
Alternatively, we could feed predicted values and a
variance-covariance matrix to factorplot() which would give
absolute predictions rather than relative ones for the estimates (though
the differences will stay the same).
library(marginaleffects)
chick_preds <- predictions(chick_mod, variables="feed", by="feed")
chick_b <- coef(chick_preds)
names(chick_b) <- chick_preds$feed
chick_fp2 <- factorplot(chick_b, var=vcov(chick_preds), order="size")
chick_df1b <- fp_to_df(chick_fp2)
head(chick_df1b)
#> row column difference p_value estimate se
#> 1 casein casein NA NA 323.5833 NA
#> 2 casein horsebean -163.38333 3.481436e-12 NA 23.48549
#> 3 casein linseed -104.83333 2.846173e-06 NA 22.39254
#> 4 casein meatmeal -46.67424 4.149494e-02 NA 22.89580
#> 5 casein soybean -77.15476 3.493940e-04 NA 21.57799
#> 8 horsebean horsebean NA NA 160.2000 NAggplot(chick_df1b, aes(x=row, y=column)) +
geom_tile(aes(fill = difference), color="black", linewidth=.25) +
scale_fill_viridis_c(option = "D", na.value = "transparent") +
geom_text(aes(label = ifelse(p_value < .05, "*", "")),
color = "white", size = 6) +
geom_text(data = filter(chick_df1b, !is.na(estimate)), aes(label = sprintf("%.2f", estimate))) +
theme_classic() +
theme(axis.text.x = element_text(angle=45, hjust=1),
legend.position = "top") +
labs(x="", y="", fill="Difference")Custom Heatmap of Pairwise Differences in Chick Weights by Feed Type Encoding Difference with Color, Absolute Estimates.White asterisks indicate statistical significance at the 0.05 level (two-tailed).
Using the package (Campitelli 2025), we
could plot both triangles in different color scales - one for
differences and one for p-values. To do this, we will need to use
type="both_tri" in the fp_to_df() function to
get all the necessary information.
chick_df2 <- fp_to_df(chick_fp2, type="both_tri")
head(chick_df2)
#> row column value type se p_value difference
#> 1 casein casein 323.5833333 estimate NA NA NA
#> 2 casein horsebean -163.3833333 difference NA NA -163.38333
#> 3 casein linseed -104.8333333 difference NA NA -104.83333
#> 4 casein meatmeal -46.6742424 difference NA NA -46.67424
#> 5 casein soybean -77.1547619 difference NA NA -77.15476
#> 6 casein sunflower 0.8117457 pval 22.39254 0.8117457 NA
#> estimate
#> 1 323.5833
#> 2 NA
#> 3 NA
#> 4 NA
#> 5 NA
#> 6 NAlibrary(ggnewscale)
chick_df2 <- chick_df2 %>%
mutate(row = factor(as.character(row), levels=rev(levels(chickwts$feed))),
column = factor(as.character(column), levels=levels(chickwts$feed)),
significant = as.factor(ifelse(p_value < .05, "Significant", "Not Significant")))
ggplot(chick_df2, aes(x=row, y=column)) +
geom_tile(fill="transparent", color="black", linewidth=.25) +
geom_tile(data= filter(chick_df2, type=="difference"),
aes(fill = difference), color="black", linewidth=.25) +
scale_fill_viridis_c(option = "D", na.value = "transparent") +
labs(fill = "Difference") +
new_scale_fill() +
geom_tile(data= filter(chick_df2, type=="pval"),
aes(fill = significant), color="black", linewidth=.25) +
scale_fill_manual(values=c("white", "gray50")) +
labs(fill="p-value", x="", y="") +
geom_text(data = filter(chick_df2, !is.na(estimate)), aes(label = sprintf("%.2f", estimate))) +
theme_classic() +
theme(legend.position = "top", legend.box="vertical") Custom Heatmap of Pairwise Differences in Chick Weights by Feed Type Encoding Difference and p-values with Color in Alternating Triangles.
Using the UCBAdmissions data, we can make a multi-panel
display, in this case two panels - one for men and one for women. First,
let’s turn the dataset into something amenable for modeling and then run
the model predicting admission rates by gender and department.
data(UCBAdmissions)
ucb_dat <- as.data.frame(UCBAdmissions) %>%
tidyr::pivot_wider(names_from = "Admit", values_from = "Freq") %>%
mutate(Dept = as.factor(Dept))
ucb_mod <- glm(cbind(Admitted, Rejected) ~ Gender * Dept,
data=ucb_dat, family=binomial)Here, we generate two different sets of predictions - one for males
and one for females. The workflow is to generate a vector of values of
the grouping variable (Gender here). Then loop through them
to generate predictions for each value, setting the value of the
grouping variable to the current value in the loop in the
variables argument. Next, loop through the predictions,
each time saving the estimates, naming them, saving the
variance-covariance matrix, calling factorplot() on the
results and then turning that into a data frame. In the end, we can put
all the data frames together with dplyr::bind_rows().
gen_list <- c("Male", "Female")
pred_list <- lapply(gen_list, \(g)predictions(ucb_mod,
variables = list("Dept" = levels(ucb_dat$Dept),
"Gender" = g),
by="Dept"))
names(pred_list) <- gen_list
df_list <- lapply(pred_list, \(x){
b <- coef(x)
names(b) <- x$Dept
v <- vcov(x)
fp <- factorplot(b, var=v)
fp_to_df(fp, type="upper_tri")
})
ucb_df <- bind_rows(df_list, .id="Gender")
ucb_df <- ucb_df %>%
mutate(across(c(row, column), ~as.factor(as.character(.x))))We can then make the plot just like above, but with a
facet_wrap(~Gender), which will separate the two
panels.
ggplot(ucb_df, aes(x=row, y=forcats::fct_rev(column), fill=difference)) +
geom_tile(color="black", linewidth=.25) +
geom_text(aes(label = ifelse(p_value < .05, "*", "")),
color = "white", size = 6) +
geom_text(data=filter(ucb_df, !is.na(estimate)),
aes(label = sprintf("%.2f", estimate))) +
scale_fill_viridis_c(na.value = "transparent") +
theme_classic() +
facet_wrap(~Gender, nrow=1) +
theme(legend.position="top") +
labs(x="", y="", fill="Difference")Heatmaps of Predicted Admission Rates to UCB by Gender and Department with Pairwise Differences Encoded with Color. White asterisks indicate statistical significance at the 0.05 level (two-tailed).
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.