Heatmaps for Pairwise Significance Testing

The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Dave Armstrong and William Poirier

2026-02-21

Heatmaps are a useful way to visualize the results of pairwise significance testing. The work here starts with the package which was designed for this purpose (factorplot?) and is similar in spirit (though with a different purpose than Villacorta and Sáez (2015)). The pwpm() function in the package produces a similar result, but in matrix format rather than a heatmap visualization (Lenth 2025).

The mean idea here is that either the upper- or lower-triangle of a matrix can be used to visualize statistics relating to the pair of estimates identified by the row and column labels. With factorplot(), the significance and direction of the difference is encoded in color and the numerical values of the difference (and optionally its standard error) are printed in each cell. More in the spirit of Villacorta and Sáez (2015), we could also encode the differences in color and convey significance in a different way (e.g., with a text annotation within the cell). The diagonal and alternate triangle could also be used to convey information much as in the pairwise p-value matrix produce by pwpm(). We demonstrate with a couple of examples.

Chick Weights

First, we use the chickwts data predicting the weight of chicks (weight) with feed type (feed). To make the display easiest to read, we re-order the feed type factor by the average weight, which will make the intervals decreasing in their average. The first step is to estimate the model:

data(chickwts)
chickwts$feed <- reorder(chickwts$feed, 
                         chickwts$weight, 
                         FUN=mean)
chick_mod <- lm(weight~ feed, data=chickwts)

First, we use the pwpm() function to show the matrix visualization. The predicted values are on the diagonal, unadjusted p-values are in the upper-triangle entries and differences between pairs of estimates are in the lower-triangle. This would tell a reader everything they might need to know about the pairwise differences, but doesn’t scale especially well.

library(emmeans)
chick_emm <- emmeans(chick_mod, ~ feed)
pwpm(chick_emm, 
     by = NULL, 
     adjust = "none", 
     type = "response")
#>           horsebean linseed soybean meatmeal  casein sunflower
#> horsebean     [160]  0.0152  0.0003   <.0001  <.0001    <.0001
#> linseed      -58.55   [219]  0.2041   0.0135  <.0001    <.0001
#> soybean      -86.23  -27.68   [246]   0.1726  0.0007    0.0003
#> meatmeal    -116.71  -58.16  -30.48    [277]  0.0456    0.0264
#> casein      -163.38 -104.83  -77.15   -46.67   [324]    0.8125
#> sunflower   -168.72 -110.17  -82.49   -52.01   -5.33     [329]
#> 
#> Row and column labels: feed
#> Upper triangle: P values 
#> Diagonal: [Estimates] (emmean)   type = "response"
#> Lower triangle: Comparisons (estimate)   earlier vs. later

Next, we look at the default output from the factorplot() function and its plotting method.

library(factorplot)
chick_fp <- factorplot(chick_mod, factor.var="feed", order="size")
plot(chick_fp)

Heatmap of Pairwise Differences in Chick Weights by Feed Type using factorplot().

The default output from factorplot() is not amenable for custom plotting. The output from the plot method is a ggplot, so it can be adjusted to some degree with different theme_ and scale_ functions, but there are limits to the customizability with those functions. The function fp_to_df() in the factorplot package makes a data frame that can be plotted. The main argument aside from the input object of class factorplot is the type argument which can be "upper_tri", the default, which indicates that all information should be provided so as to populate the upper-triangle of the display. If changed to "bot_tri" then the lower-triangle will contain the p-values and the upper-triangle the differences with estimates themselves on the main diagonal. We show both options below. First, the upper-triangle orientation.

chick_df1 <- fp_to_df(chick_fp) 
head(chick_df1)
#>         row    column difference      p_value estimate       se
#> 1    casein    casein         NA           NA 163.3833       NA
#> 2    casein horsebean -163.38333 2.067997e-09       NA 23.48549
#> 3    casein   linseed -104.83333 1.493344e-05       NA 22.39254
#> 4    casein  meatmeal  -46.67424 4.556672e-02       NA 22.89580
#> 5    casein   soybean  -77.15476 6.654079e-04       NA 21.57799
#> 8 horsebean horsebean         NA           NA   0.0000       NA

We can then plot this with ggplot2 using the row and column variables to identify the matrix display. In this case, we will plot encode the difference with color and use a flag in the cell to denote statistical significance.

ggplot(chick_df1, aes(x=row, y=column)) + 
  geom_tile(aes(fill = difference), color="black", linewidth=.25) + 
  scale_fill_viridis_c(option = "D", na.value = "transparent") + 
  geom_text(aes(label = ifelse(p_value < .05, "*", "")), 
            color = "white", size = 6) + 
  geom_text(data = filter(chick_df1, !is.na(estimate)), aes(label = sprintf("%.2f", estimate))) + 
  theme_classic() + 
  theme(axis.text.x = element_text(angle=45, hjust=1), 
        legend.position = "top") + 
  labs(x="", y="", fill="Difference")

Custom Heatmap of Pairwise Differences in Chick Weights by Feed Type Encoding Difference with Color. White asterisks indicate statistical significance at the 0.05 level (two-tailed).

Alternatively, we could feed predicted values and a variance-covariance matrix to factorplot() which would give absolute predictions rather than relative ones for the estimates (though the differences will stay the same).

library(marginaleffects)
chick_preds <- predictions(chick_mod, variables="feed", by="feed")
chick_b <- coef(chick_preds)
names(chick_b) <- chick_preds$feed
chick_fp2 <- factorplot(chick_b, var=vcov(chick_preds), order="size")
chick_df1b <- fp_to_df(chick_fp2) 
head(chick_df1b)
#>         row    column difference      p_value estimate       se
#> 1    casein    casein         NA           NA 323.5833       NA
#> 2    casein horsebean -163.38333 3.481401e-12       NA 23.48549
#> 3    casein   linseed -104.83333 2.846185e-06       NA 22.39254
#> 4    casein  meatmeal  -46.67424 4.149493e-02       NA 22.89580
#> 5    casein   soybean  -77.15476 3.493946e-04       NA 21.57799
#> 8 horsebean horsebean         NA           NA 160.2000       NA

ggplot(chick_df1b, aes(x=row, y=column)) + 
  geom_tile(aes(fill = difference), color="black", linewidth=.25) + 
  scale_fill_viridis_c(option = "D", na.value = "transparent") + 
  geom_text(aes(label = ifelse(p_value < .05, "*", "")), 
            color = "white", size = 6) + 
  geom_text(data = filter(chick_df1b, !is.na(estimate)), aes(label = sprintf("%.2f", estimate))) + 
  theme_classic() + 
  theme(axis.text.x = element_text(angle=45, hjust=1), 
        legend.position = "top") + 
  labs(x="", y="", fill="Difference")

Custom Heatmap of Pairwise Differences in Chick Weights by Feed Type Encoding Difference with Color, Absolute Estimates.White asterisks indicate statistical significance at the 0.05 level (two-tailed).

Using the package (Campitelli 2025), we could plot both triangles in different color scales - one for differences and one for p-values. To do this, we will need to use type="both_tri" in the fp_to_df() function to get all the necessary information.

chick_df2 <- fp_to_df(chick_fp2, type="both_tri")
head(chick_df2)
#>      row    column        value       type       se   p_value difference
#> 1 casein    casein  323.5833333   estimate       NA        NA         NA
#> 2 casein horsebean -163.3833333 difference       NA        NA -163.38333
#> 3 casein   linseed -104.8333333 difference       NA        NA -104.83333
#> 4 casein  meatmeal  -46.6742424 difference       NA        NA  -46.67424
#> 5 casein   soybean  -77.1547619 difference       NA        NA  -77.15476
#> 6 casein sunflower    0.8117457       pval 22.39253 0.8117457         NA
#>   estimate
#> 1 323.5833
#> 2       NA
#> 3       NA
#> 4       NA
#> 5       NA
#> 6       NA

library(ggnewscale)
chick_df2 <- chick_df2 %>% 
  mutate(row = factor(as.character(row), levels=rev(levels(chickwts$feed))), 
         column = factor(as.character(column), levels=levels(chickwts$feed)), 
         significant = as.factor(ifelse(p_value < .05, "Significant", "Not Significant")))
ggplot(chick_df2, aes(x=row, y=column)) + 
  geom_tile(fill="transparent", color="black", linewidth=.25) + 
  geom_tile(data= filter(chick_df2, type=="difference"), 
            aes(fill = difference), color="black", linewidth=.25) +
  scale_fill_viridis_c(option = "D", na.value = "transparent") + 
  labs(fill = "Difference") + 
  new_scale_fill() +
  geom_tile(data= filter(chick_df2, type=="pval"), 
            aes(fill = significant), color="black", linewidth=.25) +
  scale_fill_manual(values=c("white", "gray50")) + 
  labs(fill="p-value", x="", y="") + 
  geom_text(data = filter(chick_df2, !is.na(estimate)), aes(label = sprintf("%.2f", estimate))) +
  theme_classic() + 
  theme(legend.position = "top", legend.box="vertical")

Custom Heatmap of Pairwise Differences in Chick Weights by Feed Type Encoding Difference and p-values with Color in Alternating Triangles.

UCB Admissions

Using the UCBAdmissions data, we can make a multi-panel display, in this case two panels - one for men and one for women. First, let’s turn the dataset into something amenable for modeling and then run the model predicting admission rates by gender and department.

data(UCBAdmissions)

ucb_dat <- as.data.frame(UCBAdmissions) %>% 
  tidyr::pivot_wider(names_from = "Admit", values_from = "Freq") %>% 
  mutate(Dept = as.factor(Dept))

ucb_mod <- glm(cbind(Admitted, Rejected) ~ Gender * Dept, 
               data=ucb_dat, family=binomial)

Here, we generate two different sets of predictions - one for males and one for females. The workflow is to generate a vector of values of the grouping variable (Gender here). Then loop through them to generate predictions for each value, setting the value of the grouping variable to the current value in the loop in the variables argument. Next, loop through the predictions, each time saving the estimates, naming them, saving the variance-covariance matrix, calling factorplot() on the results and then turning that into a data frame. In the end, we can put all the data frames together with dplyr::bind_rows().

gen_list <- c("Male", "Female")
pred_list <- lapply(gen_list, \(g)predictions(ucb_mod, 
            variables = list("Dept" = levels(ucb_dat$Dept), 
                             "Gender" = g), 
            by="Dept"))
names(pred_list) <- gen_list
df_list <- lapply(pred_list, \(x){
  b <- coef(x)
  names(b) <- x$Dept
  v <- vcov(x)
  fp <- factorplot(b, var=v)
  fp_to_df(fp, type="upper_tri")
})
ucb_df <- bind_rows(df_list, .id="Gender")
ucb_df <- ucb_df %>% 
  mutate(across(c(row, column), ~as.factor(as.character(.x))))

We can then make the plot just like above, but with a facet_wrap(~Gender), which will separate the two panels.

ggplot(ucb_df, aes(x=row, y=forcats::fct_rev(column), fill=difference)) + 
  geom_tile(color="black", linewidth=.25) + 
  geom_text(aes(label = ifelse(p_value < .05, "*", "")), 
            color = "white", size = 6) +
  geom_text(data=filter(ucb_df, !is.na(estimate)), 
            aes(label = sprintf("%.2f", estimate))) +
  scale_fill_viridis_c(na.value = "transparent") + 
  theme_classic() + 
  facet_wrap(~Gender, nrow=1) + 
  theme(legend.position="top") + 
  labs(x="", y="", fill="Difference")

Heatmaps of Predicted Admission Rates to UCB by Gender and Department with Pairwise Differences Encoded with Color. White asterisks indicate statistical significance at the 0.05 level (two-tailed).

References

Campitelli, Elio. 2025. Ggnewscale: Multiple Fill and Colour Scales in ’Ggplot2’. https://doi.org/10.32614/CRAN.package.ggnewscale.

Lenth, Russell V. 2025. Emmeans: Estimated Marginal Means, Aka Least-Squares Means. https://doi.org/10.32614/CRAN.package.emmeans.

Villacorta, Pablo J., and José A. Sáez. 2015. “SRCS: Statistical Ranking Color Scheme for Visualizing Parameterized Multiple Pairwise Comparisons with R.” The R Journal 7 (2): 89–104. https://doi.org/10.32614/RJ-2015-023.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.