The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

6. Selecting the optimal bin width

library(quollr)
library(dplyr)
library(ggplot2)

set.seed(20240110)

We demonstrate how to identify the optimal bin width for hexagonal binning in the 2-D embedding space. Selecting an appropriate bin width is crucial for balancing model complexity and prediction accuracy when comparing structures between high-dimensional data and their 2-D layout.

We begin by computing model errors across a range of bin width values using the gen_diffbin1_errors() function. This function fits models for multiple bin widths and returns root mean squared error (RMSE) values for each configuration.


error_df_all <- gen_diffbin1_errors(highd_data = scurve, 
                                    nldr_data = scurve_umap)

error_df_all <- error_df_all |>
  mutate(a1 = round(a1, 2)) |>
  filter(b1 >= 5) |>
  group_by(a1) |>
  filter(RMSE == min(RMSE)) |>
  ungroup()

We round the bin width values (a1), filter for sufficient bin resolution (b1 >= 5), and select the configuration with the lowest RMSE for each unique bin width.

error_df_all |>
  arrange(-a1) |>
  head(5)
#> # A tibble: 5 × 9
#>   Error  RMSE    b1    b2     b     m    a1    a2  d_bar
#>   <dbl> <dbl> <int> <dbl> <dbl> <int> <dbl> <dbl>  <dbl>
#> 1  629. 0.410     5     7    35    22  0.26  0.23 0.0409
#> 2  563. 0.367     6     8    48    27  0.23  0.2  0.0369
#> 3  520. 0.336     7     9    63    33  0.2   0.17 0.0323
#> 4  430. 0.272     8    11    88    46  0.16  0.14 0.0260
#> 5  407. 0.254     9    12   108    51  0.14  0.12 0.0255

The plot below shows the relationship between bin width (a1) and RMSE. The goal is to identify a bin width that minimizes RMSE while avoiding overly coarse or fine binning.

ggplot(error_df_all,
         aes(x = a1,
             y = RMSE)) +
    geom_point(size = 0.8) +
    geom_line(linewidth = 0.3) +
    ylab("RMSE") +
    xlab(expression(paste("binwidth (", a[1], ")"))) +
    theme_minimal() +
    theme(panel.border = element_rect(fill = 'transparent'),
          plot.title = element_text(size = 12, hjust = 0.5, vjust = -0.5),
          axis.ticks.x = element_line(),
          axis.ticks.y = element_line(),
          legend.position = "none",
          axis.text.x = element_text(size = 7),
          axis.text.y = element_text(size = 7),
          axis.title.x = element_text(size = 7),
          axis.title.y = element_text(size = 7))

RMSE Vs binw idths.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.