The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
This walks through the algorithm for constructing a model from 2-D embedding data and visualising it alongside high-dimensional data. The process involves two major steps:
To begin, we preprocess the 2-D embedding and create hexagonal bins over the layout.
Next, we extract the centroid coordinates and standardised bin counts. These will be used to identify densely populated regions in the 2-D space.
## To extract all bin centroids with bin counts
df_bin_centroids <- extract_hexbin_centroids(centroids_data = all_centroids_df,
counts_data = counts_df)
benchmark_highdens <- 0
## To extract high-densed bins
model_2d <- df_bin_centroids |>
dplyr::filter(n_h > benchmark_highdens)
glimpse(model_2d)
#> Rows: 137
#> Columns: 5
#> $ h <int> 34, 35, 36, 37, 38, 41, 42, 49, 50, 51, 52, 53, 54, 55, 56, 57, 64…
#> $ c_x <dbl> 0.1502324, 0.2336432, 0.3170540, 0.4004648, 0.4838756, 0.7341080, …
#> $ c_y <dbl> 0.0287916, 0.0287916, 0.0287916, 0.0287916, 0.0287916, 0.0287916, …
#> $ n_h <dbl> 1, 10, 8, 8, 2, 2, 12, 11, 9, 4, 11, 12, 1, 6, 12, 12, 5, 10, 10, …
#> $ w_h <dbl> 0.001, 0.010, 0.008, 0.008, 0.002, 0.002, 0.012, 0.011, 0.009, 0.0…
We then triangulate the hexagon centroids to build a wireframe of neighborhood relationships.
## Wireframe
tr_object <- tri_bin_centroids(centroids_data = df_bin_centroids)
str(tr_object)
#> List of 2
#> $ trimesh_object:List of 11
#> ..$ n : int 300
#> ..$ x : num [1:300] -0.1 -0.0166 0.0668 0.1502 0.2336 ...
#> ..$ y : num [1:300] -0.116 -0.116 -0.116 -0.116 -0.116 ...
#> ..$ nt : int 548
#> ..$ trlist: int [1:548, 1:9] 1 17 31 16 16 3 33 17 17 46 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : chr [1:9] "i1" "i2" "i3" "j1" ...
#> ..$ cclist: num [1:548, 1:5] -0.0583 -0.0166 -0.1417 -0.0583 -0.0166 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : chr [1:5] "x" "y" "r" "area" ...
#> ..$ nchull: int 50
#> ..$ chull : int [1:50] 1 2 3 4 5 20 35 6 7 8 ...
#> ..$ narcs : int 847
#> ..$ arcs : int [1:847, 1:2] 2 16 1 2 17 16 31 32 16 17 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : chr [1:2] "from" "to"
#> ..$ call : language tri.mesh(x = centroids_data[["c_x"]], y = centroids_data[["c_y"]])
#> ..- attr(*, "class")= chr "triSht"
#> $ n_h : num [1:300] 0 0 0 0 0 0 0 0 0 0 ...
Using the triangulation object, we generate edges between centroids. We retain only edges connecting densely populated bins.
trimesh_data <- gen_edges(tri_object = tr_object, a1 = hb_obj$a1) |>
dplyr::filter(from_count > benchmark_highdens,
to_count > benchmark_highdens)
## Update the edge indexes to start from 1
trimesh_data <- update_trimesh_index(trimesh_data)
glimpse(trimesh_data)
#> Rows: 346
#> Columns: 8
#> $ from <int> 2, 1, 8, 2, 17, 2, 17, 3, 27, 27, 18, 18, 9, 9, 10, 3, 28, …
#> $ to <int> 8, 2, 18, 3, 18, 9, 28, 9, 38, 28, 29, 19, 18, 10, 19, 4, 3…
#> $ x_from <dbl> 0.2336432, 0.1502324, 0.1919378, 0.2336432, 0.1502324, 0.23…
#> $ y_from <dbl> 0.0287916, 0.0287916, 0.1010275, 0.0287916, 0.1732633, 0.02…
#> $ x_to <dbl> 0.1919378, 0.2336432, 0.2336432, 0.3170540, 0.2336432, 0.27…
#> $ y_to <dbl> 0.1010275, 0.0287916, 0.1732633, 0.0287916, 0.1732633, 0.10…
#> $ from_count <dbl> 10, 1, 11, 10, 5, 10, 5, 8, 1, 1, 10, 10, 9, 9, 4, 8, 10, 1…
#> $ to_count <dbl> 11, 10, 10, 8, 10, 9, 10, 9, 11, 10, 5, 10, 10, 4, 10, 8, 5…
We begin by extracting the original data with their assigned hexagonal bin IDs.
nldr_df_with_hex_id <- hb_obj$data_hb_id
glimpse(nldr_df_with_hex_id)
#> Rows: 1,000
#> Columns: 4
#> $ emb1 <dbl> 0.27708147, 0.69717161, 0.77934921, 0.17323121, 0.21793445, 0.593…
#> $ emb2 <dbl> 0.91343544, 0.53767948, 0.39861033, 0.95285002, 0.98320848, 1.048…
#> $ ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19…
#> $ h <int> 216, 145, 116, 229, 229, 249, 79, 164, 206, 118, 147, 50, 80, 222…
We calculate the average high-dimensional coordinates for each bin and retain only the ones matching the 2-D model bins.
model_highd <- avg_highd_data(highd_data = scurve, scaled_nldr_hexid = nldr_df_with_hex_id)
model_highd <- model_highd |>
dplyr::filter(h %in% model_2d$h)
glimpse(model_highd)
#> Rows: 137
#> Columns: 8
#> $ h <int> 34, 35, 36, 37, 38, 41, 42, 49, 50, 51, 52, 53, 54, 55, 56, 57, 64,…
#> $ x1 <dbl> 0.95812568, 0.81844628, 0.54418139, 0.27895124, 0.05667053, -0.3754…
#> $ x2 <dbl> 0.0853925, 0.1158483, 0.1113382, 0.1282115, 0.1193769, 1.5840595, 1…
#> $ x3 <dbl> 1.286348, 1.563969, 1.833843, 1.954630, 1.997485, 1.914399, 1.82979…
#> $ x4 <dbl> 2.650223e-03, 1.842360e-03, -3.414032e-03, 8.798724e-05, 7.325453e-…
#> $ x5 <dbl> 0.0170739819, 0.0036132499, -0.0003026848, 0.0010398644, 0.00238201…
#> $ x6 <dbl> 0.087550004, -0.031822955, 0.019604954, -0.027633204, 0.083256963, …
#> $ x7 <dbl> -0.0024864689, -0.0037675385, 0.0000704427, -0.0002274276, -0.00165…
We now combine all components—high-dimensional data, the 2-D model, lifted high-dimensional centroids, and the triangulation—and render the model using an interactive tour.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.