Pre-generate seeds with {SeedMaker}

The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Introduction

Reproducibility is a cornerstone of the scientific method and increasingly expected when it comes to computation. Many elements of statistical research require the use of random components, including trial simulation. Consequently, the practice of setting seeds is a critical prerequisite to ensuring the exact same sequence of random numbers is generated each time code is run. There is, however, much nuance to be considered when settings seeds, especially if multiple random components are utilized.

Simple Motivating Example

Suppose you need to generate a set of 10 virtual subject responses (VSRs) for each of 5 simulations. The undisciplined approach neglects to set a seed to ensure reproducibility on future runs:

library(SeedMaker)
library(dplyr)
library(tibble)
library(tidyr)
library(rlang)

n_sims <- 5
n_subjects <- 10
mu <- 25
sigma <- 5

params <- tibble(
  n_subjects = !!n_subjects,
  mu = !!mu,
  sigma = !!sigma
)

y_gen <- function(n_subjects, mu, sigma) {
  rnorm(n_subjects, mu, sigma)
}

vsrs1 <- params %>%
  bind_cols(tibble(sim = paste0("sim", 1:!!n_sims))) %>%
  relocate(sim) %>%
  rowwise() %>%
  mutate(y = list(y_gen(!!n_subjects, !!mu, !!sigma))) %>%
  ungroup()

str(vsrs1)
#> tibble [5 × 5] (S3: tbl_df/tbl/data.frame)
#>  $ sim       : chr [1:5] "sim1" "sim2" "sim3" "sim4" ...
#>  $ n_subjects: num [1:5] 10 10 10 10 10
#>  $ mu        : num [1:5] 25 25 25 25 25
#>  $ sigma     : num [1:5] 5 5 5 5 5
#>  $ y         :List of 5
#>   ..$ : num [1:10] 19.9 18.9 17.2 28.3 27.5 ...
#>   ..$ : num [1:10] 29.8 31 21.3 26.2 23.4 ...
#>   ..$ : num [1:10] 27.5 30.2 26.7 28.1 22.3 ...
#>   ..$ : num [1:10] 29.6 14.7 15.8 23.6 28.8 ...
#>   ..$ : num [1:10] 31.3 24.8 29.3 26.9 27 ...

pregenerate.R

When the code is executed again, it produces a different collection of VSRs:

vsrs1_save <- vsrs1

vsrs1 <- params %>%
  bind_cols(tibble(sim = paste0("sim", 1:!!n_sims))) %>%
  relocate(sim) %>%
  rowwise() %>%
  mutate(y = list(y_gen(!!n_subjects, !!mu, !!sigma))) %>%
  ungroup()

str(vsrs1)
#> tibble [5 × 5] (S3: tbl_df/tbl/data.frame)
#>  $ sim       : chr [1:5] "sim1" "sim2" "sim3" "sim4" ...
#>  $ n_subjects: num [1:5] 10 10 10 10 10
#>  $ mu        : num [1:5] 25 25 25 25 25
#>  $ sigma     : num [1:5] 5 5 5 5 5
#>  $ y         :List of 5
#>   ..$ : num [1:10] 22.3 38.8 22.3 13.8 28 ...
#>   ..$ : num [1:10] 21.9 19.3 29.6 20.5 29.4 ...
#>   ..$ : num [1:10] 21.4 26.7 22.9 24.6 17 ...
#>   ..$ : num [1:10] 30.1 30.6 26.9 22.8 23.2 ...
#>   ..$ : num [1:10] 29.9 31.9 20.8 27.8 20.5 ...

identical(vsrs1_save, vsrs1)
#> [1] FALSE

pregenerate.R

A disciplined approach sets a seed before generating the VSRs:

set.seed(1234)

vsrs2 <- params %>%
  bind_cols(tibble(sim = paste0("sim", 1:!!n_sims))) %>%
  relocate(sim) %>%
  rowwise() %>%
  mutate(y = list(y_gen(!!n_subjects, !!mu, !!sigma))) %>%
  ungroup()

str(vsrs2)
#> tibble [5 × 5] (S3: tbl_df/tbl/data.frame)
#>  $ sim       : chr [1:5] "sim1" "sim2" "sim3" "sim4" ...
#>  $ n_subjects: num [1:5] 10 10 10 10 10
#>  $ mu        : num [1:5] 25 25 25 25 25
#>  $ sigma     : num [1:5] 5 5 5 5 5
#>  $ y         :List of 5
#>   ..$ : num [1:10] 19 26.4 30.4 13.3 27.1 ...
#>   ..$ : num [1:10] 22.6 20 21.1 25.3 29.8 ...
#>   ..$ : num [1:10] 25.7 22.5 22.8 27.3 21.5 ...
#>   ..$ : num [1:10] 30.5 22.6 21.5 22.5 16.9 ...
#>   ..$ : num [1:10] 32.2 19.7 20.7 23.6 20 ...

pregenerate.R

When the code is executed again, it produces the same collection of VSRs:

vsrs2_save <- vsrs2

set.seed(1234)

vsrs2 <- params %>%
  bind_cols(tibble(sim = paste0("sim", 1:!!n_sims))) %>%
  relocate(sim) %>%
  rowwise() %>%
  mutate(y = list(y_gen(!!n_subjects, !!mu, !!sigma))) %>%
  ungroup()

str(vsrs2)
#> tibble [5 × 5] (S3: tbl_df/tbl/data.frame)
#>  $ sim       : chr [1:5] "sim1" "sim2" "sim3" "sim4" ...
#>  $ n_subjects: num [1:5] 10 10 10 10 10
#>  $ mu        : num [1:5] 25 25 25 25 25
#>  $ sigma     : num [1:5] 5 5 5 5 5
#>  $ y         :List of 5
#>   ..$ : num [1:10] 19 26.4 30.4 13.3 27.1 ...
#>   ..$ : num [1:10] 22.6 20 21.1 25.3 29.8 ...
#>   ..$ : num [1:10] 25.7 22.5 22.8 27.3 21.5 ...
#>   ..$ : num [1:10] 30.5 22.6 21.5 22.5 16.9 ...
#>   ..$ : num [1:10] 32.2 19.7 20.7 23.6 20 ...

identical(vsrs2_save, vsrs2)
#> [1] TRUE

pregenerate.R

Limitations

Although reproducibility has been attained, the drawback to this approach is that, because the inner workings of the pseudo random number generator are not readily available, there is not a good way to isolate and reproduce the set of VSRs for a single simulation only. For instance, to reproduce the sim5 values, sim1 - sim4 values must be reproduced as well. In this toy example, it is computationally inexpensive to do just that, but envision a more robust simulation project wherein hundreds of thousands of simulations are run, each of which generates thousands of VSRs. In all likelihood said VSRs are being modeled or summarized to answer the overarching question posed by the project. What happens if an error manifests when performing that analysis in simulation 100K? Do you really want to re-run 99K simulations in order to reproduce the VSR set for sim 100K for closer inspection?!

Solution

{SeedMaker} solves this problem by pre-generating a seed for each simulation (using the seed_maker() function) instead of relying on a single single seed for the entire collection¹:

seeds <- seed_maker(
  seed = 1234,
  level_names = "sim",
  n_per_level = n_sims
)

print(seeds)
#> $sim1
#> [1] 761680
#> 
#> $sim2
#> [1] 630678
#> 
#> $sim3
#> [1] 304108
#> 
#> $sim4
#> [1] 932745
#> 
#> $sim5
#> [1] 295846

vsrs3 <- params %>%
  bind_cols(enframe(seeds, name = "sim", value = "seed")) %>%
  relocate(sim) %>%
  unnest("seed") %>%
  rowwise() %>%
  mutate(y = list(exec(
    .fn = function(n_subjects, mu, sigma, seed) {
      set.seed(seed)
      y_gen(n_subjects, mu, sigma)
    },
    n_subjects = .data$n_subjects,
    mu = .data$mu,
    sigma = .data$sigma,
    seed = .data$seed
  ))) %>%
  ungroup()

str(vsrs3)
#> tibble [5 × 6] (S3: tbl_df/tbl/data.frame)
#>  $ sim       : chr [1:5] "sim1" "sim2" "sim3" "sim4" ...
#>  $ n_subjects: num [1:5] 10 10 10 10 10
#>  $ mu        : num [1:5] 25 25 25 25 25
#>  $ sigma     : num [1:5] 5 5 5 5 5
#>  $ seed      : int [1:5] 761680 630678 304108 932745 295846
#>  $ y         :List of 5
#>   ..$ : num [1:10] 37.7 27.8 26.7 24.5 33.2 ...
#>   ..$ : num [1:10] 26 29 28.4 17.7 27.9 ...
#>   ..$ : num [1:10] 22.9 24.7 15.9 25.8 21 ...
#>   ..$ : num [1:10] 23.2 22 29.2 34.2 24 ...
#>   ..$ : num [1:10] 25.5 19.4 29.4 28.4 25.6 ...

pregenerate.R

Now the seed associated with sim5 can be used to reproduce those VSRs directly:

set.seed(seeds[["sim5"]])

y_sim5 <- y_gen(n_subjects, mu, sigma)

print(y_sim5)
#>  [1] 25.52631 19.43905 29.40955 28.38848 25.55237 22.10995 29.75112 23.32666
#>  [9] 20.63027 29.75309

identical(
  vsrs3 %>%
    filter(sim == "sim5") %>%
    unnest(y) %>%
    pull(y),
  y_sim5
)
#> [1] TRUE

pregenerate.R

A Slightly More Complex Example

Now suppose the 10 VSRs (per simulation) are produced by first making a random assignment to one of two treatment arms, each of which is associated with a different true mu, and then generating a response based on said assignment. This can be accomplished with the (basic) set of seeds that were previously pre-generated with seed_maker():

n_treats <- 2
mu1 <- 25
mu2 <- 35

params_complex <- tibble(
  treat_id = seq_len(!!n_treats),
  mu = c(!!mu1, !!mu2),
  sigma = rep(!!sigma, !!n_treats)
)

vsrs4 <- params_complex %>%
  nest(params = everything()) %>%
  bind_cols(enframe(seeds, name = "sim", value = "seed")) %>%
  relocate(sim) %>%
  unnest("seed") %>%
  mutate(n_subjects = !!n_subjects) %>%
  relocate(n_subjects, .after = sim) %>%
  rowwise() %>%
  mutate(y = list(exec(
    .fn = function(n_subjects, params, seed) {
      set.seed(seed)
      vsrs <-
        tibble(
          treat_id = sapply(
            seq_len(n_subjects),
            function(x) sample(params %>% pull(treat_id), 1)
          )
        ) %>%
        left_join(
          params,
          by = "treat_id"
        ) %>%
        rowwise() %>%
        mutate(y = y_gen(1, mu, sigma)) %>%
        ungroup() %>%
        mutate(replicate = paste0("replicate", row_number())) %>%
        relocate(replicate)
    },
    n_subjects = .data$n_subjects,
    params = .data$params,
    seed = .data$seed
  ))) %>%
  ungroup() %>%
  select(- params)

str(vsrs4)
#> tibble [5 × 4] (S3: tbl_df/tbl/data.frame)
#>  $ sim       : chr [1:5] "sim1" "sim2" "sim3" "sim4" ...
#>  $ n_subjects: num [1:5] 10 10 10 10 10
#>  $ seed      : int [1:5] 761680 630678 304108 932745 295846
#>  $ y         :List of 5
#>   ..$ : tibble [10 × 5] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ replicate: chr [1:10] "replicate1" "replicate2" "replicate3" "replicate4" ...
#>   .. ..$ treat_id : int [1:10] 2 2 1 2 1 1 2 2 2 2
#>   .. ..$ mu       : num [1:10] 35 35 25 35 25 25 35 35 35 35
#>   .. ..$ sigma    : num [1:10] 5 5 5 5 5 5 5 5 5 5
#>   .. ..$ y        : num [1:10] 33.5 44.2 28 38.7 24 ...
#>   ..$ : tibble [10 × 5] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ replicate: chr [1:10] "replicate1" "replicate2" "replicate3" "replicate4" ...
#>   .. ..$ treat_id : int [1:10] 1 2 1 1 1 2 1 2 2 2
#>   .. ..$ mu       : num [1:10] 25 35 25 25 25 35 25 35 35 35
#>   .. ..$ sigma    : num [1:10] 5 5 5 5 5 5 5 5 5 5
#>   .. ..$ y        : num [1:10] 27.7 41.4 28.8 19 29.5 ...
#>   ..$ : tibble [10 × 5] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ replicate: chr [1:10] "replicate1" "replicate2" "replicate3" "replicate4" ...
#>   .. ..$ treat_id : int [1:10] 2 1 2 2 2 1 2 1 1 1
#>   .. ..$ mu       : num [1:10] 35 25 35 35 35 25 35 25 25 25
#>   .. ..$ sigma    : num [1:10] 5 5 5 5 5 5 5 5 5 5
#>   .. ..$ y        : num [1:10] 38.7 31.9 31.4 24.8 44.1 ...
#>   ..$ : tibble [10 × 5] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ replicate: chr [1:10] "replicate1" "replicate2" "replicate3" "replicate4" ...
#>   .. ..$ treat_id : int [1:10] 1 1 1 1 1 2 1 2 1 1
#>   .. ..$ mu       : num [1:10] 25 25 25 25 25 35 25 35 25 25
#>   .. ..$ sigma    : num [1:10] 5 5 5 5 5 5 5 5 5 5
#>   .. ..$ y        : num [1:10] 34.6 39 23.7 25.4 17.6 ...
#>   ..$ : tibble [10 × 5] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ replicate: chr [1:10] "replicate1" "replicate2" "replicate3" "replicate4" ...
#>   .. ..$ treat_id : int [1:10] 1 2 1 2 2 1 1 1 1 1
#>   .. ..$ mu       : num [1:10] 25 35 25 35 35 25 25 25 25 25
#>   .. ..$ sigma    : num [1:10] 5 5 5 5 5 5 5 5 5 5
#>   .. ..$ y        : num [1:10] 22.1 39.8 23.3 30.6 39.8 ...

pregenerate.R

As before, the seed associated with sim5 can be used to reproduce those VSRs directly:

set.seed(seeds[["sim5"]])

y_complex_sim5 <-
  tibble(
    treat_id = sapply(
      seq_len(!!n_subjects),
      function(x) sample(params_complex %>% pull(treat_id), 1)
    )
  ) %>%
  left_join(
    params_complex,
    by = "treat_id"
  ) %>%
  rowwise() %>%
  mutate(y = y_gen(1, mu, sigma)) %>%
  ungroup() %>%
  mutate(replicate = paste0("replicate", row_number())) %>%
  select(replicate, treat_id, y)

print(y_complex_sim5)
#> # A tibble: 10 × 3
#>    replicate   treat_id     y
#>    <chr>          <int> <dbl>
#>  1 replicate1         1  22.1
#>  2 replicate2         2  39.8
#>  3 replicate3         1  23.3
#>  4 replicate4         2  30.6
#>  5 replicate5         2  39.8
#>  6 replicate6         1  33.1
#>  7 replicate7         1  17.6
#>  8 replicate8         1  27.0
#>  9 replicate9         1  27.8
#> 10 replicate10        1  36.3

identical(
  vsrs4 %>%
    filter(sim == "sim5") %>%
    unnest(y) %>%
    select(replicate, treat_id, y),
  y_complex_sim5
)
#> [1] TRUE

pregenerate.R

What if, however, you are interested in reproducing the value of just a single replicate? You are fundamentally in the same situation described earlier since the random treatment arm assignment is entangled with the random VSR generation. In other words, to reproduce the replicate10 value, replicate1 - replicate9 values must be reproduced as well. Fortunately, seed_maker() can be called in a manner that pre-generates an even more comprehensive set of seeds:

seeds_complex <- seed_maker(
  seed = 1234,
  level_names = c("sim", "replicate"),
  n_per_level = c(n_sims, n_subjects)
)

str(seeds_complex)
#> List of 5
#>  $ sim1:List of 10
#>   ..$ replicate1 : int 241906
#>   ..$ replicate2 : int 928350
#>   ..$ replicate3 : int 847527
#>   ..$ replicate4 : int 251548
#>   ..$ replicate5 : int 379402
#>   ..$ replicate6 : int 894769
#>   ..$ replicate7 : int 223940
#>   ..$ replicate8 : int 325385
#>   ..$ replicate9 : int 766829
#>   ..$ replicate10: int 904458
#>  $ sim2:List of 10
#>   ..$ replicate1 : int 548592
#>   ..$ replicate2 : int 681521
#>   ..$ replicate3 : int 273200
#>   ..$ replicate4 : int 819616
#>   ..$ replicate5 : int 493964
#>   ..$ replicate6 : int 997086
#>   ..$ replicate7 : int 271560
#>   ..$ replicate8 : int 284789
#>   ..$ replicate9 : int 676748
#>   ..$ replicate10: int 251192
#>  $ sim3:List of 10
#>   ..$ replicate1 : int 242055
#>   ..$ replicate2 : int 599300
#>   ..$ replicate3 : int 364599
#>   ..$ replicate4 : int 82239
#>   ..$ replicate5 : int 978159
#>   ..$ replicate6 : int 606003
#>   ..$ replicate7 : int 208137
#>   ..$ replicate8 : int 315461
#>   ..$ replicate9 : int 186198
#>   ..$ replicate10: int 549779
#>  $ sim4:List of 10
#>   ..$ replicate1 : int 542579
#>   ..$ replicate2 : int 802509
#>   ..$ replicate3 : int 411648
#>   ..$ replicate4 : int 945114
#>   ..$ replicate5 : int 842371
#>   ..$ replicate6 : int 967436
#>   ..$ replicate7 : int 686936
#>   ..$ replicate8 : int 386971
#>   ..$ replicate9 : int 866601
#>   ..$ replicate10: int 409145
#>  $ sim5:List of 10
#>   ..$ replicate1 : int 707912
#>   ..$ replicate2 : int 981104
#>   ..$ replicate3 : int 257273
#>   ..$ replicate4 : int 183803
#>   ..$ replicate5 : int 164537
#>   ..$ replicate6 : int 532250
#>   ..$ replicate7 : int 622868
#>   ..$ replicate8 : int 176299
#>   ..$ replicate9 : int 709384
#>   ..$ replicate10: int 166772

vsrs5 <- enframe(seeds_complex, name = "sim", value = "seeds") %>%
  rowwise() %>%
  mutate(int = list(enframe(seeds, name = "replicate", value = "seed"))) %>%
  ungroup() %>%
  unnest(int) %>%
  select(- seeds) %>%
  rowwise() %>%
  mutate(y = list(exec(
    .fn = function(seed) {
      set.seed(seed)
      vsrs <-
        tibble(
          treat_id = sample(!!params_complex %>% pull(treat_id), 1)
        )  %>%
        left_join(
          !!params_complex,
          by = "treat_id"
        ) %>%
        mutate(y = y_gen(1, mu, sigma))
    },
    seed = .data$seed
  ))) %>%
  ungroup() %>%
  unnest("y") %>%
  nest(y = c("replicate", "seed", "treat_id", "mu", "sigma", "y"), .by = "sim")

str(vsrs5)
#> tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
#>  $ sim: chr [1:5] "sim1" "sim2" "sim3" "sim4" ...
#>  $ y  :List of 5
#>   ..$ : tibble [10 × 6] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ replicate: chr [1:10] "replicate1" "replicate2" "replicate3" "replicate4" ...
#>   .. ..$ seed     :List of 10
#>   .. .. ..$ : int 241906
#>   .. .. ..$ : int 928350
#>   .. .. ..$ : int 847527
#>   .. .. ..$ : int 251548
#>   .. .. ..$ : int 379402
#>   .. .. ..$ : int 894769
#>   .. .. ..$ : int 223940
#>   .. .. ..$ : int 325385
#>   .. .. ..$ : int 766829
#>   .. .. ..$ : int 904458
#>   .. ..$ treat_id : int [1:10] 1 1 2 1 1 2 1 2 1 2
#>   .. ..$ mu       : num [1:10] 25 25 35 25 25 35 25 35 25 35
#>   .. ..$ sigma    : num [1:10] 5 5 5 5 5 5 5 5 5 5
#>   .. ..$ y        : num [1:10] 31 29 29.3 25.6 20.1 ...
#>   ..$ : tibble [10 × 6] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ replicate: chr [1:10] "replicate1" "replicate2" "replicate3" "replicate4" ...
#>   .. ..$ seed     :List of 10
#>   .. .. ..$ : int 548592
#>   .. .. ..$ : int 681521
#>   .. .. ..$ : int 273200
#>   .. .. ..$ : int 819616
#>   .. .. ..$ : int 493964
#>   .. .. ..$ : int 997086
#>   .. .. ..$ : int 271560
#>   .. .. ..$ : int 284789
#>   .. .. ..$ : int 676748
#>   .. .. ..$ : int 251192
#>   .. ..$ treat_id : int [1:10] 2 1 2 2 2 1 2 2 2 2
#>   .. ..$ mu       : num [1:10] 35 25 35 35 35 25 35 35 35 35
#>   .. ..$ sigma    : num [1:10] 5 5 5 5 5 5 5 5 5 5
#>   .. ..$ y        : num [1:10] 36.3 25.5 43.4 33.6 30.8 ...
#>   ..$ : tibble [10 × 6] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ replicate: chr [1:10] "replicate1" "replicate2" "replicate3" "replicate4" ...
#>   .. ..$ seed     :List of 10
#>   .. .. ..$ : int 242055
#>   .. .. ..$ : int 599300
#>   .. .. ..$ : int 364599
#>   .. .. ..$ : int 82239
#>   .. .. ..$ : int 978159
#>   .. .. ..$ : int 606003
#>   .. .. ..$ : int 208137
#>   .. .. ..$ : int 315461
#>   .. .. ..$ : int 186198
#>   .. .. ..$ : int 549779
#>   .. ..$ treat_id : int [1:10] 2 2 1 2 1 1 1 1 2 2
#>   .. ..$ mu       : num [1:10] 35 35 25 35 25 25 25 25 35 35
#>   .. ..$ sigma    : num [1:10] 5 5 5 5 5 5 5 5 5 5
#>   .. ..$ y        : num [1:10] 32.3 27.7 26 32.1 23.1 ...
#>   ..$ : tibble [10 × 6] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ replicate: chr [1:10] "replicate1" "replicate2" "replicate3" "replicate4" ...
#>   .. ..$ seed     :List of 10
#>   .. .. ..$ : int 542579
#>   .. .. ..$ : int 802509
#>   .. .. ..$ : int 411648
#>   .. .. ..$ : int 945114
#>   .. .. ..$ : int 842371
#>   .. .. ..$ : int 967436
#>   .. .. ..$ : int 686936
#>   .. .. ..$ : int 386971
#>   .. .. ..$ : int 866601
#>   .. .. ..$ : int 409145
#>   .. ..$ treat_id : int [1:10] 1 1 2 1 2 2 2 1 2 1
#>   .. ..$ mu       : num [1:10] 25 25 35 25 35 35 35 25 35 25
#>   .. ..$ sigma    : num [1:10] 5 5 5 5 5 5 5 5 5 5
#>   .. ..$ y        : num [1:10] 17 25.3 29.7 20.6 44.1 ...
#>   ..$ : tibble [10 × 6] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ replicate: chr [1:10] "replicate1" "replicate2" "replicate3" "replicate4" ...
#>   .. ..$ seed     :List of 10
#>   .. .. ..$ : int 707912
#>   .. .. ..$ : int 981104
#>   .. .. ..$ : int 257273
#>   .. .. ..$ : int 183803
#>   .. .. ..$ : int 164537
#>   .. .. ..$ : int 532250
#>   .. .. ..$ : int 622868
#>   .. .. ..$ : int 176299
#>   .. .. ..$ : int 709384
#>   .. .. ..$ : int 166772
#>   .. ..$ treat_id : int [1:10] 1 1 2 2 2 2 1 1 1 2
#>   .. ..$ mu       : num [1:10] 25 25 35 35 35 35 25 25 25 35
#>   .. ..$ sigma    : num [1:10] 5 5 5 5 5 5 5 5 5 5
#>   .. ..$ y        : num [1:10] 32.2 32.5 25.7 41.1 33.2 ...

pregenerate.R

This time, the seed associated with sim5, replicate10 can be used to reproduce that VSR directly:

set.seed(seeds_complex[["sim5"]][["replicate10"]])

y_complex_sep_sim5 <-
  tibble(
    treat_id = sample(params_complex %>% pull(treat_id), 1)
  )  %>%
  left_join(
    params_complex,
    by = "treat_id"
  ) %>%
  mutate(y = y_gen(1, mu, sigma)) %>%
  select(treat_id, y)

print(y_complex_sep_sim5)
#> # A tibble: 1 × 2
#>   treat_id     y
#>      <int> <dbl>
#> 1        2  27.2

identical(
  vsrs5 %>%
    filter(sim == "sim5") %>%
    unnest(y) %>%
    filter(replicate == "replicate10") %>%
    select(treat_id, y),
  y_complex_sep_sim5
)
#> [1] TRUE

pregenerate.R

Summary

Though it might seem like overkill, taking the time to consider all sources of randomness, pre-generating seeds for each, and then carefully incorporating said seeds into simulation/analysis pipelines has downstream benefits. Not only is reproducibility attained at the project level, but at the elemental level as well. This often proves to be extremely useful when balancing computational resource constraints versus the ability to drill down to a granular level when debugging/troubleshooting.

For completeness, it is acknowledged that an alternative approach to solving this problem is to retain all of the VSRs in external files. For modestly-sized projects, this might be the most straightforward way to proceed. Operationally speaking, however, doing so is not only unnecessary, but undesirable from both a file storage and I/O perspective. For larger projects in particular, it is preferable to utilize the VSRs in memory, retaining the (much smaller) modeled/summarized results only.↩︎

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.