The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

blockstrap

Sample complete groups (“blocks”) from a grouped data frame. This package implements a simple block bootstrap style sampler: instead of sampling individual rows, you sample entire groups preserving the intra-group structure.

Installation

You can install the release version from CRAN:

install.packages("blockstrap")

or the development version

# install.packages("remotes")
remotes::install_github("numbats/blockstrap")

Motivation

When observations belonging to the same experimental unit are spread across multiple rows (e.g. multiple measurements per subject, dose combinations, time series segments), ordinary row-wise sampling breaks these units apart. A block sampler keeps units intact by sampling at the group level.

Core function

slice_block() works on a grouped data frame. If you call it on an ungrouped data frame, it throws a helpful error.

Key arguments:

Basic example

We use the built-in ToothGrowth dataset and treat each supplement-dose combination as a block.

library(dplyr)
library(blockstrap)

set.seed(1)
ToothGrowth |>
  group_by(supp, dose) |>
  slice_block(n = 2)
## # A tibble: 20 × 3
## # Groups:   supp, dose [2]
##      len supp   dose
##    <dbl> <fct> <dbl>
##  1  15.2 OJ      0.5
##  2  21.5 OJ      0.5
##  3  17.6 OJ      0.5
##  4   9.7 OJ      0.5
##  5  14.5 OJ      0.5
##  6  10   OJ      0.5
##  7   8.2 OJ      0.5
##  8   9.4 OJ      0.5
##  9  16.5 OJ      0.5
## 10   9.7 OJ      0.5
## 11   4.2 VC      0.5
## 12  11.5 VC      0.5
## 13   7.3 VC      0.5
## 14   5.8 VC      0.5
## 15   6.4 VC      0.5
## 16  10   VC      0.5
## 17  11.2 VC      0.5
## 18  11.2 VC      0.5
## 19   5.2 VC      0.5
## 20   7   VC      0.5

Sampling with replacement

If you want to sample more groups than exist, or allow repeats:

ToothGrowth |>
  group_by(supp, dose) |>
  slice_block(n = 10, replace = TRUE) |>
  count(supp, dose)
## # A tibble: 5 × 3
## # Groups:   supp, dose [5]
##   supp   dose     n
##   <fct> <dbl> <int>
## 1 OJ      0.5    20
## 2 OJ      1      20
## 3 OJ      2      30
## 4 VC      1      20
## 5 VC      2      10

Repeated blocks will appear multiple times (row counts summed accordingly).

Weighted sampling

Weight blocks by a statistic, e.g. mean tooth length, to favor larger mean response groups:

set.seed(42)
weighted <- ToothGrowth |>
  group_by(supp, dose) |>
  slice_block(n = 3, weight_by = mean(len))

You can verify weighting bias by repeating and tallying frequencies:

set.seed(99)
rep_draws <- replicate(500, {
  ToothGrowth |> group_by(supp, dose) |> slice_block(n = 3, weight_by = mean(len)) |> distinct(supp, dose)
}, simplify = FALSE)

freqs <- bind_rows(rep_draws) |> count(supp, dose, name = "times") |> arrange(desc(times))
freqs
## # A tibble: 6 × 3
## # Groups:   supp, dose [6]
##   supp   dose times
##   <fct> <dbl> <int>
## 1 OJ      2     334
## 2 VC      2     326
## 3 OJ      1     292
## 4 VC      1     224
## 5 OJ      0.5   205
## 6 VC      0.5   119

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.