FastRerandomize Package Tutorial

The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

2025-01-13

Introduction

This vignette demonstrates how to use the fastrerandomize package for generating and testing experimental designs in an agricultural study setting based on:

Burchardi, K.B., Gulesci, S., Lerva, B. and Sulaiman, M., 2019. Moral hazard: Experimental Evidence from Tenancy Contracts. The Quarterly Journal of Economics, 134(1), pp.281-347.

We will walk through:

Generating synthetic covariate data
Creating candidate randomizations
Exploring the S3 methods for printing, summarizing, and plotting randomization objects
Conducting a randomization-based inference test

Note: This tutorial assumes you have installed the fastrerandomize package, either from source or via GitHub:

# If you haven't installed or set up the package:
# devtools::install_github("cjerzak/fastrerandomize-software/fastrerandomize")

# (Done once) Optionally build the JAX backend if needed
# fastrerandomize::build_backend()

# note that this vignette can be found like so: 
# vignette(package = "fastrerandomize")

1. Obtain Pre-treatment Covariate Data

For illustration, we use several pre-treatment covariates from the original experiment.

library(fastrerandomize)

# Obtain pre-treatment covariates 
data(QJEData, package = "fastrerandomize")
myCovariates <- c("children","married","hh_size","hh_sexrat")
QJEData <- QJEData[!is.na(rowSums(QJEData[,myCovariates])),]
X <- QJEData[,myCovariates]
  
# Analysis parameters
n_units   <- nrow(X)
n_treated <- round(nrow(X)/2)

2. Generate Randomizations

We now create our candidate randomizations using the function generate_randomizations() from fastrerandomize. Depending on the scale and complexity of your experiment, you may choose either exact enumeration (“exact”) or Monte Carlo (“monte_carlo”) randomization.

In this example, we’ll use the exact approach, but note that randomization_accept_prob is set very low (0.0001) for demonstration, so that we can see some filtering in action.

RunMainAnalysis <- (!is.null(check_jax_availability()) )
# if FALSE, consider calling build_backend()

if(RunMainAnalysis){
CandRandomizations <- generate_randomizations(
  n_units = n_units,
  n_treated = n_treated,
  X = X,
  randomization_type = "monte_carlo", 
  max_draws = 100000L,
  batch_size = 1000L,
  randomization_accept_prob = 0.0001
)
}

## Using monte carlo randomization

## Starting batch processing with 100 batches.

## On batch 1 of 100

## On batch 2 of 100

## On batch 3 of 100

## On batch 4 of 100

## On batch 5 of 100

## On batch 6 of 100

## On batch 7 of 100

## On batch 8 of 100

## On batch 9 of 100

## On batch 10 of 100

## On batch 11 of 100

## On batch 12 of 100

## On batch 13 of 100

## On batch 14 of 100

## On batch 15 of 100

## On batch 16 of 100

## On batch 17 of 100

## On batch 18 of 100

## On batch 19 of 100

## On batch 20 of 100

## On batch 21 of 100

## On batch 22 of 100

## On batch 23 of 100

## On batch 24 of 100

## On batch 25 of 100

## On batch 26 of 100

## On batch 27 of 100

## On batch 28 of 100

## On batch 29 of 100

## On batch 30 of 100

## On batch 31 of 100

## On batch 32 of 100

## On batch 33 of 100

## On batch 34 of 100

## On batch 35 of 100

## On batch 36 of 100

## On batch 37 of 100

## On batch 38 of 100

## On batch 39 of 100

## On batch 40 of 100

## On batch 41 of 100

## On batch 42 of 100

## On batch 43 of 100

## On batch 44 of 100

## On batch 45 of 100

## On batch 46 of 100

## On batch 47 of 100

## On batch 48 of 100

## On batch 49 of 100

## On batch 50 of 100

## On batch 51 of 100

## On batch 52 of 100

## On batch 53 of 100

## On batch 54 of 100

## On batch 55 of 100

## On batch 56 of 100

## On batch 57 of 100

## On batch 58 of 100

## On batch 59 of 100

## On batch 60 of 100

## On batch 61 of 100

## On batch 62 of 100

## On batch 63 of 100

## On batch 64 of 100

## On batch 65 of 100

## On batch 66 of 100

## On batch 67 of 100

## On batch 68 of 100

## On batch 69 of 100

## On batch 70 of 100

## On batch 71 of 100

## On batch 72 of 100

## On batch 73 of 100

## On batch 74 of 100

## On batch 75 of 100

## On batch 76 of 100

## On batch 77 of 100

## On batch 78 of 100

## On batch 79 of 100

## On batch 80 of 100

## On batch 81 of 100

## On batch 82 of 100

## On batch 83 of 100

## On batch 84 of 100

## On batch 85 of 100

## On batch 86 of 100

## On batch 87 of 100

## On batch 88 of 100

## On batch 89 of 100

## On batch 90 of 100

## On batch 91 of 100

## On batch 92 of 100

## On batch 93 of 100

## On batch 94 of 100

## On batch 95 of 100

## On batch 96 of 100

## On batch 97 of 100

## On batch 98 of 100

## On batch 99 of 100

## On batch 100 of 100

## MC Loop Time (s): 1.0135

4. S3 Methods: Print, Summary, and Plot

generate_randomizations() returns an S3 object of class fastrerandomize_randomizations. We can use its associated print, summary, and plot methods to inspect the candidate randomizations.

if(RunMainAnalysis){
# 4a. Print the object
print(CandRandomizations)

# 4b. Summary
summary(CandRandomizations)

# 4c. Plot the balance distribution
plot(CandRandomizations)
}

## Object of class 'fastrerandomize_randomizations'
## 
## Call:
##    generate_randomizations(n_units = n_units, n_treated = n_treated, 
##       X = X, randomization_accept_prob = 1e-04, max_draws = 100000L, 
##       batch_size = 1000L, randomization_type = "monte_carlo") 
## 
## Number of candidate randomizations: 10 
## Number of units (columns): 302 
## Balance statistics available for each randomization.
## Summary of 'fastrerandomize_randomizations' object:
## 
## Balance statistics (summary):
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00190 0.01989 0.02277 0.02136 0.02449 0.03175

3. Randomization Test

Next, we showcase a randomization-based inference procedure using fastrerandomize.

3a. Setup Simulated Outcomes

We simulate an outcome Y that depends on our covariates X, the chosen randomization, and some true treatment effect τ. Here, we pick the first acceptable randomization from our set as our “observed” assignment, Wobs.

if(RunMainAnalysis){
CoefY <- rnorm(ncol(X))          # Coefficients for outcome model
Wobs  <- CandRandomizations$randomizations[1, ]  # use the 1st acceptable randomization
tau_true <- 1                     # true treatment effect
# Generate Y = X * Coef + W*tau + noise
Yobs <- as.numeric( as.matrix(X) %*% as.matrix(CoefY) + Wobs * tau_true + rnorm(n_units, sd = 0.1))
}

3b. Run the Randomization Test

We pass our observed treatment assignment (obsW), observed outcomes (obsY), and the full matrix of candidate randomizations (candidate_randomizations) to randomization_test(). This test:

Computes the observed difference in means
Permutes the treatment assignment across our candidate randomizations
Estimates the p-value by comparing the observed statistic to its permutation distribution

if(RunMainAnalysis){
randomization_test_results <- randomization_test(
  obsW = Wobs,
  obsY = Yobs,
  candidate_randomizations = CandRandomizations$randomizations,
  findFI = TRUE
)

# Inspect results
print(randomization_test_results)
summary(randomization_test_results)
plot(randomization_test_results)
}

## Object of class 'fastrerandomize_test'
## 
## Call:
##    randomization_test(obsW = Wobs, obsY = Yobs, candidate_randomizations = CandRandomizations$randomizations, 
##       findFI = TRUE) 
## 
## P-value:  0.1 
## Observed effect (tau_obs):  0.988699 
## Fiducial interval: [-491.1697, 1015.033]
## Summary of 'fastrerandomize_test' object:
## 
## Call:
##    randomization_test(obsW = Wobs, obsY = Yobs, candidate_randomizations = CandRandomizations$randomizations, 
##       findFI = TRUE) 
## 
## P-value:
##    0.1 
## 
## Observed effect (tau_obs):
##    0.988699 
## 
## Fiducial Interval:
##   [-491.1697, 1015.033]

The findFI = TRUE flag further attempts to locate a fiducial interval (FI) for the treatment effect.

Conclusion

This tutorial demonstrated a basic workflow with fastrerandomize:

Generating synthetic or real covariate data
Producing a set of acceptable treatment assignments via randomization
Visualizing and summarizing these assignments
Testing for treatment effects with a randomization-based approach

We encourage you to consult the package documentation for more advanced functionalities, including GPU-accelerated computations via JAX and flexible definitions of balance criteria in addition to what was shown here.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.