The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

cvcrand package

Hengshi Yu, Fan Li, John A. Gallis and Elizabeth L. Turner

2023-09-16

Overview

cvcrand is an R package for the design and analysis of cluster randomized trials (CRTs).

Given the baseline values of some cluster-level covariates, users can perform a constrained randomization on the clusters into two arms, with an optional input of user-defined weights on the covariates.

At the end of the study, the individual outcome is collected. The cvcrand package also performs clustered permutation test on either continuous outcome or binary outcome adjusted for some individual-level covariates, producing p-value of the intervention effect.

Design: covariate-constrained randomization

In the design of CRTs with two arms, users can use the cvrall() function to perform covariate-constrained randomization or cvrcov() function to perform covariate-by-covariate constrained randomization. And for the analysis part, user would use the cptest() function for clustered permutation test.

A cluster is the unit of randomization for a cluster randomized trial. Thus, when the number of clusters is small, there might be some baseline imbalance from the randomization between the arms. Constrained randomization constrained the randomization space to randomization schemes with smaller difference among the covariates between the two arms.

The balance score for constrained randomization in the program is developed from (Raab and Butcher 2001). Suppose \(n\), \(n_T\), and \(n_C\) are the total number of clusters, the number of clusters in the treatment arm and the control arm respectively. Suppose also that there are \(K\) cluster-level variables including the continuous covariates as well as the dummy variables created from the categorical covariates. \(x_{ik}\) is the \(k\)th covariate (\(k=1,\ldots,K\)) of cluster \(i\). \(\bar{x}_{Tk}=\sum_{i=1}^{n_T}x_{ik}/n_T\) and \(\bar{x}_{Ck}=\sum_{i=n_T+1}^{n}x_{ik}/n_C\) are the means of the \(k\)th cluster-level variable in the treatment arm and the control arm, respectively, and \(\omega_{k}\) is a pre-determined weight for the \(k\)th variable. We choose \(\omega_{k}\) to be the inverse of the variance of the \(k\)th variable across all clusters following (Raab and Butcher 2001) and (Fan Li et al. 2016), namely \(\omega_k={1}/{s_k^2}=\frac{n-1}{\sum_{i=1}^n(x_{ik}-\bar{x}_k)^2}\) with \(\bar{x}_k=\sum_{i=1}^nx_{ik}/n\).

There are two choices of metric for the balance score. The balance score of the "l2" metric is defined as \(B_{(l2)}=\sum_{k=1}^{K}\omega_k(\bar{x}_{Tk}-\bar{x}_{Ck})^2\). And if "l1" metric from (F. Li et al. 2017) is specified, the balance score is defined as \(B_{(l1)}=\sum_{k=1}^{K}\tilde{\omega}_k\left|\bar{x}_{Tk}-\bar{x}_{Ck}\right|\), where \(\tilde{\omega}_k\) is chosen to be the inverse of the standard deviation of the \(k\)th variable \(s_k\).

To reflect the relative importance of different baseline covariates, one may include user-defined weights in the "l1" and "l2" balance metrics. The "l2" balance metric is set to be \(B_{(l2)}=\sum_{k=1}^{K}d_k\omega_k(\bar{x}_{Tk}-\bar{x}_{Ck})^2\), where \(d_k\) is the user-defined weight for the \(k\)th variable. By default, \(d_k=1\) for all variables. A large user-defined weight \(d_k>1\) could be assigned to a variable of importance when assessing the balance scores. Similarly, we modify the \(l1\) balance metric by allowing for user-defined weight as \(B_{(l1)}=\sum_{k=1}^{K}d_k\tilde{\omega}_k\left|\bar{x}_{Tk}-\bar{x}_{Ck}\right|\).

With the baseline values of the specified cluster-level covariates in a cluster randomized trail, the cvrall() function in the cvcrand package is used to perform the covariate-constrained randomization.

Each categorical variable is transformed into dummy variables to calculate the balance score. When transforming a categorical variable to dummy variables, the reference level will be dropped if the categorical variable is specified as a factor. Otherwise, the first level in the alphanumerical order will be dropped. Users can also specify a certain reference level for each categorical variable by manually coding dummy variables before running the cvrall() function. In addition to constraining the randomization space via a scalar summary score, we developed the cvrcov() function to implement constrained randomization with baseline balance defined directly through each covariate.

We followed the routine developed by (Greene 2017) to give covariate-by-covariate constraints based on arm mean difference or arm total difference. For each covariate to be considered for constrained randomization, a specific constraint is to be specified by users. The constraint of "any" means no constraints. If not "any", the first character letter of "m" denotes absolute mean difference, and "s" means absolute sum difference. If the second character is "f", the previous metric is constrained to be smaller or equal to the fraction with the number followed of the overall mean for "m" or mean arm total for "s". If not "f" at the second character, the metric is just constrained to be smaller or equal to the value followed.

To check the randomization validity ((Bailey and Rowley 1987)), the argument of check_validity in both cvrall() and cvrcov() functions could be specified to be TRUE. Then the functions would provide summary statistics on cluster pairs that always or never appear together in the same arm, which might imply the validity of randomization.

cvrall() example for covariate-constrained randomization

A study presented by (Dickinson et al. 2015) is about two approaches (interventions) for increasing the “up-to-date” immunization rate in 19- to 35-month-old children. They planned to randomize 16 counties in Colorado 1:1 to either a population-based approach or a practice-based approach. There are several county-level variables. The program will randomize on a subset of these variables. The continuous variable of average income is categorized to illustrate the use of the cvrall() on multi-category variables. The percentage in Colorado Immunization Information System (CIIS) variable is trancated at 100%.

county location inciis numberofchildrenages1935months uptodateonimmunizations africanamerican
1 Rural 94 366 37 2
2 Rural 85 1274 39 0
3 Rural 85 614 42 5
4 Rural 93 1720 39 1
5 Rural 82 242 31 1
6 Rural 80 350 27 3
7 Rural 94 401 49 1
8 Rural 100 234 37 1
9 Urban 93 3779 51 4
10 Urban 89 11807 51 10
11 Urban 83 9453 54 2
12 Urban 70 12354 29 8
13 Urban 93 10008 50 2
14 Urban 85 5343 36 2
15 Urban 82 3143 38 3
16 Urban 84 6056 43 1
hispanic pediatricpracticetofamilymedicin communityhealthcenters incomecat income
44 1.00 1 Low 35988
23 0.08 0 High 67565
12 0.33 3 Low 35879
18 0.33 6 High 63617
6 0.20 0 High 59118
15 0.00 3 Med 57179
38 0.20 3 Low 29738
39 0.00 1 Low 37350
35 0.15 11 Med 52923
17 0.45 6 Med 58302
7 0.61 1 High 93819
13 0.26 10 Med 54839
13 0.34 3 High 63857
10 0.18 7 Med 53502
39 0.27 7 Low 39570
28 0.10 8 Med 52457

For the covariate-constrained randomization, we used the cvrall() function to randomize 8 out of the 16 counties into the practice-based. For the definition of the whole randomization space, if the total number of all possible schemes is smaller than 50,000, we enumerate all the schemes as the whole randomization space. Otherwise, we simulate 50,000 schemes and choose the unique shemes among them as the whole randomization space. We calculate the balance scores of "l2" metric on three continuous covariates as well as two categorical covariates of location and income category. Location has "Rural" and "Urban". The level of "Rural" was then dropped in cvrall(). As income category has three levels of "low", "med", and "high", the level of "high" was dropped to create dummy variables according to the alphanumerical order as well. Then we constrained the randomization space to the schemes with "l2" balance scores less than the 0.1 quantile of that in the whole randomization space. Finally, a randomization scheme is sampled from the constrained space.

We saved the constrained randomization space in a CSV file in "dickinson_constrained.csv", the first column of which is an indicator variable of the finally selected scheme (1) or not (0). We also saved the balance scores of the whole randomization space in a CSV file in "dickinson_bscores.csv", and output a histogram displaying the distribution of all balance scores with a red line indicating our selected cutoff (the 0.1 quantile).

 Design_result <- cvrall(clustername = Dickinson_design$county,
                  balancemetric = "l2",
                  x = data.frame(Dickinson_design[ , c("location", "inciis",
                      "uptodateonimmunizations", "hispanic", "incomecat")]),
                  ntotal_cluster = 16,
                  ntrt_cluster = 8,
                  categorical = c("location", "incomecat"),
                  ###### Option to save the constrained space #####
                  # savedata = "dickinson_constrained.csv",
                  bhist = TRUE,
                  cutoff = 0.1,
                  seed = 12345)

The we had the following output:

 # the balance metric used
 Design_result$balancemetric
## [1] "l2"
 # the allocation scheme from constrained randomization
 Design_result$allocation
##    clustername allocation
## 1            1          1
## 2            2          1
## 3            3          1
## 4            4          0
## 5            5          0
## 6            6          0
## 7            7          0
## 8            8          1
## 9            9          0
## 10          10          1
## 11          11          1
## 12          12          1
## 13          13          0
## 14          14          1
## 15          15          0
## 16          16          0
 # the histogram of the balance score with respect to the balance metric
 Design_result$bscores
##                                   
## 1  score (selected scheme)   2.684
## 2             cutoff score   7.638
## 3                     Mean  24.000
## 4                       SD  15.775
## 5                      Min   1.161
## 6                       5%   5.826
## 7                      10%   7.638
## 8                      20%  10.849
## 9                      25%  12.221
## 10                     30%  13.840
## 11                     50%  20.578
## 12                     75%  31.621
## 13                     95%  55.486
## 14                     Max 116.656
 # the statement about how many clusters to be randomized to the intervention and the control arms respectively
 Design_result$assignment_message
## [1] "You have indicated that you want to assign 8 clusters to treatment and 8 to control"
 # the statement about how to get the whole randomization space to use in constrained randomization
 Design_result$scheme_message
## [1] "Enumerating all the 12870 schemes for 8 clusters in the treatment arm out of 16 clusters in total"
 # the statement about the cutoff in the constrained space
 Design_result$cutoff_message
## [1] "The quantile cutoff value is 0.1 based on the l2 balance metric, the cutoff balance score is 7.638"
 # the statement about the selected scheme from constrained randomization
 Design_result$choice_message
## [1] "Balance score of selected scheme by l2 is 2.684"
 # the data frame containing the allocation scheme, the clustername as well as the original data frame of covariates
 Design_result$data_CR
##    arm clustername location inciis uptodateonimmunizations hispanic incomecat
## 1    1           1    Rural     94                      37       44       Low
## 2    1           2    Rural     85                      39       23      High
## 3    1           3    Rural     85                      42       12       Low
## 4    0           4    Rural     93                      39       18      High
## 5    0           5    Rural     82                      31        6      High
## 6    0           6    Rural     80                      27       15       Med
## 7    0           7    Rural     94                      49       38       Low
## 8    1           8    Rural    100                      37       39       Low
## 9    0           9    Urban     93                      51       35       Med
## 10   1          10    Urban     89                      51       17       Med
## 11   1          11    Urban     83                      54        7      High
## 12   1          12    Urban     70                      29       13       Med
## 13   0          13    Urban     93                      50       13      High
## 14   1          14    Urban     85                      36       10       Med
## 15   0          15    Urban     82                      38       39       Low
## 16   0          16    Urban     84                      43       28       Med
 # the descriptive statistics for all the variables by the two arms from the selected scheme
 Design_result$baseline_table
##                                           arm = 0       arm = 1
## n                                               8             8
## location = Urban (%)                    4 (50.0)      4 (50.0) 
## inciis (mean (SD))                   87.62 (6.12)  86.38 (8.75)
## uptodateonimmunizations (mean (SD))  41.00 (8.93)  40.62 (8.23)
## hispanic (mean (SD))                24.00 (12.65) 20.62 (13.80)
## incomecat (%)                                                  
##    High                                 3 (37.5)      2 (25.0) 
##    Low                                  2 (25.0)      3 (37.5) 
##    Med                                  3 (37.5)      3 (37.5)
 # the cluster pair descriptive, which is useful for valid randomization check
 Design_result$cluster_coin_des
## NULL
 # the overall allocation summary
 Design_result$overall_allocations
##   overall allocations checked allocations accepted allocations overall % acceptable
## 1               12870               12870                 1287                  10%

From the output of Design_result$baseline_table, the selected scheme is able to properly balance the baseline values of the covariates. The selected scheme is shown in Design_result$allocation.

cvrall() example for stratified constrained randomization

User-defined weights can be used to induce stratification on one or more categorical variables. In the study presented by (Dickinson et al. 2015), there are 8 "Urban" and 8 "Rural" counties. A user-defined weight of 1,000 is added to the covariate of location, while these weights for other covariates are all 1. Intuitively, a large weight assigned to a covariate sharply penalizes any imbalance of that covariates, therefore including schemes that are optimally balanced with respect to that covariate in the constrained randomization space. In practice, the resulting constrained space approximates the stratified randomization space on that covariate. In our illustrative data example, since half of the counties are located in rural areas, perfect balance is achieved by considering constrained randomization with the large weight for location variable. Alternatively, the option of stratify is able to perform the equivalent stratification on the stratifying variables specified.

# Stratification on location, with constrained randomization on other specified covariates

Design_stratified_result1 <- cvrall(clustername = Dickinson_design$county,
                                     balancemetric = "l2",
                                     x = data.frame(Dickinson_design[ , c("location", "inciis", 
                                                                          "uptodateonimmunizations", 
                                                                          "hispanic", "incomecat")]),
                                     ntotal_cluster = 16,
                                     ntrt_cluster = 8,
                                     categorical = c("location", "incomecat"),
                                     weights = c(1000, 1, 1, 1, 1),
                                     cutoff = 0.1,
                                     seed = 12345) 

Design_stratified_result1$baseline_table
##                                           arm = 0       arm = 1
## n                                               8             8
## location = Urban (%)                    4 (50.0)      4 (50.0) 
## inciis (mean (SD))                   87.62 (6.12)  86.38 (8.75)
## uptodateonimmunizations (mean (SD))  41.00 (8.93)  40.62 (8.23)
## hispanic (mean (SD))                24.00 (12.65) 20.62 (13.80)
## incomecat (%)                                                  
##    High                                 3 (37.5)      2 (25.0) 
##    Low                                  2 (25.0)      3 (37.5) 
##    Med                                  3 (37.5)      3 (37.5)
# An alternative and equivalent way to stratify on location

Design_stratified_result2 <- cvrall(clustername = Dickinson_design$county,
                                     balancemetric = "l2",
                                     x = data.frame(Dickinson_design[ , c("location", "inciis",
                                                                          "uptodateonimmunizations", 
                                                                          "hispanic", "incomecat")]),
                                     ntotal_cluster = 16,
                                     ntrt_cluster = 8,
                                     categorical = c("location", "incomecat"),
                                     stratify = "location",
                                     cutoff = 0.1,
                                     seed = 12345, 
                                     check_validity = TRUE)

Design_stratified_result2$baseline_table
##                                           arm = 0       arm = 1
## n                                               8             8
## location = Urban (%)                    4 (50.0)      4 (50.0) 
## inciis (mean (SD))                   87.62 (6.12)  86.38 (8.75)
## uptodateonimmunizations (mean (SD))  41.00 (8.93)  40.62 (8.23)
## hispanic (mean (SD))                24.00 (12.65) 20.62 (13.80)
## incomecat (%)                                                  
##    High                                 3 (37.5)      2 (25.0) 
##    Low                                  2 (25.0)      3 (37.5) 
##    Med                                  3 (37.5)      3 (37.5)

The results from Design_stratified_result1$baseline_table and Design_stratified_result2$baseline_table are the same. The final selected scheme from cvrall() now has 4 "Urban" counties in both arms. The location covariate has been stratified for the randomization for the randomization through the weights or stratify argument in the cvrall() function.

cvrcov() example for covariate-by-covariate constrained randomization

For the covariate-by-covariate randomization, we used the cvrcov() function to randomize 8 out of the 16 counties into the practice-based. For the definition of the whole randomization space, if the total number of all possible schemes is smaller than 100,000, we enumerate all the schemes as the whole randomization space. Otherwise, we simulate 100,000 unique schemes. Location has "Rural" and "Urban". The level of "Rural" was then kept as 1 in cvrcov() and "Urban" is 0. Then we constrained the randomization space to have the schemes with absolute total difference of location be smaller than or equal to 5, absolute mean difference of percentages of children ages 19-35 months in the CIIS less than or equal to 0.5 fraction of the overall mean, and absolute mean difference of income to be less than or equal to the 0.4 fraction of the overall mean. Finally, a randomization scheme is sampled from the constrained space.

We saved the constrained randomization space in a CSV file in "dickinson_cov_constrained.csv", the first column of which is an indicator variable of the finally selected scheme (1) or not (0).

# change the categorical variable of interest to have numeric representation
Dickinson_design_numeric <- Dickinson_design
Dickinson_design_numeric$location = (Dickinson_design$location == "Rural") * 1

Design_cov_result <- cvrcov(clustername = Dickinson_design_numeric$county,
                            x = data.frame(Dickinson_design_numeric[ , c("location", "inciis", 
                                                                          "uptodateonimmunizations", 
                                                                          "hispanic", "income")]),
                            ntotal_cluster = 16,
                            ntrt_cluster = 8,
                            constraints = c("s5", "mf.5", "any", "any", "mf0.4"), 
                            categorical = c("location"),
                            ###### Option to save the constrained space #####
                            # savedata = "dickinson_cov_constrained.csv",
                            seed = 12345, 
                            check_validity = TRUE)

The we had the following output:

 # the allocation scheme from constrained randomization
 Design_cov_result$allocation
##       id allocation
##  [1,]  1          0
##  [2,]  2          1
##  [3,]  3          0
##  [4,]  4          1
##  [5,]  5          1
##  [6,]  6          1
##  [7,]  7          0
##  [8,]  8          0
##  [9,]  9          1
## [10,] 10          0
## [11,] 11          0
## [12,] 12          0
## [13,] 13          1
## [14,] 14          0
## [15,] 15          1
## [16,] 16          1
 # the statement about how many clusters to be randomized to the intervention and the control arms respectively
 Design_cov_result$assignment_message
## [1] "You have indicated that you want to assign 8 clusters to treatment and 8 to control"
 # the statement about how to get the whole randomization space to use in constrained randomization
 Design_cov_result$scheme_message
## [1] "Enumerating all the 12870 schemes for 8 clusters in the treatment arm out of 16 clusters in total"
 # the data frame containing the allocation scheme, the clustername as well as the original data frame of covariates
 Design_cov_result$data_CR
##    arm id location inciis uptodateonimmunizations hispanic income
## 1    0  1        1     94                      37       44  35988
## 2    1  2        1     85                      39       23  67565
## 3    0  3        1     85                      42       12  35879
## 4    1  4        1     93                      39       18  63617
## 5    1  5        1     82                      31        6  59118
## 6    1  6        1     80                      27       15  57179
## 7    0  7        1     94                      49       38  29738
## 8    0  8        1    100                      37       39  37350
## 9    1  9        0     93                      51       35  52923
## 10   0 10        0     89                      51       17  58302
## 11   0 11        0     83                      54        7  93819
## 12   0 12        0     70                      29       13  54839
## 13   1 13        0     93                      50       13  63857
## 14   0 14        0     85                      36       10  53502
## 15   1 15        0     82                      38       39  39570
## 16   1 16        0     84                      43       28  52457
 # the descriptive statistics for all the variables by the two arms from the selected scheme
 Design_cov_result$baseline_table
##                                                 arm = 0            arm = 1
## n                                                     8                  8
## location = 1 (%)                              4 (50.0)           4 (50.0) 
## inciis (mean (SD))                         87.50 (9.12)       86.50 (5.58)
## uptodateonimmunizations (mean (SD))        41.88 (8.69)       39.75 (8.33)
## hispanic (mean (SD))                      22.50 (15.13)      22.12 (11.32)
## income (mean (SD))                  49927.12 (20670.81) 57035.75 (8847.89)
# the cluster pair descriptive, which is useful for valid randomization check
Design_cov_result$cluster_coin_des
##               Mean Std Dev  Minimum 25th Pctl   Median 75th Pctl  Maximum
## samecount 5937.867  35.142 5892.000  5902.000 5962.000  5972.000 5978.000
## samefrac     0.467   0.003    0.463     0.464    0.469     0.469    0.470
## diffcount 6786.133  35.142 6746.000  6752.000 6762.000  6822.000 6832.000
## difffrac     0.533   0.003    0.530     0.531    0.531     0.536    0.537
# the overall allocation summary
Design_cov_result$overall_allocations
##   overall allocations checked allocations accepted allocations overall % acceptable
## 1               12870               12870                12724               98.87%

From the output of Design_cov_result$baseline_table, the selected scheme is able to properly balance the baseline values of the covariates. The selected scheme is shown in Design_cov_result$allocation.

Analysis: clustered permutation test

At the end of cluster randomized trials, individual outcomes are collected. Permutation test based on (Gail et al. 1996) and (Fan Li et al. 2016) is then applied to the continuous or binary outcome with some individual-level covariates.

The permutation test is implemented in a two-step procedure. In the first step, an outcome regression model is fitted for response \(Y_{ij}\) with covariates \(\textbf{z}_{ij}\). This is done by fitting a linear regression model for continuous responses and a logistic regression model for binary responses (Gail et al. 1996), ignoring the clustering of responses. The individual residual \(r_{ij}=Y_{ij}-\hat{Y}_{ij}\) can be calculated from the predicted response for each individual by \(\hat{Y}_{ij}\). In the second step, cluster-specific residual means are obtained as \(\bar{r}_{i\cdot}=\sum_{j=1}^{m_i}r_{ij}/m_i\). The observed test statistic is then computed as \(U=\frac{1}{n_T}\sum_{i=1}^nW_i\bar{r}_{i\cdot}- \frac{1}{n_C}\sum_{i=1}^n(1-W_i)\bar{r}_{i\cdot}\), where \(W_i=1\) if the \(i\)th cluster is assigned to the treatment arm and \(W_i=0\) otherwise, and \(n_T=\sum_{i=1}^nW_i\), \(n_C=\sum_{i=1}^n(1-W_i)\) are the number of treated and control clusters.

Suppose there are \(S\) randomization schemes in the constrained randomization space. To obtain the permutation distribution of the test statistic, we permute the labels of the treatment indicator according to the constrained randomization space, and compute a value of \(U_s\) (\(s=1,\ldots,S\)). The collection of these values \(\{U_s:s=1,\ldots,S\}\) forms the null distribution of the permutation test statistic. The p-value is then computed by \(\text{p-value}=\frac{1}{S}\sum_{s=1}^S \mathbb{I}(|U_s|\geq |U|)\).

The cptest() function in the cvcrand package is used to perform the permutation test for the intervention effect of cluster randomized trials.

Each categorical variable is transformed into dummy variables to fit in the linear model or logistic regression for the permutation test. When transforming a categorical variable to dummy variables, the reference level will be dropped if the categorical variable is specified as a factor. Otherwise, the first level in the alphanumerical order will be dropped. Users can also specify a certain reference level for each categorical variable by manually coding dummy variables before running the cptest() function.

cptest() example

Suppose that the researchers were able to assess 300 children in each cluster in a study presented by (Dickinson et al. 2015), and the cluster randomized trial is processed with the selected randomization scheme from the example above of the cvrall() function. We expanded the values of the cluster-level covariates on the covariates’ values of the individuals, according to which cluster they belong to. The correlated individual outcome of up-to-date on immunizations (1) or not (0) is then simulated using a generalized linear mixed model (GLMM) with a logistic link to induce correlation by including a random effect at the county level. The intracluster correlation (ICC) was set to be 0.01, using the latent response definition provided in (Eldridge, Ukoumunne, and Carlin 2009). This is a reasonable value for population health studies (Hannan et al. 1994). We simulated one data set, with the outcome data dependent on the county-level covariates used in the constrained randomization design and a positive treatment effect so that the practice-based intervention increases up-to-date immunization rates more than the community-based intervention. For each individual child, the outcome is equal to 1 if he or she is up-to-date on immunizations and 0 otherwise.

county location inciis uptodateonimmunizations hispanic incomecat outcome
1 Rural 94 37 44 0 1
1 Rural 94 37 44 0 1
1 Rural 94 37 44 0 1
1 Rural 94 37 44 0 1
1 Rural 94 37 44 0 0
1 Rural 94 37 44 0 0
1 Rural 94 37 44 0 1
1 Rural 94 37 44 0 1
1 Rural 94 37 44 0 1
1 Rural 94 37 44 0 1

We used the cptest() function to process the clustered permutation test on the binary outcome of the status of up-to-date on immunizations. We input the file about the constrained space with the first column indicating the final scheme. The permutation test is on the continuous covariates of "inciis", "uptodateonimmunizations", "hispanic", as well as categorical variables of "location" and "incomecat". Location has "Rural" and "Urban". The level of "Rural" was then dropped in cptest(). As income category has three levels of "low", "med", and "high", the level of "high" was dropped to create dummy variables according to the alphanumerical order as well.

 Analysis_result <- cptest(outcome = Dickinson_outcome$outcome,
                           clustername = Dickinson_outcome$county,
                           z = data.frame(Dickinson_outcome[ , c("location", "inciis",
                               "uptodateonimmunizations", "hispanic", "incomecat")]), 
                            cspacedatname = system.file("dickinson_constrained.csv", package = "cvcrand"),                                 
                           outcometype = "binary",                                                      
                           categorical = c("location","incomecat"))

The result of "cptest()" includes the final scheme for the cluster randomized trial, the p-value from the permutation test as well as a statement about that p-value.

 Analysis_result 
## $FinalScheme
##    Cluster_ID Intervention
## 1           1            0
## 2           2            0
## 3           3            0
## 4           4            1
## 5           5            1
## 6           6            0
## 7           7            1
## 8           8            0
## 9           9            1
## 10         10            1
## 11         11            0
## 12         12            1
## 13         13            1
## 14         14            0
## 15         15            1
## 16         16            0
## 
## $pvalue
## [1] 0.042
## 
## $pvalue_statement
## [1] "Clustered permutation test p-value = 0.042"

From the p-value of 0.042 in Analysis_result, the probability of up-to-date on immunizations for the practice-based approach (1) is significantly different from that for the population-based approach (0).

Session Information

## R version 4.2.2 (2022-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.0
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] cvcrand_0.1.1
## 
## loaded via a namespace (and not attached):
##  [1] highr_0.10       bslib_0.4.2      compiler_4.2.2   pillar_1.8.1     jquerylib_0.1.4 
##  [6] class_7.3-20     forcats_1.0.0    tools_4.2.2      digest_0.6.31    jsonlite_1.8.4  
## [11] evaluate_0.20    lifecycle_1.0.3  tibble_3.1.8     lattice_0.20-45  pkgconfig_2.0.3 
## [16] rlang_1.0.6      Matrix_1.5-1     DBI_1.1.3        cli_3.6.0        rstudioapi_0.14 
## [21] yaml_2.3.7       haven_2.5.3      xfun_0.37        fastmap_1.1.1    e1071_1.7-13    
## [26] dplyr_1.1.0      knitr_1.42       hms_1.1.3        generics_0.1.3   sass_0.4.5      
## [31] vctrs_0.5.2      mitools_2.4      tidyselect_1.2.0 grid_4.2.2       glue_1.6.2      
## [36] R6_2.5.1         fansi_1.0.4      survival_3.4-0   rmarkdown_2.20   magrittr_2.0.3  
## [41] MASS_7.3-58.1    htmltools_0.5.4  splines_4.2.2    labelled_2.12.0  tableone_0.13.2 
## [46] utf8_1.2.3       proxy_0.4-27     survey_4.2-1     cachem_1.0.7     zoo_1.8-12

References

Bailey, RA, and CA Rowley. 1987. “Valid Randomization.” Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences 410 (1838): 105–24.
Dickinson, L Miriam, Brenda Beaty, Chet Fox, Wilson Pace, W Perry Dickinson, Caroline Emsermann, and Allison Kempe. 2015. “Pragmatic Cluster Randomized Trials Using Covariate Constrained Randomization: A Method for Practice-Based Research Networks (PBRNs).” The Journal of the American Board of Family Medicine 28 (5): 663–72.
Eldridge, Sandra M, Obioha C Ukoumunne, and John B Carlin. 2009. “The Intra-Cluster Correlation Coefficient in Cluster Randomized Trials: A Review of Definitions.” International Statistical Review 77 (3): 378–94.
Gail, Mitchell H, Steven D Mark, Raymond J Carroll, Sylvan B Green, and David Pee. 1996. “On Design Considerations and Randomization-Based Inference for Community Intervention Trials.” Statistics in Medicine 15 (11): 1069–92.
Greene, Erich J. 2017. “A SAS Macro for Covariate-Constrained Randomization of General Cluster-Randomized and Unstratified Designs.” Journal of Statistical Software 77 (CS1).
Hannan, Peter J, David M Murray, David R Jacobs Jr, and Paul G McGovern. 1994. “Parameters to Aid in the Design and Analysis of Community Trials: Intraclass Correlations from the Minnesota Heart Health Program.” Epidemiology, 88–95.
Li, Fan, Yuliya Lokhnygina, David M Murray, Patrick J Heagerty, and Elizabeth R DeLong. 2016. “An Evaluation of Constrained Randomization for the Design and Analysis of Group-Randomized Trials.” Statistics in Medicine 35 (10): 1565–79.
Li, F, EL Turner, PJ Heagerty, DM Murray, WM Vollmer, and ER DeLong. 2017. “An Evaluation of Constrained Randomization for the Design and Analysis of Group-Randomized Trials with Binary Outcomes.” Statistics in Medicine 36 (24): 3791.
Raab, Gillian M, and Izzy Butcher. 2001. “Balance in Cluster Randomized Trials.” Statistics in Medicine 20 (3): 351–65.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.