Both the number and size of clusters affect power calculations for CRTs, and in a site of fixed size one can only be increased at the expense of the other.
The CRTspat
package provides functions for analysing
this trade-off for any site for which baseline data are available. The
example shown here uses the baseline prevalence data introduced in Use Case 1. The planned trial is assumed to use
the same prevalence outcome, and to be powered for an
efficacy of 30%. A set of algorithmic cluster allocations is
carried out with different numbers of clusters. Each allocation is
randomized and buffer zones of a pre-specified width are specified
(in this example, 0.5 km). The ICC is computed from the baseline data,
excluding the buffer zones, and the corresponding power is calculated
and plotted against the number of clusters (and hence, for a fixed
site, the cluster size).
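How the ICC and the cluster size enter a power calculation can be sketched with the textbook design-effect approximation, 1 + (m - 1) × ICC, for a comparison of two proportions. This is only an illustration in base R: `power_design_effect()`, its arguments and the input values are hypothetical, and the formula is not claimed to be the one used internally by `CRTpower()`.

```r
# Approximate power of a two-arm CRT with a binary outcome, using the
# textbook design effect (an assumption, not necessarily CRTpower's method).
power_design_effect <- function(k, m, p_control, efficacy, icc, alpha = 0.05) {
    # k: clusters per arm; m: mean cluster size; efficacy: proportional reduction
    p_int <- p_control * (1 - efficacy)
    deff <- 1 + (m - 1) * icc            # variance inflation due to clustering
    n_eff <- k * m / deff                # effective sample size per arm
    se <- sqrt((p_control * (1 - p_control) + p_int * (1 - p_int)) / n_eff)
    z <- abs(p_control - p_int) / se
    pnorm(z - qnorm(1 - alpha / 2))      # normal approximation, two-sided test
}

# the same 600 individuals per arm (hypothetical values), split into
# few large clusters or many small ones
few_large  <- power_design_effect(k = 10, m = 60, p_control = 0.4, efficacy = 0.3, icc = 0.05)
many_small <- power_design_effect(k = 30, m = 20, p_control = 0.4, efficacy = 0.3, icc = 0.05)
```

Under this approximation, and with a positive ICC, splitting the same individuals into more and smaller clusters gives the higher power. Buffer zones are ignored here, and they are what limits this gain in the analysis that follows.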
# use the same dataset as for Use Case 1.
library(CRTspat)
library(dplyr)  # provides the %>% pipe used below
example_locations <- readdata('example_site.csv')
example_locations$base_denom <- 1
exampleCRT <- CRTsp(example_locations)
example <- aggregateCRT(exampleCRT,
auxiliaries = c("RDT_test_result", "base_denom"))
# randomly sample an array of numbers of clusters to allocate
set.seed(5)
c_vec <- round(runif(20, min = 6, max = 60))
# run one scenario: allocate c clusters per arm, randomize, add buffer zones,
# estimate the ICC from the baseline data and compute the power
CRTscenario <- function(c, CRT, buffer_width) {
    ex <- specify_clusters(CRT, c = c, algo = "kmeans") %>%
        randomizeCRT() %>%
        specify_buffer(buffer_width = buffer_width)
    # baseline-only GEE analysis, excluding the buffer zones, to estimate the ICC
    GEEanalysis <- CRTanalysis(ex, method = "GEE", baselineOnly = TRUE, excludeBuffer = TRUE,
        baselineNumerator = "RDT_test_result", baselineDenominator = "base_denom")
    locations <- GEEanalysis$description$locations
    ex_power <- CRTpower(trial = ex, effect = 0.3, yC = GEEanalysis$pt_ests$controlY,
        outcome_type = "p", N = GEEanalysis$description$sum.denominators/locations, c = c,
        ICC = GEEanalysis$pt_ests$ICC)
    value <- c(c_full = c, c_core = ex_power$geom_core$c, clustersRequired = ex_power$geom_full$clustersRequired,
        power = ex_power$geom_full$power, mean_h = ex_power$geom_full$mean_h,
        locations = locations, ICC = GEEanalysis$pt_ests$ICC)
    names(value) <- c("c_full", "c_core", "clustersRequired", "power", "mean_h", "locations", "ICC")
    return(value)
}
# apply the scenario to each sampled number of clusters, one row per scenario
results <- t(sapply(c_vec, FUN = CRTscenario, simplify = "array", CRT = example,
    buffer_width = 0.5)) %>%
    data.frame()
Each simulated cluster allocation is different, as are the randomizations. This leads to variation in the locations of the buffer zones, so the number of core clusters is a stochastic function of the number of clusters randomised (c). There is also variation in the estimated Intracluster Correlation (see Use Case 3) for any value of c.
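The dependence of the estimated ICC on how the locations are grouped can be illustrated with a one-way ANOVA estimator applied to synthetic data. This is a swapped-in, simpler technique: the analysis in this use case estimates the ICC with a GEE model, and `icc_anova()` and the simulated outcomes below are hypothetical.

```r
# One-way ANOVA estimator of the intracluster correlation (a simpler
# stand-in for the GEE estimate used in the scenarios above).
icc_anova <- function(y, cluster) {
    cluster <- factor(cluster)
    n_i <- as.vector(table(cluster))          # cluster sizes
    k <- length(n_i)
    N <- length(y)
    cluster_means <- tapply(y, cluster, mean)
    msb <- sum(n_i * (cluster_means - mean(y))^2) / (k - 1)            # between clusters
    msw <- sum((y - cluster_means[as.integer(cluster)])^2) / (N - k)   # within clusters
    n0 <- (N - sum(n_i^2) / N) / (k - 1)      # adjusted mean cluster size
    (msb - msw) / (msb + (n0 - 1) * msw)
}

# synthetic binary outcomes with two prevalence levels (hypothetical data),
# grouped into 10 clusters in two different ways
set.seed(1)
y <- rbinom(200, 1, rep(c(0.2, 0.6), each = 100))
icc_a <- icc_anova(y, rep(1:10, each = 20))   # clusters aligned with the pattern
icc_b <- icc_anova(y, rep(1:10, times = 20))  # clusters cutting across the pattern
```

The same outcomes yield different ICC estimates under the two groupings, which is the kind of variation seen across the simulated allocations.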
total_locations <- example$geom_full$locations
results$proportion_included <- results$c_core * results$mean_h * 2/total_locations
results$corelocations_required <- results$clustersRequired * results$mean_h
results$totallocations_required <- with(results, total_locations/locations *
corelocations_required)
library(ggplot2)
theme_set(theme_bw(base_size = 14))
ggplot(data = results, aes(x = c_full, y = c_core)) + geom_smooth() + xlab("Clusters allocated (per arm)") +
ylab("Clusters in core (per arm)") + geom_segment(aes(x = 5, xend = 35,
y = 18.5, yend = 18.5), arrow = arrow(length = unit(1, "cm")), lwd = 2,
color = "red")
Fig 4.1 Numbers of clusters
The number of clusters in the core area increases with the number of clusters allocated, until the cluster size becomes small enough for entire clusters to be swallowed by the buffer zones. This can be illustrated by the contrast in the core areas randomised with c = 6 and c = 40 (Figures 4.2 and 4.3).
set.seed(7)
library(dplyr)
example6 <- specify_clusters(example, c = 6, algo = "kmeans") %>%
randomizeCRT() %>%
specify_buffer(buffer_width = 0.5)
plotCRT(example6, map = TRUE, showClusterBoundaries = TRUE, showClusterLabels = TRUE,
labelsize = 2, maskbuffer = 0.2)
example40 <- specify_clusters(example, c = 40, algo = "kmeans") %>%
randomizeCRT() %>%
specify_buffer(buffer_width = 0.5)
plotCRT(example40, map = TRUE, showClusterBoundaries = TRUE, showClusterLabels = TRUE,
labelsize = 2, maskbuffer = 0.2)
Fig 4.2 Map of clusters with c = 6
Fig 4.3 Map of clusters with c = 40
Beyond this point, increasing the number of clusters allocated in the fixed area (by making them smaller) does not add to the number of clusters in the core. In this example the maximum is reached when the input c is about 35, at which point the number of core clusters per arm (the output c) is about 18.5.
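The saturation in the number of core clusters can be reproduced in a small base R simulation that does not use the package. The sketch below makes several assumptions: the locations are synthetic and uniformly scattered, `stats::kmeans` stands in for the package's allocation algorithm, and a location is treated as core if its nearest location in the opposite arm is at least `buffer_width` away. The function `count_core_clusters()` is hypothetical.

```r
# Count the clusters per arm that keep at least one core location
# (simplified buffer rule; not the CRTspat implementation).
count_core_clusters <- function(xy, c_per_arm, buffer_width) {
    k <- 2 * c_per_arm
    cluster <- kmeans(xy, centers = k, iter.max = 100)$cluster
    # randomize: half of the clusters to each arm
    arm_of_cluster <- sample(rep(c("control", "intervention"), each = c_per_arm))
    arm <- arm_of_cluster[cluster]
    # distance from each location to the nearest location in the other arm
    d <- as.matrix(dist(xy))
    d[outer(arm, arm, "==")] <- Inf
    nearest_discordant <- apply(d, 1, min)
    core <- nearest_discordant >= buffer_width
    length(unique(cluster[core])) / 2   # mean number of core clusters per arm
}

# 600 synthetic locations in a 5 km x 5 km square (hypothetical site)
set.seed(3)
xy <- cbind(x = runif(600, 0, 5), y = runif(600, 0, 5))
c_values <- c(3, 6, 12, 24, 48)
c_core <- sapply(c_values, count_core_clusters, xy = xy, buffer_width = 0.5)
```

Comparing `c_core` with `c_values` shows how many clusters survive the buffering at each allocation density. The number of core clusters can never exceed the number allocated.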
ggplot(data = results, aes(x = c_core, y = mean_h)) + geom_smooth() + xlab("Clusters in core (per arm)") +
ylab("Mean cluster size")
Fig 4.4 Size of clusters
The size of clusters decreases with the number allocated (Figure 4.4), but does not fall much below 10 locations on average in the example because smaller clusters are likely to be absorbed into the buffer zones.
ggplot(data = results, aes(x = c_core, y = power)) + geom_smooth() + xlab("Clusters in core (per arm)") +
ylab("Power")
Fig 4.5 Power achievable with given site
The power increases approximately linearly with the number of clusters in the core (Figure 4.5), but the site is too small for adequate power to be achieved with this size of buffer, irrespective of the cluster size. Because buffering imposes a maximum on the density of core clusters (number of clusters per unit area), the power achievable within a fixed area also has a maximum (Figure 4.6).
ggplot2::ggplot(data = results, aes(x = c_full, y = power)) + geom_smooth() +
xlab("Clusters allocated (per arm)") + ylab("Power")
Fig 4.6 Power achievable with given site
However, the analysis also gives an estimate of how large an extended site is needed to achieve adequate power (assuming the spatial pattern for the wider site to be similar to that of the baseline area). The minimum total number of locations required to achieve a pre-specified power (80%) is reached at the same density of clusters as the maximum of the power estimated for the smaller, baseline site.
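The idea of a required number of clusters for a pre-specified power can be sketched by inverting the textbook design-effect approximation for two proportions. This is an assumption for illustration only: it ignores buffer zones, `clusters_required()` and its inputs are hypothetical, and it is not claimed to reproduce the `clustersRequired` value returned by `CRTpower()`.

```r
# Clusters per arm needed for a given power, under the textbook design
# effect 1 + (m - 1) * ICC (an assumption, not the package's formula).
clusters_required <- function(m, p_control, efficacy, icc, power = 0.8, alpha = 0.05) {
    p_int <- p_control * (1 - efficacy)
    deff <- 1 + (m - 1) * icc
    z_total <- qnorm(1 - alpha / 2) + qnorm(power)
    # individuals needed per arm, inflated by the design effect
    n_per_arm <- z_total^2 * (p_control * (1 - p_control) + p_int * (1 - p_int)) * deff /
        (p_control - p_int)^2
    ceiling(n_per_arm / m)
}

# hypothetical inputs: how the requirement changes with mean cluster size m
m_values <- c(10, 20, 40, 80)
k_needed <- sapply(m_values, clusters_required, p_control = 0.4, efficacy = 0.3, icc = 0.05)
locations_needed <- 2 * k_needed * m_values   # both arms, no buffer zones
```

Without buffers, larger clusters reduce the number of clusters needed but increase the total number of locations. The buffer zones in the analysis above add a penalty for small clusters, which is why the requirement has a minimum at an intermediate density.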
ggplot2::ggplot(data = results, aes(x = c_core, y = corelocations_required)) +
geom_smooth() + xlab("Clusters in core (per arm)") + ylab("Required core locations")
Fig 4.7 Number of clusters required for full trial area
This is also at the allocation density where saturation is achieved in the number of core clusters (Figure 4.1), and where the proportion of the locations included in the core area reaches its minimum (Figure 4.8).
ggplot2::ggplot(data = results, aes(x = c_core, y = proportion_included)) +
geom_smooth() + xlab("Clusters in core (per arm)") + ylab("Proportion of locations in core") +
geom_segment(aes(x = 18, xend = 18, y = 0, yend = 0.25), arrow = arrow(length = unit(1,
"cm")), lwd = 2, color = "red")
Fig 4.8 Proportions of locations in core
With the example geography and the selected trial outcome, the most efficient trial design, conditional on a buffer width of 0.5 km, would be achieved by assigning about 30 clusters to each arm in a site of the size analysed, though about one third of these clusters would be eliminated by inclusion in the buffer zones. This would be far from sufficient to achieve adequate power. To achieve 80% power, about 8,000 locations would be needed in a larger trial area, of which about 2,400 would be in the core (sampled) parts of the clusters.
ggplot2::ggplot(data = results, aes(x = c_core, y = totallocations_required)) +
geom_smooth() + xlab("Clusters in core (per arm)") + ylab("Total locations required") +
geom_segment(aes(x = 18, xend = 18, y = 0, yend = 8000), arrow = arrow(length = unit(1,
"cm")), lwd = 2, color = "red")
Fig 4.9 Size of trial area required to achieve adequate power
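The scaling from required core locations to the total size of the trial area follows the same arithmetic as the `totallocations_required` column: the core requirement is divided by the fraction of locations that fall in the core. A minimal sketch, in which `required_total_locations()` and the site counts (1,000 locations, 300 of them in the core) are hypothetical and chosen only to match the core fraction implied by the figures quoted in the text:

```r
# Scale the required number of core locations up to a total number of
# locations, assuming the wider site has the same core fraction.
required_total_locations <- function(core_locations_required, total_locations, core_locations) {
    core_fraction <- core_locations / total_locations
    core_locations_required / core_fraction
}

# core fraction implied by the figures in the text (2,400 of 8,000)
implied_core_fraction <- 2400 / 8000

# hypothetical baseline site with the same core fraction
total_needed <- required_total_locations(2400, total_locations = 1000, core_locations = 300)
```

The result depends on the assumption, stated in the text, that the spatial pattern of the wider site is similar to that of the baseline area.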