The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
A power estimate is only as trustworthy as the test it is built on. If a design/analysis combination does not control its Type I error rate, a high “power” number is meaningless — the test is just rejecting too often, under the null as well as the alternative. mixpower makes this check a first-class step.
mp_calibrate() runs the scenario under the null (the
focal effect set to zero) and estimates the empirical Type I error rate,
with an exact interval and a verdict. For a correctly specified model it
should sit near alpha.
d <- mp_design(clusters = list(subject = 24), trials_per_cell = 6)
a <- mp_assumptions(
fixed_effects = list(`(Intercept)` = 0, condition = 0.4),
random_effects = list(subject = list(intercept_sd = 0.5)),
residual_sd = 1
)
scn <- mp_scenario_lme4(y ~ condition + (1 | subject), design = d, assumptions = a)
mp_calibrate(scn, nsim = 60, seed = 11)
#> <mp_calibration>
#> term: condition (true effect set to 0)
#> nominal alpha: 0.05
#> empirical Type I: 0.0500 (95% CI 0.0104, 0.1392)
#> verdict: well-calibratedThe classic trap: the data have a by-subject random slope, but the
analysis model omits it. This inflates Type I error (Barr et al., 2013).
mp_calibrate() flags it.
a_slope <- mp_assumptions(
fixed_effects = list(`(Intercept)` = 0, condition = 0.4),
random_effects = list(subject = list(
intercept_sd = 0.5, slopes = list(condition = 0.8)
)),
residual_sd = 1
)
# Data have the slope; the fitted model (1 | subject) ignores it.
scn_mis <- mp_scenario_lme4(y ~ condition + (1 | subject), design = d, assumptions = a_slope)
mp_calibrate(scn_mis, nsim = 60, seed = 7)
#> <mp_calibration>
#> term: condition (true effect set to 0)
#> nominal alpha: 0.05
#> empirical Type I: 0.1667 (95% CI 0.0829, 0.2852)
#> verdict: anti-conservative
#> -> This test rejects too often under the null; its power is not
#> trustworthy. Try a df-corrected or bootstrap test, or a model
#> that matches the data-generating random-effects structure.The verdict turns to anti-conservative: any power
computed from this model would be inflated. The fix is to fit the
maximal model (1 + condition | subject), which mixpower
simulates and tests directly.
Wald (z/t) tests are anti-conservative when the number of clusters is
small, because their degrees of freedom are overstated (Luke, 2017).
mp_recommend_method() gives fast, design-based
guidance.
scn_few <- mp_scenario_lme4(
y ~ condition + (1 | subject),
design = mp_design(list(subject = 12), trials_per_cell = 8),
assumptions = a
)
mp_recommend_method(scn_few)
#> <mp_recommendation>
#> current method: wald
#> smallest grouping factor: 12 levels
#> caution: yes
#> recommended: kenward-roger, satterthwaite, pb
#> Smallest grouping factor has 12 levels (< 30). 'wald' tests can be anti-conservative with few clusters; prefer 'kenward-roger' or 'satterthwaite' or 'pb'. Verify with mp_calibrate().With few clusters it steers you toward a degrees-of-freedom-corrected
test (Satterthwaite or Kenward-Roger for LMMs) or a parametric
bootstrap. You can then measure the improvement with
mp_calibrate():
Calibration results drop into the same reporting layer as power results:
mp_report_table(mp_calibrate(scn, nsim = 40, seed = 1))
#> term alpha type1 ci_low ci_high verdict nsim
#> 1 condition 0.05 0.075 0.01574218 0.2038647 well-calibrated 40Run mp_calibrate() whenever you change the design size
or the analysis model. A trustworthy power analysis reports both the
power and the calibration that backs it.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.