The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Introduction to betaregscale

Overview

The betaregscale package provides maximum-likelihood estimation of beta regression models for responses derived from bounded rating scales. Common examples include pain intensity scales (NRS-11, NRS-21, NRS-101), Likert-type scales, product quality ratings, and any instrument whose response can be mapped to the open interval \((0,1)\).

The key idea is that a discrete score recorded on a bounded scale carries measurement uncertainty inherent to the instrument. For instance, a pain score of \(y=6\) on a 0–10 NRS is not an exact value but rather represents a range: after rescaling to \((0,1)\), the observation is treated as interval-censored in \([0.55,0.65]\). The package uses the beta distribution to model such data, building a complete likelihood that supports mixed censoring types within the same dataset.

Installation

# Development version from GitHub:
# install.packages("remotes")
remotes::install_github("evandeilton/betaregscale")
library(betaregscale)

Censoring types

The complete likelihood supports four censoring types, automatically classified by brs_check():

\(\delta\) Type Likelihood contribution
0 Exact (uncensored) \(f(y_i;a_i,b_i)\)
1 Left-censored (\(y=0\)) \(F(u_i;a_i,b_i)\)
2 Right-censored (\(y=K\)) \(1-F(l_i;a_i,b_i)\)
3 Interval-censored \(F(u_i;a_i,b_i)-F(l_i;a_i,b_i)\)

where \(f(\cdot)\) and \(F(\cdot)\) are the beta density and CDF, \([l_i,u_i]\) are the interval endpoints, and \((a_i,b_i)\) are the beta shape parameters derived from \(\mu_i\) and \(\phi_i\) via the chosen reparameterization.

Interval construction

Scale observations are mapped to \((0,1)\) with midpoint uncertainty intervals:

\[y_t=y/K,\quad\text{interval }[y_t-h/K,y_t+h/K]\]

where \(K\) is the number of scale categories (ncuts) and \(h\) is the half-width (lim, default 0.5).

# Illustrate brs_check with a 0-10 NRS scale
y_example <- c(0, 3, 5, 7, 10)
cr <- brs_check(y_example, ncuts = 10)
kbl10(cr)
left right yt y delta
0.00 0.05 0.0 0 1
0.25 0.35 0.3 3 3
0.45 0.55 0.5 5 3
0.65 0.75 0.7 7 3
0.95 1.00 1.0 10 2

The delta column shows that \(y=0\) is left-censored (\(\delta=1\)), \(y=10\) is right-censored (\(\delta=2\)), and all interior values are interval-censored (\(\delta=3\)).

Data preparation with brs_prep()

In practice, analysts may want to supply their own censoring indicators or interval endpoints rather than relying on the automatic classification of brs_check(). The brs_prep() function provides a flexible, validated bridge between raw analyst data and brs().

It supports four input modes:

Mode 1: Score only (automatic)

# Equivalent to brs_check - delta inferred from y
d1 <- data.frame(y = c(0, 3, 5, 7, 10), x1 = rnorm(5))
kbl10(brs_prep(d1, ncuts = 10))
left right yt y delta x1
0.00 0.05 0.0 0 1 0.5769
0.25 0.35 0.3 3 3 1.1427
0.45 0.55 0.5 5 3 -0.7074
0.65 0.75 0.7 7 3 -0.6186
0.95 1.00 1.0 10 2 -0.0083

Mode 2: Score + explicit censoring indicator

# Analyst specifies delta directly
d2 <- data.frame(
  y     = c(50, 0, 99, 50),
  delta = c(0, 1, 2, 3),
  x1    = rnorm(4)
)
kbl10(brs_prep(d2, ncuts = 100))
left right yt y delta x1
0.500 0.500 0.50 50 0 0.3389
0.000 0.005 0.00 0 1 1.4054
0.985 1.000 0.99 99 2 -0.8653
0.495 0.505 0.50 50 3 -0.8740

Mode 3: Interval endpoints with NA patterns

When the analyst provides left and/or right columns, censoring is inferred from the NA pattern:

d3 <- data.frame(
  left  = c(NA, 20, 30, NA),
  right = c(5, NA, 45, NA),
  y     = c(NA, NA, NA, 50),
  x1    = rnorm(4)
)
kbl10(brs_prep(d3, ncuts = 100))
left right yt y delta x1
0.0 0.05 0.025 NA 1 0.7743
0.2 1.00 0.600 NA 2 -0.4019
0.3 0.45 0.375 NA 3 -0.2151
0.5 0.50 0.500 50 0 0.6053

Mode 4: Analyst-supplied intervals

When the analyst provides y, left, and right simultaneously, their endpoints are used directly (rescaled by \(K\)):

d4 <- data.frame(
  y     = c(50, 75),
  left  = c(48, 73),
  right = c(52, 77),
  x1    = rnorm(2)
)
kbl10(brs_prep(d4, ncuts = 100))
left right yt y delta x1
0.48 0.52 0.50 50 3 -0.6146
0.73 0.77 0.75 75 3 -0.7244

Using brs_prep with brs()

Data processed by brs_prep() is automatically detected by brs() - the internal brs_check() step is skipped:

set.seed(42)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim <- brs_sim(
  formula = ~ x1 + x2, data = dat,
  beta = c(0.2, -0.5, 0.3), phi = 0.3,
  link = "logit", link_phi = "logit",
  repar = 2
)
# Remove left, right, yt so brs_prep can rebuild them
prep <- brs_prep(sim[-c(1:3)], ncuts = 100)
fit_prep <- brs(y ~ x1 + x2,
  data = prep, repar = 2,
  link = "logit", link_phi = "logit"
)
summary(fit_prep, digits = 4)
#> 
#> Call:
#> brs(formula = y ~ x1 + x2, data = prep, link = "logit", link_phi = "logit", 
#>     repar = 2)
#> 
#> Quantile residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.1122 -0.6590 -0.0288  0.6409  3.6970 
#> 
#> Coefficients (mean model with logit link):
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.18254    0.04364   4.183 2.88e-05 ***
#> x1          -0.48490    0.04491 -10.797  < 2e-16 ***
#> x2           0.26206    0.04447   5.893 3.80e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Phi coefficients (precision model with logit link):
#>       Estimate Std. Error z value Pr(>|z|)    
#> (phi)  0.26989    0.03992    6.76 1.38e-11 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> ---
#> Log-likelihood: -4072.5673 on 4 Df | AIC: 8153.1346 | BIC: 8172.7656 
#> Pseudo R-squared: 0.1292 
#> Number of iterations: 35 (BFGS) 
#> Censoring: 796 interval | 74 left | 130 right

Example 1: Fixed dispersion model

Simulating data

We simulate observations from a beta regression model with fixed dispersion, two covariates, and a logit link for the mean.

set.seed(4255)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

sim_fixed <- brs_sim(
  formula  = ~ x1 + x2,
  data     = dat,
  beta     = c(0.3, -0.6, 0.4),
  phi      = 1 / 10,
  link     = "logit",
  link_phi = "logit",
  ncuts    = 100,
  repar    = 2
)

kbl10(head(sim_fixed, 8))
left right yt y delta x1 x2
0.165 0.175 0.17 17 3 1.9510 0.2888
0.985 0.995 0.99 99 3 0.7725 1.6103
0.000 0.005 0.00 0 1 0.7264 1.0542
0.575 0.585 0.58 58 3 0.0487 -0.4923
0.095 0.105 0.10 10 3 -0.5445 -1.4098
0.065 0.075 0.07 7 3 0.3600 -1.0116
0.025 0.035 0.03 3 3 0.7136 2.3083
0.355 0.365 0.36 36 3 -0.7274 0.2061

Each observation is centered in its interval. For example, a score of 67 on a 0–100 scale yields \(y_t=0.67\) with the interval \([0.665,0.675]\).

Fitting the model

fit_fixed <- brs(
  y ~ x1 + x2,
  data     = sim_fixed,
  link     = "logit",
  link_phi = "logit",
  repar    = 2
)
summary(fit_fixed)
#> 
#> Call:
#> brs(formula = y ~ x1 + x2, data = sim_fixed, link = "logit", 
#>     link_phi = "logit", repar = 2)
#> 
#> Quantile residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.0753 -0.6600  0.0058  0.7071  3.3708 
#> 
#> Coefficients (mean model with logit link):
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.30604    0.04312   7.097 1.28e-12 ***
#> x1          -0.60332    0.04530 -13.317  < 2e-16 ***
#> x2           0.44135    0.04385  10.066  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Phi coefficients (precision model with logit link):
#>       Estimate Std. Error z value Pr(>|z|)   
#> (phi)  0.10991    0.04057   2.709  0.00674 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> ---
#> Log-likelihood: -4035.2262 on 4 Df | AIC: 8078.4524 | BIC: 8098.0834 
#> Pseudo R-squared: 0.2393 
#> Number of iterations: 39 (BFGS) 
#> Censoring: 809 interval | 65 left | 126 right

The summary output follows the betareg package style, showing separate coefficient tables for the mean and precision submodels, with Wald z-tests and \(p\)-values based on the standard normal distribution.

Goodness of fit

kbl10(brs_gof(fit_fixed))
logLik AIC BIC pseudo_r2
-4035.226 8078.452 8098.083 0.2393

Residual diagnostics

The plot() method provides six diagnostic panels. By default, the first four are shown:

plot(fit_fixed, caption = NULL, gg = TRUE, title = NULL, theme = ggplot2::theme_bw())

For ggplot2 output (requires the ggplot2 package):

plot(fit_fixed, gg = TRUE)

Predictions

# Fitted means
kbl10(
  data.frame(mu_hat = head(predict(fit_fixed, type = "response"))),
  digits = 4
)
mu_hat
0.3222
0.6343
0.5825
0.5148
0.5031
0.4115

# Conditional variance
kbl10(
  data.frame(var_hat = head(predict(fit_fixed, type = "variance"))),
  digits = 4
)
var_hat
0.1152
0.1224
0.1283
0.1317
0.1319
0.1277

# Quantile predictions
kbl10(predict(fit_fixed, type = "quantile", at = c(0.10, 0.25, 0.5, 0.75, 0.90)))
q_0.1 q_0.25 q_0.5 q_0.75 q_0.9
0.0007 0.0170 0.1779 0.6024 0.8964
0.0711 0.3148 0.7508 0.9675 0.9980
0.0435 0.2311 0.6578 0.9398 0.9947
0.0208 0.1444 0.5288 0.8860 0.9856
0.0181 0.1318 0.5060 0.8745 0.9833
0.0048 0.0565 0.3311 0.7601 0.9538
0.1348 0.4660 0.8689 0.9905 0.9997
0.1221 0.4392 0.8517 0.9880 0.9996
0.0319 0.1895 0.6011 0.9185 0.9915
0.0288 0.1777 0.5834 0.9111 0.9902

Confidence intervals

Wald confidence intervals based on the asymptotic normal approximation:

kbl10(confint(fit_fixed))
2.5 % 97.5 %
(Intercept) 0.2215 0.3906
x1 -0.6921 -0.5145
x2 0.3554 0.5273
(phi) 0.0304 0.1894
kbl10(confint(fit_fixed, model = "mean"))
2.5 % 97.5 %
(Intercept) 0.2215 0.3906
x1 -0.6921 -0.5145
x2 0.3554 0.5273

Censoring structure

The brs_cens() function provides a visual and tabular overview of the censoring types in the fitted model:

brs_cens(fit_fixed, gg = TRUE, inform = TRUE)

Example 2: Variable dispersion model

In many applications, the dispersion parameter \(\phi\) may depend on covariates. The package supports variable-dispersion models using the Formula package notation: y ~ x1 + x2 | z1 + z2, where the terms after | define the linear predictor for \(\phi\). The same brs_sim() function is used for fixed and variable dispersion; the second formula part activates the precision submodel in simulation.

Simulating data

set.seed(2222)
n <- 1000
dat_z <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rbinom(n, size = 1, prob = 0.5),
  z1 = rnorm(n),
  z2 = rnorm(n)
)

sim_var <- brs_sim(
  formula = ~ x1 + x2 + x3 | z1 + z2,
  data = dat_z,
  beta = c(0.2, -0.6, 0.2, 0.2),
  zeta = c(0.2, -0.8, 0.6),
  link = "logit",
  link_phi = "logit",
  ncuts = 100,
  repar = 2
)

kbl10(head(sim_var, 10))
left right yt y delta x1 x2 x3 z1 z2
0.265 0.275 0.27 27 3 -0.3381 -0.8235 0 0.0306 -1.1222
0.175 0.185 0.18 18 3 0.9392 -1.7563 0 0.1938 -0.2891
0.000 0.005 0.00 0 1 1.7377 -1.3148 1 -0.4283 0.3479
0.525 0.535 0.53 53 3 0.6963 -0.8196 0 0.3694 0.2811
0.085 0.095 0.09 9 3 0.4623 -0.1183 1 0.0553 -1.2184
0.955 0.965 0.96 96 3 -0.3151 -0.0648 1 0.7169 -0.5613
0.165 0.175 0.17 17 3 0.1927 -0.9985 0 -0.2222 0.4225
0.385 0.395 0.39 39 3 1.1307 0.6330 1 1.2509 -1.5492
0.005 0.015 0.01 1 3 1.9764 -0.4887 0 -1.8699 0.3679
0.515 0.525 0.52 52 3 1.2071 -1.3548 1 -0.2813 -1.6754

Fitting the model

fit_var <- brs(
  y ~ x1 + x2 | z1,
  data     = sim_var,
  link     = "logit",
  link_phi = "logit",
  repar    = 2
)
summary(fit_var)
#> 
#> Call:
#> brs(formula = y ~ x1 + x2 | z1, data = sim_var, link = "logit", 
#>     link_phi = "logit", repar = 2)
#> 
#> Quantile residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.6111 -0.6344  0.0131  0.6654  3.2463 
#> 
#> Coefficients (mean model with logit link):
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.27141    0.04245   6.394 1.62e-10 ***
#> x1          -0.54553    0.04375 -12.468  < 2e-16 ***
#> x2           0.17226    0.04141   4.160 3.18e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Phi coefficients (precision model with logit link):
#>                   Estimate Std. Error z value Pr(>|z|)    
#> (phi)_(Intercept)  0.24653    0.04233   5.823 5.77e-09 ***
#> (phi)_z1          -0.72230    0.04507 -16.028  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> ---
#> Log-likelihood: -3922.2430 on 5 Df | AIC: 7854.4861 | BIC: 7879.0249 
#> Pseudo R-squared: 0.1159 
#> Number of iterations: 42 (BFGS) 
#> Censoring: 744 interval | 105 left | 151 right

Notice the (phi)_ prefix in the precision coefficient names, following the betareg convention.

Accessing coefficients by submodel

# Full parameter vector
round(coef(fit_var), 4)
#>       (Intercept)                x1                x2 (phi)_(Intercept) 
#>            0.2714           -0.5455            0.1723            0.2465 
#>          (phi)_z1 
#>           -0.7223

# Mean submodel only
round(coef(fit_var, model = "mean"), 4)
#> (Intercept)          x1          x2 
#>      0.2714     -0.5455      0.1723

# Precision submodel only
round(coef(fit_var, model = "precision"), 4)
#> (phi)_(Intercept)          (phi)_z1 
#>            0.2465           -0.7223

# Variance-covariance matrix for the mean submodel
kbl10(vcov(fit_var, model = "mean"))
(Intercept) x1 x2
(Intercept) 0.0018 -0.0001 0.0000
x1 -0.0001 0.0019 0.0000
x2 0.0000 0.0000 0.0017

Diagnostics for variable dispersion

plot(fit_var)

Advanced analyst functions

The package includes analyst-facing helpers for uncertainty quantification, effect interpretation, score-scale communication, and predictive validation.

Parametric bootstrap confidence intervals

set.seed(101)
boot_ci <- brs_bootstrap(
  fit_fixed,
  R = 100,
  level = 0.95,
  ci_type = "bca",
  keep_draws = TRUE
)
kbl10(head(boot_ci, 10))
parameter estimate se_boot ci_lower ci_upper mcse_lower mcse_upper wald_lower wald_upper level
(Intercept) 0.3060 0.0410 0.2420 0.4106 0.0058 0.0137 0.2215 0.3906 0.95
x1 -0.6033 0.0424 -0.6863 -0.5387 0.0059 0.0045 -0.6921 -0.5145 0.95
x2 0.4413 0.0423 0.3641 0.5088 0.0100 0.0034 0.3554 0.5273 0.95
(phi) 0.1099 0.0396 0.0307 0.1714 0.0093 0.0039 0.0304 0.1894 0.95
autoplot.brs_bootstrap(
  boot_ci,
  type = "ci_forest",
  title = "Bootstrap (BCa) vs Wald intervals"
)

Average marginal effects (AME)

set.seed(202)
ame <- brs_marginaleffects(
  fit_fixed,
  model = "mean",
  type = "response",
  interval = TRUE,
  n_sim = 120,
  keep_draws = TRUE
)
kbl10(ame)
variable ame std.error ci.lower ci.upper model type n
x1 -0.1321 0.0088 -0.1473 -0.1123 mean response 1000
x2 0.0966 0.0093 0.0798 0.1140 mean response 1000
autoplot.brs_marginaleffects(ame, type = "forest")

Score probabilities on the original scale

prob_scores <- brs_predict_scoreprob(fit_fixed, scores = 0:10)
kbl10(prob_scores[1:6, 1:6])
score_0 score_1 score_2 score_3 score_4 score_5
0.1754 0.0657 0.0386 0.0288 0.0235 0.0201
0.0218 0.0190 0.0139 0.0117 0.0104 0.0095
0.0320 0.0249 0.0176 0.0145 0.0127 0.0115
0.0516 0.0342 0.0230 0.0185 0.0159 0.0142
0.0559 0.0360 0.0240 0.0192 0.0165 0.0146
0.1016 0.0509 0.0318 0.0246 0.0206 0.0180
kbl10(
  data.frame(row_sum = rowSums(prob_scores)[1:6]),
  digits = 4
)
row_sum
0.4263
0.1260
0.1605
0.2141
0.2245
0.3163

Repeated k-fold cross-validation

set.seed(303) # For cross-validation reproducibility
cv_res <- brs_cv(
  y ~ x1 + x2,
  data = sim_fixed,
  k = 5,
  repeats = 5,
  repar = 2,
)
kbl10(cv_res)
repeat fold n_train n_test log_score rmse_yt mae_yt converged error
1 1 800 200 -4.0713 0.3472 0.3019 TRUE NA
1 2 800 200 -3.9984 0.3451 0.3019 TRUE NA
1 3 800 200 -3.9235 0.3395 0.2997 TRUE NA
1 4 800 200 -4.0902 0.3311 0.2874 TRUE NA
1 5 800 200 -4.1125 0.3420 0.2958 TRUE NA
2 1 800 200 -4.0795 0.3270 0.2892 TRUE NA
2 2 800 200 -4.2062 0.3504 0.2972 TRUE NA
2 3 800 200 -4.1858 0.3248 0.2818 TRUE NA
2 4 800 200 -3.8729 0.3457 0.3036 TRUE NA
2 5 800 200 -3.8752 0.3578 0.3159 TRUE NA
kbl10(
  data.frame(
    metric = c("log_score", "rmse_yt", "mae_yt"),
    mean = c(
      mean(cv_res$log_score, na.rm = TRUE),
      mean(cv_res$rmse_yt, na.rm = TRUE),
      mean(cv_res$mae_yt, na.rm = TRUE)
    )
  ),
  digits = 4
)
metric mean
log_score -4.0404
rmse_yt 0.3409
mae_yt 0.2974

S3 methods reference

The following standard S3 methods are available for objects of class "brs":

Method Description
print() Compact display of call and coefficients
summary() Detailed output with Wald tests and goodness-of-fit
coef(model=) Extract coefficients (full, mean, or precision)
vcov(model=) Variance-covariance matrix (full, mean, or precision)
confint(model=) Wald confidence intervals
logLik() Log-likelihood value
AIC(), BIC() Information criteria
nobs() Number of observations
formula() Model formula
model.matrix(model=) Design matrix (mean or precision)
fitted() Fitted mean values
residuals(type=) Residuals: response, pearson, rqr, weighted, sweighted
predict(type=) Predictions: response, link, precision, variance, quantile
plot(gg=) Diagnostic plots (base R or ggplot2)

Reparameterizations

The package supports three reparameterizations of the beta distribution, controlled by the repar argument:

Direct (repar = 0): Shape parameters \(a=\mu\) and \(b=\phi\) are used directly. This is rarely used in practice.

Precision (repar = 1, Ferrari & Cribari-Neto, 2004): The mean \(\mu\in(0,1)\) and precision \(\phi>0\) yield \(a=\mu\phi\) and \(b=(1-\mu)\phi\). Higher \(\phi\) means less variability.

Mean–variance (repar = 2): The mean \(\mu\in(0,1)\) and dispersion \(\phi\in(0,1)\) yield \(a=\mu(1-\phi)/\phi\) and \(b=(1-\mu)(1-\phi)/\phi\). Here \(\phi\) acts as a coefficient of variation: smaller \(\phi\) means less variability.

# Precision parameterization: mu = 0.5, phi = 10 (high precision)
brs_repar(mu = 0.5, phi = 10, repar = 1)
#>   shape1 shape2
#> 1      5      5

# Mean-variance parameterization: mu = 0.5, phi = 0.1 (low dispersion)
brs_repar(mu = 0.5, phi = 0.1, repar = 2)
#>   shape1 shape2
#> 1    4.5    4.5

References

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.