The ebrahim.gof package implements the Ebrahim-Farrington goodness-of-fit test for logistic regression models. This test is particularly effective for binary data and sparse datasets, providing an improved alternative to the traditional Hosmer-Lemeshow test.
Copy and paste the following into R or RStudio.
# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Install ebrahim.gof from GitHub
devtools::install_github("ebrahimkhaled/ebrahim.gof")
Alternatively, install the package from CRAN. This option is not available yet.
# Will be available after CRAN submission
install.packages("ebrahim.gof")
library(ebrahim.gof)
# Example with binary data
set.seed(123)
n <- 500
x <- rnorm(n)
linpred <- 0.5 + 1.2 * x
prob <- 1 / (1 + exp(-linpred))
y <- rbinom(n, 1, prob)
# Fit logistic regression
model <- glm(y ~ x, family = binomial())
predicted_probs <- fitted(model)
# Perform Ebrahim-Farrington test
result <- ef.gof(y, predicted_probs, G = 10)
print(result)
ef.gof()
The main function that performs the goodness-of-fit test:
ef.gof(y, predicted_probs, G = 10, model = NULL, m = NULL)
Parameters:
- y: Binary response vector (0/1), or success counts for grouped data
- predicted_probs: Vector of predicted probabilities from the logistic model
- G: Number of groups for binary data (default: 10)
- model: Optional glm object (required for the original Farrington test only, not for the Ebrahim-Farrington test)
- m: Optional vector of trial counts for grouped data (required for the original Farrington test only, not for the Ebrahim-Farrington test)
Returns: A data frame with test name, test statistic, and p-value.
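To make the role of G more concrete, here is a minimal base-R sketch of one way binary observations could be pooled into G groups before a grouped statistic is computed. The grouping rule (equal-size groups ordered by predicted probability) is an assumption made for illustration; the rule ef.gof() actually uses is not documented in this text and may differ. The names p_hat, grp, and grouped are hypothetical.

```r
# Simulated stand-ins for a binary response and fitted probabilities
set.seed(1)
n <- 200
x <- rnorm(n)
p_hat <- plogis(0.3 + 0.9 * x)
y <- rbinom(n, 1, p_hat)

G <- 10
# Assumption: equal-size groups formed by ranking the predicted probabilities
grp <- ceiling(rank(p_hat, ties.method = "first") * G / n)

# Per-group trial counts, observed successes, and expected successes
grouped <- data.frame(
  group    = seq_len(G),
  m        = as.vector(tapply(y, grp, length)),
  observed = as.vector(tapply(y, grp, sum)),
  expected = as.vector(tapply(p_hat, grp, sum))
)
print(grouped)
```

Each row then plays the role of one grouped observation: m trials, an observed success count, and the sum of predicted probabilities as the expected count.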
library(ebrahim.gof)
# Simulate binary data
set.seed(42)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
linpred <- -0.5 + 0.8 * x1 + 0.6 * x2
prob <- plogis(linpred)
y <- rbinom(n, 1, prob)
# Fit logistic regression
model <- glm(y ~ x1 + x2, family = binomial())
predicted_probs <- fitted(model)
# Test goodness of fit
result <- ef.gof(y, predicted_probs, G = 10)
print(result)
#> Test Test_Statistic p_value
#> 1 Ebrahim-Farrington -0.8944 0.8143
# Test with different numbers of groups
results <- data.frame(
  Groups = c(4, 10, 20),
  P_value = c(
    ef.gof(y, predicted_probs, G = 4)$p_value,
    ef.gof(y, predicted_probs, G = 10)$p_value,
    ef.gof(y, predicted_probs, G = 20)$p_value
  )
)
print(results)
library(ResourceSelection)
# Ebrahim-Farrington test
ef_result <- ef.gof(y, predicted_probs, G = 10)
# Hosmer-Lemeshow test
hl_result <- hoslem.test(y, predicted_probs, g = 10)
# Compare results
comparison <- data.frame(
  Test = c("Ebrahim-Farrington", "Hosmer-Lemeshow"),
  P_value = c(ef_result$p_value, hl_result$p.value)
)
print(comparison)
# Function to simulate misspecified model
simulate_power <- function(n, beta_quad = 0.1, n_sims = 100) {
  rejections <- 0
  for (i in seq_len(n_sims)) {
    x <- runif(n, -2, 2)
    # True model has quadratic term
    linpred_true <- 0 + x + beta_quad * x^2
    prob_true <- plogis(linpred_true)
    y <- rbinom(n, 1, prob_true)
    # Fit misspecified linear model
    model_mis <- glm(y ~ x, family = binomial())
    pred_probs <- fitted(model_mis)
    # Test goodness of fit
    test_result <- ef.gof(y, pred_probs, G = 10)
    if (test_result$p_value < 0.05) {
      rejections <- rejections + 1
    }
  }
  # Proportion of simulations in which the misspecified model is rejected
  return(rejections / n_sims)
}
# Calculate power for different sample sizes
power_results <- data.frame(
  n = c(100, 200, 500, 1000),
  power = sapply(c(100, 200, 500, 1000), simulate_power)
)
print(power_results)
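Because each power estimate above is a proportion over n_sims simulated datasets, it carries Monte Carlo error. The sketch below uses the standard binomial standard error for a proportion to show how precise such an estimate is; the helper name mc_se is hypothetical and is not part of ebrahim.gof.

```r
# Monte Carlo standard error of an estimated rejection rate
mc_se <- function(power_hat, n_sims) {
  sqrt(power_hat * (1 - power_hat) / n_sims)
}

# Worst case (power near 0.5) for the default n_sims = 100 and a larger run
se_100  <- mc_se(0.5, 100)
se_1000 <- mc_se(0.5, 1000)
print(c(n_sims_100 = se_100, n_sims_1000 = se_1000))
```

With the default n_sims = 100, the standard error can be as large as 0.05, so small differences between sample sizes may reflect simulation noise; increasing n_sims tightens the estimates at the cost of run time.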
The Ebrahim-Farrington test is based on Farrington’s (1996) theoretical framework, simplified for practical implementation with binary data. The test uses a modified Pearson chi-square statistic.
For binary data with automatic grouping, the test statistic is:
Z_EF = (T_EF - (G - 2)) / sqrt(2(G - 2))
Where:
- T_EF is the modified Pearson chi-square statistic
- G is the number of groups
- The test statistic follows a standard normal distribution under H₀
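The standardization can be sketched in a few lines of base R. The helper ef_z and the value of T_EF used below are hypothetical and serve only to illustrate the arithmetic; this is not the package's internal code. The upper-tail p-value is an assumption, although it agrees with the example output shown earlier (statistic -0.8944, p-value 0.8143).

```r
# Standardize a modified Pearson statistic: Z = (T - (G - 2)) / sqrt(2 (G - 2))
ef_z <- function(t_ef, G) {
  stopifnot(G > 2)
  (t_ef - (G - 2)) / sqrt(2 * (G - 2))
}

# Hypothetical T_EF = 12 with G = 10 groups: (12 - 8) / sqrt(16) = 1
z <- ef_z(t_ef = 12, G = 10)

# Assumption: large values of Z indicate lack of fit, so use the upper tail
p <- pnorm(z, lower.tail = FALSE)
print(c(z = z, p_value = p))

# Consistency check against the printed example (Z = -0.8944)
p_check <- pnorm(-0.8944, lower.tail = FALSE)
print(p_check)
```

Under this reading, a statistic below zero, as in the earlier example, yields a p-value above 0.5 and gives no evidence of lack of fit.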
The following two figures illustrate that, under the null hypothesis, the Ebrahim-Farrington test statistic is asymptotically standard normal for both single-predictor and multiple-predictor logistic regression models. This property holds even in sparse data settings, which is consistent with the theoretical foundation of the test and supports its use for model assessment (see Ebrahim, 2025).
These results demonstrate that the Ebrahim-Farrington test maintains the correct type I error rate and its statistic converges to the standard normal distribution as sample size increases, validating its asymptotic properties.
Farrington, C. P. (1996). On Assessing Goodness of Fit of Generalized Linear Models to Sparse Data. Journal of the Royal Statistical Society. Series B (Methodological), 58(2), 349-360.
Ebrahim, Khaled Ebrahim (2025). Goodness-of-Fits Tests and Calibration Machine Learning Algorithms for Logistic Regression Model with Sparse Data. Master’s Thesis, Alexandria University.
Hosmer, D. W., & Lemeshow, S. (2000). Applied Logistic Regression, Second Edition. New York: Wiley.
If you use this package in your research, please cite:
Ebrahim, K. E. (2025). ebrahim.gof: Ebrahim-Farrington Goodness-of-Fit Test
for Logistic Regression. R package version 1.0.0.
https://github.com/ebrahimkhaled/ebrahim.gof
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the GPL-3 license.
Ebrahim Khaled Ebrahim
Alexandria University
Email: ebrahimkhaled@alexu.edu.eg