EpidigiR: Digital Epidemiological Analysis and Visualization Tools


Esther Atsabina Wanjala

2025-11-03

1 Introduction to EpidigiR: Epidemiological Analysis and Visualization


EpidigiR is an R package for epidemiological analysis, modeling, and visualization. It keeps dependencies minimal while covering a broad range of methods. Its three main functions cover 12 epidemiological topics, including a digital epidemiology component that integrates real-time data and computational techniques to support disease tracking and prediction.

epi_analyze: Performs summary statistics, SIR modeling, DALY calculations, age standardization, diagnostic test evaluation, and NLP keyword extraction.
epi_model: Handles clinical trial power calculation, survival analysis, SNP association, logistic regression, k-means clustering, Random Forest, and SVM.
epi_visualize: Creates visualizations for prevalence mapping, epidemic curves, scatter plots, and boxplots.

The package includes ten datasets to support these analyses: epi_prevalence, sir_data, geno_data, ml_data, nlp_data, clinical_data, daly_data, survey_data, diagnostic_data, and survival_data.

This vignette demonstrates how to use these functions and datasets for various epidemiological tasks.

3 Datasets

The package includes the following datasets:

epi_prevalence: Disease prevalence by region and age group, with spatial coordinates (12 rows).
sir_data: Simulated SIR model output (50 rows).
geno_data: Genotype and case-control data for SNP analysis (100 rows).
ml_data: Patient data for machine learning (logistic regression, clustering, Random Forest, SVM; 100 rows).
nlp_data: Epidemiological text data for NLP (100 rows).
clinical_data: Clinical trial data for power calculations and outcome analysis (200 rows).
daly_data: Data for DALY calculations (20 rows).
survey_data: Data for age standardization (20 rows).
diagnostic_data: Data for diagnostic test evaluation (10 rows).
survival_data: Data for survival analysis (100 rows).

5 Summary Statistics

library(EpidigiR)
data(epi_prevalence)
result <- epi_analyze(
  epi_prevalence,
  outcome = "cases",
  population = "population",
  group = "region",
  type = "summary"
)
print(result)

##   group mean_outcome population prevalence incidence_rate
## 1  East     140.0000   34000.00  0.4243056       4.117647
## 2 North     133.3333   30000.00  0.4666667       4.444444
## 3 South     120.0000   29333.33  0.4345238       4.090909
## 4  West     100.0000   24333.33  0.4333333       4.109589
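The incidence_rate column is consistent with the mean case count divided by the mean population, expressed per 1,000 (for East, 140 / 34000 × 1000 ≈ 4.12). The base R sketch below reproduces that arithmetic from the printed group means. It is a check on the output under that assumed definition, not EpidigiR's own implementation, and it does not attempt to reproduce the prevalence column.

```r
# Group means copied from the printed summary above
grp <- data.frame(
  group        = c("East", "North", "South", "West"),
  mean_outcome = c(140, 133.3333, 120, 100),
  population   = c(34000, 30000, 29333.33, 24333.33)
)
# Assumed definition of incidence_rate: cases per 1,000 population
grp$rate_per_1000 <- grp$mean_outcome / grp$population * 1000
print(grp)
```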

6 SIR Epidemic Model

sir_result <- epi_analyze(
  data = NULL, outcome = NULL, type = "sir",
  N = 1000, beta = 0.3, gamma = 0.1, days = 50
)
epi_visualize(sir_result, x = "time", y = "Infected", type = "curve", main = "Epidemic Curve")
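epi_analyze(type = "sir") takes a population size N, a transmission rate beta, a recovery rate gamma and a number of days. To illustrate the kind of model involved, here is a minimal forward-Euler discretisation (one-day steps) of the standard SIR equations dS/dt = -beta S I / N, dI/dt = beta S I / N - gamma I, dR/dt = gamma I. It assumes a single initial infection; the package's own solver and starting values may differ. With beta = 0.3 and gamma = 0.1 the basic reproduction number is R0 = beta / gamma = 3.

```r
# Minimal discrete-time SIR sketch (forward Euler, step = 1 day).
# Assumption: one initial infection (I0 = 1); not EpidigiR's internal solver.
sir_euler <- function(N = 1000, beta = 0.3, gamma = 0.1, days = 50, I0 = 1) {
  S <- numeric(days + 1); I <- numeric(days + 1); R <- numeric(days + 1)
  S[1] <- N - I0; I[1] <- I0; R[1] <- 0
  for (t in seq_len(days)) {
    new_inf <- beta * S[t] * I[t] / N  # new infections during day t
    new_rec <- gamma * I[t]            # new recoveries during day t
    S[t + 1] <- S[t] - new_inf
    I[t + 1] <- I[t] + new_inf - new_rec
    R[t + 1] <- R[t] + new_rec
  }
  data.frame(time = 0:days, Susceptible = S, Infected = I, Recovered = R)
}

sim <- sir_euler(N = 1000, beta = 0.3, gamma = 0.1, days = 50)
R0 <- 0.3 / 0.1  # basic reproduction number beta / gamma
head(sim)
```

Because every individual leaves S only to enter I, and leaves I only to enter R, S + I + R stays equal to N at every step; sim$time[which.max(sim$Infected)] gives the peak day in this sketch.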

7 Spatial Map

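library(sp)  # provides the coordinates<- replacement function
data(epi_prevalence)
# Promote the data frame to a spatial object using its lon/lat columns
coordinates(epi_prevalence) <- ~lon + lat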
epi_visualize(epi_prevalence, x = "prevalence", type = "map", main = "Prevalence Map")

8 Logistic Model

data(clinical_data)
clinical_data$outcome <- as.factor(clinical_data$outcome)
model <- epi_model(clinical_data, formula = outcome ~ age + health_score + dose, type = "logistic")
head(model$predictions)

##   lambda.min
## 1       0.41
## 2       0.41
## 3       0.41
## 4       0.41
## 5       0.41
## 6       0.41

9 Random Forest with Clinical Data

rf_model <- epi_model(clinical_data, formula = outcome ~ age + health_score + dose, type = "rf")

## note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .

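# This returns NULL: the "rf" result does not appear to store a `predictions`
# element. Run names(rf_model) to see which components are returned.
head(rf_model$predictions)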

## NULL

10 Global Health Burden (DALY)

data(daly_data)
epi_analyze(daly_data, outcome = NULL, type = "daly")

##       group      daly
## 1   group_1  809.1125
## 2   group_2 1171.2362
## 3   group_3  806.3073
## 4   group_4 1392.1371
## 5   group_5 1291.4882
## 6   group_6  509.8396
## 7   group_7  870.1247
## 8   group_8 1220.5410
## 9   group_9  776.4134
## 10 group_10  627.1544
## 11 group_11 1444.5109
## 12 group_12  964.0353
## 13 group_13 1070.6309
## 14 group_14 1023.3304
## 15 group_15  253.7084
## 16 group_16 1174.8507
## 17 group_17  712.7858
## 18 group_18  285.2372
## 19 group_19  588.3101
## 20 group_20 1113.2849
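In its simplest (undiscounted, non-age-weighted) form, a DALY is the sum of years of life lost and years lived with disability: DALY = YLL + YLD, where YLL = deaths × remaining life expectancy at the age of death and YLD = cases × disability weight × average duration. The columns that daly_data actually contains are not shown in this vignette, so the column names and values in this sketch are hypothetical.

```r
# Hypothetical inputs for two groups; not the contents of daly_data
burden <- data.frame(
  group              = c("group_a", "group_b"),
  deaths             = c(10, 4),
  life_exp_remaining = c(30, 25),   # years of life remaining at age of death
  cases              = c(200, 150),
  disability_weight  = c(0.2, 0.1), # 0 = full health, 1 = equivalent to death
  duration           = c(2, 1.5)    # average years lived with the condition
)
burden$yll  <- burden$deaths * burden$life_exp_remaining
burden$yld  <- burden$cases * burden$disability_weight * burden$duration
burden$daly <- burden$yll + burden$yld
burden[, c("group", "yll", "yld", "daly")]
```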

11 SNP Association

data(geno_data)
epi_model(geno_data, formula = outcome ~ snp1 + snp2, type = "snp")

##           statistic   p_value
## X-squared  1.769353 0.4128477
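The reported p-value matches a chi-square statistic of 1.769353 on 2 degrees of freedom. That is what Pearson's test gives on a 2 × 3 table of case/control status against genotype (0, 1 or 2 copies of the minor allele), since (2 - 1) × (3 - 1) = 2. Which SNP or SNPs epi_model() tabulates is not stated here, so the genotype counts below are hypothetical.

```r
# The reported p-value is consistent with a chi-square statistic on 2 df
p_reported <- pchisq(1.769353, df = 2, lower.tail = FALSE)

# Hypothetical genotype counts: rows = case/control, columns = minor-allele copies
geno_tab <- matrix(c(30, 15, 5,
                     25, 20, 5),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(status = c("case", "control"),
                                   genotype = c("0", "1", "2")))
snp_test <- chisq.test(geno_tab)  # Pearson chi-square test of independence
c(statistic = unname(snp_test$statistic),
  df        = unname(snp_test$parameter),
  p_value   = snp_test$p.value)
```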

12 Age Standardization

data(survey_data)
epi_analyze(survey_data, outcome = NULL, type = "age_standardize")

##   standardized_rate
## 1          33.45531
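Direct age standardisation weights each age-specific rate by the share of a standard population in that age group: standardised rate = sum(rate_i × w_i) / sum(w_i). The columns of survey_data and the standard population used by epi_analyze() are not shown in this vignette, so the age bands, rates and weights below are hypothetical.

```r
# Hypothetical age-specific rates (cases per 100,000) and standard population
age_rates <- c(`0-39` = 10, `40-64` = 50, `65+` = 200)
std_pop   <- c(`0-39` = 5000, `40-64` = 3000, `65+` = 2000)

# Directly standardised rate: rates weighted by the standard population
asr <- sum(age_rates * std_pop) / sum(std_pop)
asr
```

weighted.mean(age_rates, std_pop) gives the same result.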

13 Machine-Learning Logistic Regression

data(ml_data)
ml_logit <- epi_model(ml_data, formula = outcome ~ age + exposure + genetic_risk, type = "logistic")
ml_logit$coefficients

## 4 x 1 sparse Matrix of class "dgCMatrix"
##              lambda.min
## (Intercept)  -0.4054651
## age           .        
## exposure      .        
## genetic_risk  .        

# All 100 predictions are identical; show the first six
head(ml_logit$predictions)

##   lambda.min
## 1        0.4
## 2        0.4
## 3        0.4
## 4        0.4
## 5        0.4
## 6        0.4
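The lambda.min label and the sparse coefficient matrix suggest a penalised (glmnet-style) fit in which every predictor coefficient has been shrunk to zero (the "." entries). Each prediction then equals the intercept on the probability scale: plogis(-0.4054651) = 0.4, the event proportion implied by an intercept-only model. The constant 0.41 predictions in the Logistic Model example presumably arise the same way. The sketch below checks this with base R; the outcome vector is hypothetical, chosen to have 40% events.

```r
# Intercept from the printed coefficients, mapped to the probability scale
p_from_intercept <- plogis(-0.4054651)  # inverse logit

# An intercept-only logistic model reproduces the base rate.
# Hypothetical outcome vector with 40% events.
y <- c(rep(1, 40), rep(0, 60))
null_fit <- glm(y ~ 1, family = binomial)
coef(null_fit)          # log(0.4 / 0.6), about -0.405
head(fitted(null_fit))  # every fitted probability is 0.4
```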

14 Survival Analysis

Fit a Kaplan-Meier survival curve to survival_data.

data(survival_data)
epi_model(survival_data, type = "survival")

## $survfit
## Call: survfit(formula = surv_obj ~ 1)
## 
##        n events median 0.95LCL 0.95UCL
## [1,] 100     71     11    9.24    14.4
## 
## $summary
## Call: survfit(formula = surv_obj ~ 1)
## 
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.046    100       1   0.9900 0.00995      0.97069        1.000
##   0.292     99       1   0.9800 0.01400      0.95294        1.000
##   0.316     98       1   0.9700 0.01706      0.93714        1.000
##   0.318     97       1   0.9600 0.01960      0.92235        0.999
##   0.421     96       1   0.9500 0.02179      0.90823        0.994
##   0.562     94       1   0.9399 0.02379      0.89440        0.988
##   0.674     93       1   0.9298 0.02559      0.88096        0.981
##   0.986     90       1   0.9195 0.02731      0.86745        0.975
##   1.453     89       1   0.9091 0.02889      0.85422        0.968
##   1.883     88       1   0.8988 0.03036      0.84122        0.960
##   2.161     87       1   0.8885 0.03172      0.82842        0.953
##   2.596     84       1   0.8779 0.03306      0.81543        0.945
##   2.804     82       1   0.8672 0.03434      0.80242        0.937
##   2.810     81       1   0.8565 0.03555      0.78956        0.929
##   2.847     80       1   0.8458 0.03668      0.77685        0.921
##   3.000     78       1   0.8349 0.03778      0.76407        0.912
##   3.062     77       1   0.8241 0.03881      0.75142        0.904
##   3.135     76       1   0.8132 0.03979      0.73888        0.895
##   3.165     74       1   0.8022 0.04074      0.72625        0.886
##   3.197     73       1   0.7913 0.04164      0.71372        0.877
##   3.771     72       1   0.7803 0.04249      0.70129        0.868
##   3.800     71       1   0.7693 0.04328      0.68895        0.859
##   4.204     70       1   0.7583 0.04404      0.67671        0.850
##   4.802     67       1   0.7470 0.04481      0.66411        0.840
##   4.807     66       1   0.7357 0.04554      0.65160        0.831
##   5.066     65       1   0.7243 0.04622      0.63917        0.821
##   5.646     64       1   0.7130 0.04687      0.62683        0.811
##   5.726     63       1   0.7017 0.04747      0.61457        0.801
##   5.887     60       1   0.6900 0.04810      0.60189        0.791
##   5.909     59       1   0.6783 0.04868      0.58930        0.781
##   6.293     57       1   0.6664 0.04926      0.57653        0.770
##   6.436     56       1   0.6545 0.04980      0.56383        0.760
##   6.855     55       1   0.6426 0.05030      0.55122        0.749
##   7.907     54       1   0.6307 0.05075      0.53868        0.738
##   8.457     51       1   0.6183 0.05124      0.52564        0.727
##   8.498     50       1   0.6060 0.05169      0.51268        0.716
##   9.240     49       1   0.5936 0.05209      0.49981        0.705
##   9.659     48       1   0.5812 0.05245      0.48701        0.694
##   9.724     47       1   0.5689 0.05278      0.47430        0.682
##   9.746     46       1   0.5565 0.05306      0.46166        0.671
##  10.244     44       1   0.5439 0.05334      0.44875        0.659
##  10.279     43       1   0.5312 0.05358      0.43593        0.647
##  10.477     42       1   0.5186 0.05377      0.42319        0.635
##  10.672     41       1   0.5059 0.05393      0.41053        0.623
##  11.003     40       1   0.4933 0.05404      0.39795        0.611
##  11.149     38       1   0.4803 0.05416      0.38505        0.599
##  11.293     37       1   0.4673 0.05423      0.37225        0.587
##  11.685     36       1   0.4543 0.05425      0.35952        0.574
##  11.920     35       1   0.4413 0.05423      0.34688        0.562
##  12.290     34       1   0.4284 0.05417      0.33433        0.549
##  12.546     32       1   0.4150 0.05410      0.32140        0.536
##  13.291     30       1   0.4011 0.05404      0.30806        0.522
##  13.480     29       1   0.3873 0.05392      0.29483        0.509
##  14.395     27       1   0.3730 0.05380      0.28112        0.495
##  14.967     24       1   0.3574 0.05375      0.26618        0.480
##  15.631     21       1   0.3404 0.05382      0.24970        0.464
##  15.705     19       1   0.3225 0.05389      0.23243        0.447
##  15.707     18       1   0.3046 0.05379      0.21546        0.431
##  16.059     17       1   0.2867 0.05353      0.19881        0.413
##  16.199     16       1   0.2687 0.05309      0.18246        0.396
##  16.424     15       1   0.2508 0.05249      0.16643        0.378
##  17.312     14       1   0.2329 0.05171      0.15074        0.360
##  18.569     13       1   0.2150 0.05074      0.13538        0.341
##  18.878     11       1   0.1954 0.04975      0.11868        0.322
##  21.678     10       1   0.1759 0.04846      0.10251        0.302
##  22.483      9       1   0.1564 0.04685      0.08691        0.281
##  25.361      8       1   0.1368 0.04489      0.07192        0.260
##  27.253      5       1   0.1095 0.04346      0.05026        0.238
##  40.410      3       1   0.0730 0.04155      0.02390        0.223
##  44.987      2       1   0.0365 0.03312      0.00616        0.216
##  72.110      1       1   0.0000     NaN           NA           NA
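The survfit() call in the output is a Kaplan-Meier (product-limit) estimate: S(t) is the product, over event times t_i <= t, of (1 - d_i / n_i), where d_i is the number of events and n_i the number at risk. Taking n.risk and n.event from the first six rows of the table above, the cumulative product reproduces the printed survival column.

```r
# Number at risk and number of events at the first six event times (from the table above)
n_risk  <- c(100, 99, 98, 97, 96, 94)
n_event <- c(1, 1, 1, 1, 1, 1)

# Product-limit estimator: cumulative product of conditional survival probabilities
km_surv <- cumprod(1 - n_event / n_risk)
round(km_surv, 4)
```

The risk set drops from 96 to 94 between the fifth and sixth event times even though only one event occurred, so one subject was censored in that interval: censoring shrinks the risk set without lowering the survival estimate.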

15 NLP Keyword Extraction

data(nlp_data)
nlp_result <- epi_analyze(nlp_data, outcome = NULL, population = NULL, type = "nlp", n = 5)
head(nlp_result)

##                word frequency
## region       region        33
## dengue       dengue        32
## south         south        32
## influenza influenza        31
## north         north        31
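The result is a word-frequency table limited to the n most frequent terms. A minimal base R sketch of this kind of keyword extraction lower-cases the text, splits on non-letters, drops a small stop-word list and counts what remains. The example texts, the stop-word list and the helper name top_keywords() are hypothetical, and epi_analyze() may tokenise differently.

```r
# Hypothetical example texts (not nlp_data)
texts <- c("Dengue cases rise in the south region",
           "Influenza outbreak reported in the north region",
           "Dengue and influenza surveillance in the south")

# Hypothetical helper: top-n word frequencies after simple tokenisation
top_keywords <- function(texts, n = 5,
                         stopwords = c("the", "in", "and", "of", "a", "to")) {
  words <- unlist(strsplit(tolower(texts), "[^a-z]+"))       # split on non-letters
  words <- words[nzchar(words) & !(words %in% stopwords)]    # drop empties and stop words
  freq  <- sort(table(words), decreasing = TRUE)             # count and rank
  head(data.frame(word = names(freq), frequency = as.integer(freq)), n)
}

kw <- top_keywords(texts, n = 5)
kw
```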

15.1 K-means Clustering

data(ml_data)
epi_model(ml_data[, c("age", "exposure", "genetic_risk")], type = "kmeans", k = 3)

## $clusters
##   [1] 3 2 3 1 3 1 1 3 2 2 1 2 2 1 3 2 3 1 3 2 3 2 2 2 2 1 2 2 2 3 3 1 3 3 3 3 1
##  [38] 1 1 3 2 2 2 2 1 2 2 2 2 3 1 3 2 2 2 2 3 2 3 2 3 2 3 2 2 3 3 2 2 2 3 2 1 3
##  [75] 3 3 2 3 1 3 1 2 3 2 3 3 2 3 1 1 2 2 1 3 1 3 2 3 2 3
## 
## $centers
##        age  exposure genetic_risk
## 1 74.00521 0.4178058    0.3936161
## 2 34.17596 0.5159618    0.4529798
## 3 55.09524 0.4881809    0.5910485
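The centres differ by decades in age but only by fractions in exposure and genetic_risk, which suggests the clusters are driven almost entirely by age: k-means uses Euclidean distance, so the variable on the largest scale dominates. If all three variables should contribute, standardising them first is a common choice, and fixing a seed with several random starts makes the cluster labels reproducible. The sketch below uses stats::kmeans() on simulated data with the same column names; whether epi_model() scales its input is not documented here.

```r
set.seed(42)  # makes the random starts, and therefore the labels, reproducible

# Simulated stand-in for ml_data's numeric columns (hypothetical values)
sim_ml <- data.frame(
  age          = runif(100, 20, 80),
  exposure     = runif(100),
  genetic_risk = runif(100)
)

# Standardise each column to mean 0, sd 1 so no variable dominates the distance
km <- kmeans(scale(sim_ml), centers = 3, nstart = 25)

table(km$cluster)                                                # cluster sizes
aggregate(sim_ml, by = list(cluster = km$cluster), FUN = mean)   # centres on the original scale
```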

16 SVM Modelling

data(ml_data)
ml_data$outcome <- as.factor(ml_data$outcome)
svm_model <- epi_model(ml_data, formula = outcome ~ age + exposure + genetic_risk, type = "svmRadial")
svm_model$performance

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 60 40
##          1  0  0
##                                           
##                Accuracy : 0.6             
##                  95% CI : (0.4972, 0.6967)
##     No Information Rate : 0.6             
##     P-Value [Acc > NIR] : 0.5433          
##                                           
##                   Kappa : 0               
##                                           
##  Mcnemar's Test P-Value : 6.984e-10       
##                                           
##             Sensitivity : 1.0             
##             Specificity : 0.0             
##          Pos Pred Value : 0.6             
##          Neg Pred Value : NaN             
##              Prevalence : 0.6             
##          Detection Rate : 0.6             
##    Detection Prevalence : 1.0             
##       Balanced Accuracy : 0.5             
##                                           
##        'Positive' Class : 0               
##
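The confusion matrix shows the radial SVM assigning every observation to class 0, so it does no better than always predicting the majority class: accuracy equals the No Information Rate (0.6) and Kappa is 0. The headline statistics can be recomputed from the four cell counts in base R, treating class 0 as positive as the output does ('Positive' Class : 0).

```r
# Cell counts from the printed confusion matrix (rows = prediction, columns = reference)
cm <- matrix(c(60, 40,
                0,  0),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

tp <- cm["0", "0"]; fn <- cm["1", "0"]  # positive class is "0"
fp <- cm["0", "1"]; tn <- cm["1", "1"]
n  <- sum(cm)

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)

# Cohen's kappa: observed agreement compared with agreement expected by chance
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa <- (accuracy - p_expected) / (1 - p_expected)

c(accuracy = accuracy, sensitivity = sensitivity,
  specificity = specificity, kappa = kappa)
```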

17 Diagnostic Tests

data(diagnostic_data)
epi_analyze(diagnostic_data, outcome = NULL, type = "diagnostic")

##    test_id sensitivity specificity  accuracy
## 1   test_1   0.8602151   0.8585859 0.8593750
## 2   test_2   0.7619048   0.6923077 0.7257143
## 3   test_3   0.8181818   0.7857143 0.8012422
## 4   test_4   0.8253968   0.8846154 0.8622754
## 5   test_5   0.8584906   0.8380952 0.8483412
## 6   test_6   0.9108911   0.7625000 0.8453039
## 7   test_7   0.8349515   0.8000000 0.8196721
## 8   test_8   0.8596491   0.7641509 0.8136364
## 9   test_9   0.9135802   0.7583333 0.8208955
## 10 test_10   0.8823529   0.6746988 0.7797619
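Each row summarises a 2 × 2 table of test result against true disease status: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP) and accuracy = (TP + TN) / total. The columns of diagnostic_data are not shown in this vignette, so the counts in this sketch, and the helper name diag_metrics(), are hypothetical.

```r
# Hypothetical helper computing the three metrics reported above
diag_metrics <- function(tp, fn, tn, fp) {
  c(sensitivity = tp / (tp + fn),                    # detected among the diseased
    specificity = tn / (tn + fp),                    # correctly negative among the non-diseased
    accuracy    = (tp + tn) / (tp + fn + tn + fp))   # correct results overall
}

# Hypothetical counts for one test (not taken from diagnostic_data)
m <- diag_metrics(tp = 90, fn = 10, tn = 80, fp = 20)
m
```

Unlike sensitivity and specificity, accuracy depends on the proportion of diseased subjects in the sample, so it is not directly comparable across tests evaluated at different prevalence.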

18 Boxplot Visualization

data(clinical_data)
epi_visualize(clinical_data, x = "arm", y = "outcome", type = "boxplot", main = "Outcome by Treatment Arm")

19 Scatter Visualization

data(ml_data)
epi_visualize(ml_data, x = "age", y = "outcome", type = "scatter", main = "Age vs. Disease Outcome")

20 Conclusion

EpidigiR offers a streamlined toolkit for epidemiological analysis built around three functions (epi_analyze, epi_model, and epi_visualize) and ten datasets covering the topics demonstrated in this vignette. These tools support analyses ranging from SIR modeling to machine learning methods such as Random Forest and SVM. The package also includes a digital epidemiology component that uses real-time data and computational approaches to improve disease monitoring and forecasting, making it a useful resource for researchers and analysts.

21 License

