| Title: | Epidemiology Data Dictionaries and Random Data Generators |
| Version: | 0.1.0 |
| Description: | The 'R4EPIs' project https://r4epi.github.io/sitrep/ seeks to provide a set of standardized tools for analysis of outbreak and survey data in humanitarian aid settings. This package currently provides standardized data dictionaries from Medecins Sans Frontieres Operational Centre Amsterdam for outbreak scenarios (Acute Jaundice Syndrome, Cholera, Diphtheria, Measles, Meningitis) and surveys (Retrospective mortality and access to care, Malnutrition, Vaccination coverage and Event Based Surveillance) - as described in the following https://scienceportal.msf.org/assets/standardised-mortality-surveys?utm_source=chatgpt.com. In addition, a data generator from these dictionaries is provided. It is also possible to read in any Open Data Kit format data dictionary. |
| License: | GPL-3 |
| URL: | https://github.com/R4EPI/epidict/, https://r4epi.github.io/epidict/ |
| Imports: | clipr, dplyr, readxl, rlang, stats, tibble, tidyr, utils |
| Suggests: | covr, DT, knitr, matchmaker, rmarkdown, testthat (≥ 2.1.0), withr |
| VignetteBuilder: | knitr |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | no |
| Packaged: | 2025-11-10 13:13:32 UTC; spina |
| Author: | Alexander Spina |
| Maintainer: | Alexander Spina <aspina@appliedepi.org> |
| Repository: | CRAN |
| Date/Publication: | 2025-11-13 18:50:07 UTC |
epidict: Epidemiology Data Dictionaries and Random Data Generators
Description
The 'R4EPIs' project https://r4epi.github.io/sitrep/ seeks to provide a set of standardized tools for analysis of outbreak and survey data in humanitarian aid settings. This package currently provides standardized data dictionaries from Medecins Sans Frontieres Operational Centre Amsterdam for outbreak scenarios (Acute Jaundice Syndrome, Cholera, Diphtheria, Measles, Meningitis) and surveys (Retrospective mortality and access to care, Malnutrition, Vaccination coverage and Event Based Surveillance) - as described in the following https://scienceportal.msf.org/assets/standardised-mortality-surveys?utm_source=chatgpt.com. In addition, a data generator from these dictionaries is provided. It is also possible to read in any Open Data Kit format data dictionary.
Author(s)
Maintainer: Alexander Spina aspina@appliedepi.org (ORCID)
Authors:
Zhian N. Kamvar zkamvar@gmail.com (ORCID)
Lukas Richter
Patrick Keating
Other contributors:
Annick Lenglet [contributor]
Applied Epi Incorporated [copyright holder]
Medecins Sans Frontieres Operational Centre Amsterdam [funder]
See Also
Useful links:
https://github.com/R4EPI/epidict/
https://r4epi.github.io/epidict/
Dictionary-based helper for aligning your data to variables used in a script
Description
Dictionary-based helper for aligning your data to variables used in a script
Usage
dict_rename_helper(
dictionary,
varnames,
varnames_type,
rmd,
copy_to_clipboard = TRUE
)
Arguments
dictionary |
A data frame containing the dictionary you would like to use. |
varnames |
The name of the column in dictionary that contains the variable names. |
varnames_type |
The name of the column in dictionary that contains the variable types. |
rmd |
Path to the R Markdown file you would like to compare against. |
copy_to_clipboard |
If TRUE (the default), the rename command is copied to the clipboard. |
Value
A dplyr command used to rename columns in your data frame according to the dictionary
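The exact text produced by dict_rename_helper() is not reproduced here. As a rough illustration of the kind of dplyr command the Value refers to, the sketch below shows the general shape of a dplyr::rename() call. The data frame linelist and the names Age, Sex, age_years and sex are hypothetical and do not come from any MSF dictionary.

```r
library(dplyr)

# Hypothetical linelist whose column names differ from the dictionary.
linelist <- data.frame(
  Age = c(3, 41),
  Sex = c("F", "M"),
  stringsAsFactors = FALSE
)

# rename() takes new_name = old_name pairs; age_years and sex are
# made-up target names standing in for dictionary variable names.
linelist_renamed <- rename(linelist, age_years = Age, sex = Sex)
names(linelist_renamed)
```

Only the column names change; the values and the row order are left as they were.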
See Also
Generate random linelist or survey data
Description
Based on a dictionary generator like msf_dict(),
this function will generate a randomized dataset based on values defined in
the dictionaries. The randomized dataset produced should mimic an Excel
export from DHIS2 or ODK.
Usage
gen_data(dictionary, varnames = "name", numcases = 300, org = "MSF")
Arguments
dictionary |
Specify which dictionary you would like to use. |
varnames |
Specify name of column that contains variable names.
If |
numcases |
Specify the number of cases you want (default is 300) |
org |
Specify the organisation which the dictionary belongs to. Currently, only MSF exists. In the future, dictionaries from WHO and other organizations may become available. |
Value
A data frame with cases in rows and variables in columns. The number of columns varies from dictionary to dictionary, so use the dictionary functions (e.g. msf_dict()) to generate the corresponding dictionary.
Examples
if (require("dplyr") && require("matchmaker")) {
  withAutoprint({
    # You will often want to use MSF dictionaries to translate codes to human-
    # readable variables. Here, we generate a data set of 20 cases:
    dat <- gen_data(
      dictionary = "Cholera",
      varnames = "data_element_shortname",
      numcases = 20,
      org = "MSF"
    )
    print(dat)
    # We want the expanded dictionary, so we will select `compact = FALSE`
    dict <- msf_dict(dictionary = "Cholera", long = TRUE, compact = FALSE, tibble = TRUE)
    print(dict)
    # Now we can use matchmaker to recode the data, translating each
    # option code into its human-readable option name:
    dat_clean <- matchmaker::match_df(dat, dict,
      from = "option_code",
      to = "option_name",
      by = "data_element_shortname",
      order = "option_order_in_set"
    )
    print(dat_clean)
  })
}
MSF data dictionaries and dummy datasets
Description
These functions produce MSF dictionaries based on DHIS2 (for OCA outbreaks) and ODK (for intersectional outbreaks and surveys) data sets. The dictionaries define the data element name, code, short names, types, and key/value pairs for translating the codes into human-readable format.
Usage
msf_dict(dictionary, tibble = TRUE, long = TRUE, compact = TRUE)
Arguments
dictionary |
Specify which dictionary you would like to use.
|
tibble |
If TRUE (the default), the dictionary is returned as a tibble. |
long |
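If TRUE (the default), the dictionary is returned in long format, with one row per variable-option pair. If FALSE, a list of two data frames (dictionary and options) is returned. |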
compact |
If TRUE (the default), the options are nested as a data frame column named "options". |
Value
A data frame (tibble) containing the specified MSF data dictionary.
If long = TRUE, each variable-option pair is represented as a row.
If compact = TRUE, the options are nested as a data frame column named
"options". If long = FALSE, a list is returned with two data frames:
dictionary and options.
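To make the difference between the compact and long shapes concrete, here is a minimal base R sketch on a toy dictionary. The variables and option codes are invented for illustration and are not taken from any MSF dictionary; only the column names data_element_shortname, option_code and option_name follow the examples in this manual. With a real dictionary the nested column would more usually be expanded with tidyr::unnest().

```r
# Toy stand-in for a compact dictionary: one row per variable, with the
# options for each variable stored as a data frame in a list column.
compact <- data.frame(
  data_element_shortname = c("sex", "outcome"),
  stringsAsFactors = FALSE
)
compact$options <- list(
  data.frame(option_code = c("M", "F"),
             option_name = c("Male", "Female"),
             stringsAsFactors = FALSE),
  data.frame(option_code = c("DD", "DC"),
             option_name = c("Died", "Discharged"),
             stringsAsFactors = FALSE)
)

# Expand to the long shape: one row per variable-option pair.
long <- do.call(rbind, lapply(seq_len(nrow(compact)), function(i) {
  data.frame(
    data_element_shortname = compact$data_element_shortname[i],
    compact$options[[i]],
    stringsAsFactors = FALSE
  )
}))

nrow(compact)  # one row per variable
nrow(long)     # one row per variable-option pair
```

The compact shape is convenient for browsing variables, while the long shape is the one a recoding step needs, since every code and its label sit side by side in one row.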
See Also
read_dict() gen_data() matchmaker::match_df()
Examples
if (require("dplyr") && require("matchmaker")) {
  withAutoprint({
    # You will often want to use MSF dictionaries to translate codes to human-
    # readable variables. Here, we generate a data set of 20 cases:
    dat <- gen_data(
      dictionary = "Cholera",
      varnames = "data_element_shortname",
      numcases = 20,
      org = "MSF"
    )
    print(dat)
    # We want the expanded dictionary, so we will select `compact = FALSE`
    dict <- msf_dict(dictionary = "Cholera", long = TRUE, compact = FALSE, tibble = TRUE)
    print(dict)
    # Now we can use matchmaker to recode the data, translating each
    # option code into its human-readable option name:
    dat_clean <- matchmaker::match_df(dat, dict,
      from = "option_code",
      to = "option_name",
      by = "data_element_shortname",
      order = "option_order_in_set"
    )
    print(dat_clean)
  })
}
Data dictionaries
Description
This function reads dictionaries in ODK and DHIS2 formats and reformats them so that datasets can be recoded into human-readable format.
Usage
read_dict(path, sheet, format, tibble = TRUE, long = TRUE, compact = TRUE)
Arguments
path |
Path to the .xlsx file where the dictionary is stored. |
sheet |
Optional. If your sheets have non-standard names (e.g. using a disease prefix), the sheet name can be specified here. |
format |
The format which the dictionary is in. Currently supports "DHIS2" and "ODK". |
tibble |
If TRUE (the default), the dictionary is returned as a tibble. |
long |
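If TRUE (the default), the merged dictionary and value options are returned in long format. If FALSE, a list with elements dictionary and options is returned. |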
compact |
If TRUE (the default), the options are nested as a column of data frames under "options". |
Value
If long = TRUE, returns a tibble of the merged dictionary and
value options. If long = FALSE, returns a list with elements dictionary
and options. If compact = TRUE, options are nested as a column of
data frames under "options".
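As a sketch of what "recoding into human-readable format" means once a long dictionary is available, the base R snippet below translates codes to labels column by column. It is a simplified stand-in for matchmaker::match_df(), not its implementation, and the dictionary and data are toy values invented for illustration.

```r
# Toy long dictionary: one row per variable-option pair.
dict <- data.frame(
  data_element_shortname = c("sex", "sex", "outcome", "outcome"),
  option_code = c("M", "F", "DD", "DC"),
  option_name = c("Male", "Female", "Died", "Discharged"),
  stringsAsFactors = FALSE
)

# Toy dataset holding raw codes.
dat <- data.frame(
  case_number = 1:3,
  sex = c("F", "M", "F"),
  outcome = c("DC", "DD", "DC"),
  stringsAsFactors = FALSE
)

recoded <- dat
# Only columns that appear in the dictionary are recoded.
for (variable in intersect(names(dat), unique(dict$data_element_shortname))) {
  opts <- dict[dict$data_element_shortname == variable, ]
  idx <- match(dat[[variable]], opts$option_code)
  # Values without a matching code are left unchanged.
  recoded[[variable]] <- ifelse(is.na(idx), dat[[variable]], opts$option_name[idx])
}
recoded
```

Columns that are absent from the dictionary, such as case_number here, pass through untouched.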