
healthbR

Overview

healthbR provides easy access to Brazilian public health survey data directly from R. The package downloads, caches, and processes data from official sources, returning clean, analysis-ready tibbles following tidyverse conventions.

Currently supported data sources:

Planned for future releases:

Installation

You can install the development version of healthbR from GitHub:

# install.packages("pak")
pak::pak("SidneyBissoli/healthbR")

Usage

Check available years

library(healthbR)

# list available VIGITEL survey years
vigitel_years()
#> [1] 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
#> [16] 2021 2022 2023

Download and load data

# load data for a single year
df <- vigitel_data(2023)

# load data for multiple years
df <- vigitel_data(2021:2023)
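
Before requesting several years at once, it can help to compare the request with the years the package reports as available. The following is a minimal base-R sketch, not part of healthbR: `check_years` is a hypothetical helper, and the vector of available years is hard-coded from the `vigitel_years()` output shown above so that the snippet runs offline.

```r
# stand-in for vigitel_years(); values copied from the output shown above
available_years <- 2006:2023

# hypothetical helper (not part of healthbR): split a request into
# years that are available and years that are not
check_years <- function(requested, available) {
  list(
    ok      = intersect(requested, available),
    missing = setdiff(requested, available)
  )
}

years <- check_years(2021:2025, available_years)
years$ok       # years that are safe to pass to vigitel_data()
years$missing  # years to drop from the request
```

In a live session, `available_years` would presumably be replaced by a call to `vigitel_years()`.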

Explore variables

# list variables available in a specific year
vigitel_variables(2023)

# get the data dictionary with variable descriptions
dict <- vigitel_dictionary()

# search for specific variables
dict |>
  dplyr::filter(stringr::str_detect(variable_name, "peso"))
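
The same kind of search can be written in base R if dplyr and stringr are not loaded. The sketch below uses a toy data frame in place of the real dictionary: only the `variable_name` column comes from the example above, while the `description` column and its values are placeholders assumed for illustration.

```r
# toy stand-in for vigitel_dictionary(); only variable_name is taken from
# the example above, the description column is a placeholder
dict_toy <- data.frame(
  variable_name = c("pesorake", "cidade", "diab", "hart"),
  description   = c("placeholder 1", "placeholder 2",
                    "placeholder 3", "placeholder 4"),
  stringsAsFactors = FALSE
)

# case-insensitive substring match on the variable name
hits <- dict_toy[grepl("peso", dict_toy$variable_name, ignore.case = TRUE), ]
hits$variable_name
```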

Survey analysis with srvyr

VIGITEL uses complex survey sampling. Use the pesorake weight variable for proper inference:

library(dplyr)
library(srvyr)

# create survey design
vigitel_svy <- df |>
  as_survey_design(weights = pesorake)

# calculate weighted prevalence
vigitel_svy |>
  group_by(cidade) |>
  summarize(
    prevalence = survey_mean(diab == 1, na.rm = TRUE),
    n = unweighted(n())
  )
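
To make the role of the weight concrete, the point estimate of a weighted prevalence is the sum of the weights of the positive cases divided by the sum of all weights. The base-R sketch below illustrates this on toy data (not real VIGITEL values); it reproduces only the point estimate, whereas standard errors still require a survey design object such as the one created above.

```r
# toy data: NOT real VIGITEL values
toy <- data.frame(
  cidade   = c(1, 1, 1, 2, 2),
  diab     = c(1, 0, 0, 1, 1),
  pesorake = c(2, 1, 1, 3, 1)
)

# weighted prevalence per city: sum(weight * indicator) / sum(weight)
weighted_prev <- sapply(
  split(toy, toy$cidade),
  function(g) weighted.mean(g$diab == 1, g$pesorake)
)

# unweighted prevalence per city, for comparison
unweighted_prev <- tapply(toy$diab == 1, toy$cidade, mean)

weighted_prev
unweighted_prev
```

In this toy example the first city has one positive case out of three respondents, but that case carries half of the total weight, so the weighted and unweighted estimates differ.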

Performance optimization

healthbR offers three strategies for handling large datasets efficiently:

1. Parquet conversion

Convert Excel files to Parquet format for 10-20x faster loading:

# convert downloaded files to parquet (one-time operation)
vigitel_convert_to_parquet(2020:2023)

# subsequent loads are much faster
df <- vigitel_data(2020:2023)

2. Parallel downloads

Download multiple years simultaneously (requires optional packages):

# install optional packages for parallel processing
install.packages(c("furrr", "future"))

# uses furrr for parallel processing (2-4 workers)
df <- vigitel_data(2015:2023)

3. Lazy evaluation with Arrow

For very large datasets, use lazy evaluation to process data without loading everything into memory:

# returns an Arrow Dataset (not loaded into RAM)
df_lazy <- vigitel_data(2020:2023, lazy = TRUE)

# filter and select before collecting
result <- df_lazy |>
  dplyr::filter(cidade == 1) |>
  dplyr::select(q6, q8_anos, pesorake, diab, hart) |>
  dplyr::collect()
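
The general pattern behind lazy evaluation can be tried without downloading anything. The sketch below assumes the arrow and dplyr packages are installed and uses a small in-memory Arrow table with toy values (not VIGITEL data); the object returned by `vigitel_data(lazy = TRUE)` is described above as an Arrow Dataset, which is expected to respond to the same dplyr verbs, though that is not verified here.

```r
library(arrow)
library(dplyr)

# toy table: NOT real VIGITEL values
toy <- arrow_table(
  cidade   = c(1L, 1L, 2L),
  diab     = c(0L, 1L, 1L),
  pesorake = c(1.5, 0.5, 2.0)
)

# builds a query; nothing is materialised as a data frame yet
query <- toy |>
  filter(cidade == 1L) |>
  select(diab, pesorake)

class(query)   # an arrow_dplyr_query, not a data frame

# collect() runs the query and returns a tibble in R memory
result <- collect(query)
result
```

Filtering and selecting before `collect()` means only the rows and columns that are actually needed end up in R memory.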

Data sources

All data is downloaded from official Brazilian Ministry of Health repositories.

Citation

If you use healthbR in your research, please cite it:

citation("healthbR")

Contributing

Contributions are welcome! Please open an issue to discuss proposed changes or submit a pull request.

Code of Conduct

Please note that the healthbR project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

MIT © Sidney da Silva Pereira Bissoli
