The healthbR package provides easy access to Brazilian public health survey data directly from R. It downloads, caches, and processes data from official Ministry of Health sources, returning clean, analysis-ready tibbles that follow tidyverse conventions.
Currently, healthbR supports VIGITEL (Vigilância de Fatores de Risco e Proteção para Doenças Crônicas por Inquérito Telefônico), a telephone survey that monitors risk and protective factors for chronic diseases in the 26 Brazilian state capitals and the Federal District.
Before downloading data, you can check which years are available:
VIGITEL uses coded variable names (q6, q8, etc.). Use the dictionary to understand what each variable represents:
You can search for specific variables:
VIGITEL uses complex survey sampling with post-stratification
weights. For proper statistical inference, always use the
pesorake weight variable.
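To illustrate why the weight matters, here is a minimal sketch in base R on made-up data (the values are invented for illustration and are not VIGITEL records). It compares a naive share with one weighted by a column named like `pesorake`.

```r
# Toy data: invented values, NOT real VIGITEL records.
toy <- data.frame(
  diab     = c(1, 0, 0, 1, 0, 0),
  pesorake = c(0.5, 2.0, 1.5, 0.5, 3.0, 2.5)
)

# Unweighted share: every respondent counts equally
unweighted_prev <- mean(toy$diab == 1)

# Weighted share: each respondent counts in proportion to the weight
weighted_prev <- weighted.mean(toy$diab == 1, w = toy$pesorake)

unweighted_prev
weighted_prev
```

In this toy case the two estimates differ because the respondents reporting diabetes carry small weights. `weighted.mean()` only gives a point estimate; standard errors and confidence intervals still require a survey design object.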
Some commonly used variables in VIGITEL:
| Variable | Description |
|---|---|
| `cidade` | City code (1-27 for state capitals) |
| `q6` | Sex |
| `q8_anos` | Age in years |
| `pesorake` | Post-stratification weight |
| `diab` | Diabetes diagnosis |
| `hart` | Hypertension diagnosis |
| `fumante` | Current smoker |
| `imc` | Body Mass Index |
| `obesid` | Obesity indicator |
Consult vigitel_dictionary() for the complete list.
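As a rough sketch of how a keyword lookup over such a dictionary might work, the example below builds a small stand-in data frame by hand from the table above. The column names `variable` and `description` and the helper `find_vars()` are assumptions made for illustration; the actual structure returned by `vigitel_dictionary()` may differ.

```r
# Stand-in dictionary built by hand from the table of common variables.
# Column names are assumptions, not the package's documented output.
dict <- data.frame(
  variable = c("cidade", "q6", "q8_anos", "pesorake", "diab",
               "hart", "fumante", "imc", "obesid"),
  description = c("City code (1-27 for state capitals)", "Sex",
                  "Age in years", "Post-stratification weight",
                  "Diabetes diagnosis", "Hypertension diagnosis",
                  "Current smoker", "Body Mass Index", "Obesity indicator")
)

# Hypothetical helper: case-insensitive keyword search over descriptions
find_vars <- function(dict, keyword) {
  hits <- grepl(keyword, dict$description, ignore.case = TRUE)
  dict[hits, , drop = FALSE]
}

find_vars(dict, "diagnosis")
```

The same pattern should carry over to the real dictionary once its column names are known, for example after inspecting it with `names()`.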
healthbR offers three strategies for working with large datasets efficiently.
1. Parquet conversion
Convert Excel files to Parquet format for dramatically faster loading (10-20x improvement):
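As a rough illustration of the underlying idea, independent of healthbR's own helpers, the sketch below uses the arrow package directly to write a small made-up data frame to Parquet and read back only selected columns. The file path and the data values are hypothetical.

```r
library(arrow)

# Toy data frame standing in for one survey year (invented values)
toy <- data.frame(
  q6       = c(1L, 2L, 2L, 1L),
  q8_anos  = c(34L, 51L, 27L, 68L),
  diab     = c(0L, 1L, 0L, 1L),
  pesorake = c(1.2, 0.8, 1.5, 0.6)
)

# Write once to Parquet (temporary path used only for this sketch)
path <- tempfile(fileext = ".parquet")
write_parquet(toy, path)

# Later reads can pull just the columns that are needed
slim <- read_parquet(path, col_select = c("q6", "diab"))
names(slim)
```

Parquet is a columnar format, so reading a subset of columns avoids parsing the rest of the file, which is part of why it tends to load faster than a spreadsheet.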
2. Parallel processing
When downloading multiple years, healthbR automatically uses parallel processing if the furrr package is available:
3. Lazy evaluation
For very large datasets, use lazy evaluation to filter and select data before loading into memory:
library(healthbR)
library(dplyr)

# returns an Arrow Dataset (not loaded into RAM)
df_lazy <- vigitel_data(2015:2023, lazy = TRUE)

# operations are executed lazily
result <- df_lazy |>
  filter(cidade == 1, q8_anos >= 18) |>
  select(q6, q8_anos, pesorake, diab, hart, imc) |>
  collect()
# only now is the data loaded into memory

This approach is especially useful when you only need a subset of the data.
Here’s a complete workflow for analyzing diabetes prevalence:
library(healthbR)
library(dplyr)
library(srvyr)
# 1. load data
df <- vigitel_data(2023)
# 2. create survey design
svy <- df |>
  as_survey_design(weights = pesorake)
# 3. calculate prevalence by sex
diabetes_by_sex <- svy |>
  group_by(q6) |>
  summarize(
    prevalence = survey_mean(diab == 1, na.rm = TRUE, vartype = "ci"),
    n = unweighted(n())
  )
diabetes_by_sex