pulso provides programmatic access to Colombia’s Gran
Encuesta Integrada de Hogares (GEIH), the household labor force survey
published monthly by DANE (Departamento Administrativo Nacional de
Estadística, Colombia's national statistics office).
library(pulso)
# 2024-06 is a validated period -- loads without any warning
df <- pulso_load(year = 2024, month = 6, module = "ocupados")
The result is a tibble with the survey microdata. By default, all columns are returned with their original DANE codes (e.g., P6020, P3271).
pulso maintains a registry of periods that have been manually verified against figures published by DANE. As of v0.1.0-rc2, five periods are validated, including 2024-06.
For all other periods, pulso_load() raises a
pulso_data_not_validated error by default:
# Raises pulso_data_not_validated -- 2024-09 is not yet validated
df <- pulso_load(year = 2024, month = 9, module = "ocupados")
# Explicitly allow unvalidated periods -- emits a visible warning
df <- pulso_load(year = 2024, month = 9, module = "ocupados",
                 allow_unvalidated = TRUE)
To check the validation status of a specific period, use pulso_validation_status(). To list all validated periods, use pulso_list_validated_range().
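Before a batch run, it can help to know which months will need allow_unvalidated = TRUE. The sketch below works from a hypothetical local copy of the registry. validated_registry, period_key() and is_validated() are illustrative names and are not part of pulso. The registry contains only 2024-06, the one validated period this vignette names. In practice you would take the real list from pulso's validation functions, whose return format is not shown here.

```r
# Hypothetical local copy of the validated-period registry. Only "2024-06"
# is named as validated in this vignette; the other validated periods are omitted.
validated_registry <- c("2024-06")

# Build "YYYY-MM" keys; vectorized over year and month.
period_key <- function(year, month) {
  sprintf("%04d-%02d", as.integer(year), as.integer(month))
}

# TRUE for periods present in the registry, FALSE otherwise.
is_validated <- function(year, month, registry = validated_registry) {
  period_key(year, month) %in% registry
}

# Plan a batch over 2024: which months would need allow_unvalidated = TRUE?
months <- 1:12
needs_opt_in <- months[!is_validated(2024, months)]
needs_opt_in
```

Checking the registry up front keeps the opt-in decision explicit for each month. The alternative is discovering unvalidated periods one error at a time.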
Pass metadata = TRUE to pulso_load() to attach DANE codebook information
to the result. You can then describe individual columns with
pulso_describe_column() or list metadata for all columns with
pulso_list_columns_metadata().
pulso ships a canonical variable catalog
(variable_map.json) that maps harmonized variable names to
their epoch-specific DANE source codes. The catalog functions
(pulso_list_variables(), pulso_describe_variable(), and pulso_describe())
work offline; no data download is needed.
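The catalog maps one canonical name to a different DANE code in each epoch. The sketch below shows how that lookup and the resulting column renaming could work. It assumes a simple in-memory structure, because the actual layout of variable_map.json is not shown in this vignette. catalog, source_code() and harmonize_names() are hypothetical, and the toy data values are made up. The "sexo" codes come from the pulso_describe_variable("sexo") output shown in this vignette, which also flags that mapping as pending verification.

```r
# Hypothetical in-memory excerpt of the variable catalog, using the
# "sexo" epoch mappings printed by pulso_describe_variable("sexo").
catalog <- list(
  sexo = c(geih_2006_2020 = "P6020", geih_2021_present = "P3271")
)

# Look up the DANE source code for a canonical name in a given epoch;
# returns NA when the name or epoch is not in the catalog.
source_code <- function(canonical, epoch, map = catalog) {
  codes <- map[[canonical]]
  if (is.null(codes) || !epoch %in% names(codes)) return(NA_character_)
  unname(codes[[epoch]])
}

# Rename DANE-coded columns to their canonical names for one epoch.
harmonize_names <- function(df, epoch, map = catalog) {
  for (canonical in names(map)) {
    code <- source_code(canonical, epoch, map)
    if (!is.na(code) && code %in% names(df)) {
      names(df)[names(df) == code] <- canonical
    }
  }
  df
}

# Toy data frame (made-up values) using the 2021+ column code.
toy <- data.frame(P3271 = c(1L, 2L, 1L))
harmonized <- harmonize_names(toy, "geih_2021_present")
names(harmonized)
```

Keeping the per-epoch codes in data rather than in code means a corrected mapping only requires editing the catalog. That fits the vignette's note that some mappings still await verification.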
List all canonical variables (first 10 rows):
library(pulso)
vars <- pulso_list_variables()
head(vars[, c("canonical_name", "module", "has_warning")], 10)
#> # A tibble: 10 × 3
#> canonical_name module has_warning
#> <chr> <chr> <lgl>
#> 1 alfabetiza caracteristicas_generales FALSE
#> 2 anios_educ caracteristicas_generales TRUE
#> 3 area caracteristicas_generales TRUE
#> 4 asiste_educ caracteristicas_generales FALSE
#> 5 busco_trabajo desocupados TRUE
#> 6 condicion_actividad caracteristicas_generales TRUE
#> 7 cotiza_pension ocupados FALSE
#> 8 departamento caracteristicas_generales FALSE
#> 9 disponible desocupados TRUE
#> 10 edad caracteristicas_generales FALSE
Describe a single canonical variable and its epoch mappings:
cat(pulso_describe_variable("sexo"))
#> Variable: sexo
#> Module: caracteristicas_generales
#> Description: Sexo de la persona.
#> Epochs:
#> geih_2006_2020: P6020
#> geih_2021_present: P3271
#> WARNING: El código de variable cambió de P6020 (marco 2005) a P3271 (marco 2018). La Phase 1 Curator reportó P6016 como sexo en GEIH-2, pero los datos del June 2024 muestran P6016 con 17+ valores. P3271 (binario 1/2, cubre 70.020 personas) es el candidato confirmado; requiere verificación humana contra el cuestionario DANE.
Describe a survey module (this reads sources.json, which is bundled with
the package):
cat(pulso_describe("ocupados"))
#> Module: ocupados
#> Level: persona
#> Description: Información laboral de las personas ocupadas en la semana de referencia.
#> Available in epochs: geih_2006_2020, geih_2021_present
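#> Harmonized variables (11): cotiza_pension, hogar_id, horas_trabajadas_sem, ingreso_laboral, ocupacion ... and 6 more
GEIH is Colombia’s primary labor market survey, conducted monthly since 2007. Its topics include the general characteristics of household members, employment, and unemployment, which correspond to the caracteristicas_generales, ocupados, and desocupados modules used above.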
Microdata is freely published by DANE in monthly zip files.
pulso automates downloading, parsing, and harmonizing these files
across four survey design epochs: three GEIH epochs (2007-2018, 2019-2023,
and 2024-present) plus the historical ECH (Encuesta Continua de Hogares, 2000-2006).
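Knowing which design epoch a survey year falls into tells you which set of variable codes to expect. The sketch below encodes the epoch boundaries listed in the paragraph above as a vectorized lookup. epoch_for_year() is a hypothetical helper and not part of pulso. Its labels follow this paragraph, whereas the variable catalog uses its own epoch keys (geih_2006_2020, geih_2021_present).

```r
# Design epochs as described in this vignette (first year of each epoch).
epoch_starts <- c(2000, 2007, 2019, 2024)
epoch_labels <- c("ECH 2000-2006", "GEIH 2007-2018",
                  "GEIH 2019-2023", "GEIH 2024-present")

# Map survey years to their design epoch; years before 2000 give NA.
epoch_for_year <- function(year) {
  idx <- findInterval(year, epoch_starts)  # 0 means before the first epoch
  out <- rep(NA_character_, length(year))
  out[idx > 0] <- epoch_labels[idx[idx > 0]]
  out
}

epoch_for_year(c(1999, 2006, 2007, 2023, 2024))
```

findInterval() counts how many epoch start years are less than or equal to each year. That count is the index of the matching label, so open-ended epochs like 2024-present need no special case.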
pulso (R) mirrors the API of pulso-co
(Python). For example:
# Python
import pulso
df = pulso.load(year=2024, month=6, module="ocupados", metadata=True)
print(pulso.describe_column(df, "P6020"))
# R
library(pulso)
df <- pulso_load(year = 2024, month = 6, module = "ocupados",
metadata = TRUE)
cat(pulso_describe_column(df, "p6020"))
Both packages share the same canonical data files (sources.json, variable_map.json, dane_codebook.json) via the monorepo at https://github.com/Stebandido77/pulso.
Downloaded microdata is cached at
tools::R_user_dir("pulso", "cache") to avoid
re-downloading. Pass cache = FALSE to force
re-download.
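To see where the cache lives and how much it currently holds, you can inspect that directory directly with base R (tools::R_user_dir() requires R 4.0.0 or later). cache_summary() is a hypothetical helper, not part of pulso. It only reads the directory and makes no assumption about how pulso names its cached files.

```r
# Directory where pulso caches downloaded microdata (R >= 4.0.0).
cache_dir <- tools::R_user_dir("pulso", which = "cache")

# Summarize the cache contents. list.files() returns character(0) when the
# directory does not exist yet, so this is safe to run before any download.
cache_summary <- function(dir = cache_dir) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  list(dir = dir, n_files = length(files), bytes = sum(file.size(files)))
}

cache_summary()
```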
If you used pulso_load() in earlier development
versions, note that the default behavior has changed for
unvalidated periods: pulso_load() now raises
pulso_data_not_validated
unless allow_unvalidated = TRUE is specified.
This change aligns the R package with pulso-co (Python)
and protects users from inadvertently using unvalidated data.
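Code written for earlier versions that loops over many months will now stop at the first unvalidated period. One way to keep such a loop running is to catch that specific condition and skip the month. The sketch below assumes the error carries the condition class pulso_data_not_validated, the name used in this vignette. It uses a hypothetical stand-in loader, fake_loader(), so it runs without pulso or a network connection. load_validated_only() is likewise an illustrative helper, not part of pulso.

```r
# Hypothetical stand-in for pulso_load() so the sketch runs offline: it
# treats only month 6 as validated and otherwise signals an error of class
# "pulso_data_not_validated", unless allow_unvalidated = TRUE.
fake_loader <- function(year, month, module, allow_unvalidated = FALSE) {
  if (month != 6 && !allow_unvalidated) {
    stop(structure(
      list(message = sprintf("%d-%02d is not validated",
                             as.integer(year), as.integer(month)),
           call = NULL),
      class = c("pulso_data_not_validated", "error", "condition")
    ))
  }
  data.frame(year = year, month = month, module = module)
}

# Return the loader's result, or NULL when the period is not validated.
# Any other error still propagates to the caller.
load_validated_only <- function(loader, ...) {
  tryCatch(loader(...), pulso_data_not_validated = function(cnd) NULL)
}

# Batch over two months, keeping only the periods that loaded.
results <- lapply(c(6, 9), function(m) {
  load_validated_only(fake_loader, year = 2024, month = m, module = "ocupados")
})
results <- Filter(Negate(is.null), results)
length(results)
```

With pulso installed, the same helper would be called as load_validated_only(pulso_load, year = 2024, month = 9, module = "ocupados"), provided the condition class matches. Handling only that class means download or parsing failures are not silently skipped.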
pulso v0.1.0-rc2 provides the following functions:
- pulso_load()
- pulso_load_merged()
- pulso_describe_column() and pulso_list_columns_metadata()
- pulso_describe()
- pulso_describe_variable() and pulso_list_variables()
- pulso_validation_status() and pulso_list_validated_range()
Known limitations:
- Only five periods are validated against DANE figures. Use
allow_unvalidated = TRUE for the rest, with awareness that
results may differ from DANE official tables.
- Some entries in variable_map.json are theoretical
mappings pending empirical verification. Use has_warning
from pulso_list_variables() to identify these entries.
- Some capabilities of pulso_load_merged() are deferred to v0.2.0.
See the GitHub issues for the roadmap and the full list of known limitations.
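To list the catalog entries that still carry a verification warning, filter on has_warning. So that the sketch runs offline, it rebuilds the first ten rows printed earlier by pulso_list_variables() as a plain data frame. With the package installed, you would use vars <- pulso_list_variables() instead, assuming (as the printed output suggests) that has_warning is a logical column.

```r
# The first 10 rows of pulso_list_variables(), as printed in this vignette.
vars <- data.frame(
  canonical_name = c("alfabetiza", "anios_educ", "area", "asiste_educ",
                     "busco_trabajo", "condicion_actividad", "cotiza_pension",
                     "departamento", "disponible", "edad"),
  module = c(rep("caracteristicas_generales", 4), "desocupados",
             "caracteristicas_generales", "ocupados",
             "caracteristicas_generales", "desocupados",
             "caracteristicas_generales"),
  has_warning = c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only the entries whose mappings still need verification.
flagged <- vars[vars$has_warning, c("canonical_name", "module")]
flagged$canonical_name

# Count flagged entries per survey module.
table(flagged$module)
```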