get_ideb() has a new signature:
get_ideb(level, stage, metric, year, quiet). The old
positional usage get_ideb(year, level, stage) still works
with a deprecation warning, but the year parameter now
filters IDEB editions instead of selecting which file to download.

get_ideb() now returns data in tidy long format instead
of wide format. Output columns depend on the metric
parameter ("indicador", "aprovacao",
"nota", "meta").

get_ideb() now supports 5 geographic levels:
"escola", "municipio", "estado",
"regiao", and "brasil" (previously only "escola"
and "municipio").

get_ideb() always downloads the most recent IDEB file
available, which contains the full historical series. The
year parameter filters editions.

get_ideb_series() is deprecated. Use
get_ideb(level, stage, metric) instead.

list_ideb_available() now returns level,
stage, and metric columns (previously it returned
year, level, and stage).

The uf parameter has been removed from
get_ideb(). Filter the result with
dplyr::filter() instead.

download_inep_file() timeout is now configurable via
options(educabR.download_timeout = N) (seconds; default
600). Raise it when downloading large microdata (e.g. ENEM
participantes at ~1.6 GB) over a slow link (issue #7).

read_inep_file() now warns before reading files larger
than 500 MB entirely into memory, suggesting n_max or UF
filters to reduce memory pressure. The warning can be suppressed with
quiet = TRUE; all get_*() callers propagate
their quiet argument (issue #5).

get_ideb() no longer consumes several GB of RAM for
school-level reads (issue #1). The xlsx is now read with column
projection: only the vl_* columns matching the requested
metric (and year, when given) are parsed; the
others are skipped at the readxl C++ layer. INEP’s NA tokens
("", "-", "ND") are also passed
to read_excel(na = ...) so the missing-value strings never
get allocated as R character vectors. For
level = "escola", stage = "anos_iniciais", metric = "indicador",
this cuts the in-memory result from ~133 MB to ~37 MB (4 years) or ~19
MB (1 year), with proportional drops in peak memory during reshape.

Strings read by the Excel readers (read_ideb_excel(), read_excel_safe()) and the
FUNDEB enrollment OData fetcher (fetch_fundeb_enrollment())
are now normalized to UTF-8 NFC, matching the behavior already in
read_inep_file(). Previously, equality comparisons against
literals such as filter(rede == "Pública") could silently
return zero rows on Windows because the source-file encoding produced
non-canonical strings. The shared helper
normalize_utf8_nfc() is now applied at every read
entrypoint so all four code paths agree. This affects
get_ideb(), get_cpc(), get_igc(), and
get_fundeb_enrollment().

read_excel_safe() (used by get_cpc() and
get_igc()) now passes INEP’s missing-value tokens
("", "-", "ND", en/em dashes) to
readxl::read_excel(na = ...) so those cells are loaded as
NA instead of character strings cleaned up post-hoc (issue
#4). Previously, a column whose first rows were all "-"
could be inferred as logical and later numeric values
silently dropped. clean_dash_values() remains as a safety
net but is now largely redundant for CPC/IGC.

download_inep_file() now verifies downloaded files
before caching them (issue #3). Three checks run after the bytes hit
disk: file size against the server’s Content-Length (1%
tolerance, catches truncated downloads), HTML-masquerade detection on
the first 64 bytes (catches INEP maintenance pages served with HTTP
200), and ZIP magic bytes (PK\x03\x04) for
.zip destinations (catches proxy corruption). On any
failure the corrupt file is deleted and the user gets a clear error
telling them to retry, instead of a cryptic readxl /
readr failure on the next call.

validate_year() now rejects vectors and non-numeric
input with a clear error pointing at purrr::map_dfr() for
multi-year composition (issue #2). Previously, passing
c(2017, 2019) to any of the 13 affected getters
(get_cpc, get_idd, get_igc,
get_capes, get_saeb, get_enem,
get_enem_itens, get_enade,
get_encceja, get_fundeb_distribution,
get_fundeb_enrollment, get_censo_escolar,
get_censo_superior) hit either a cryptic
length > 1 error (R ≥ 4.2) or silently used only the
first element (R < 4.2). get_ideb() is unaffected because it
intentionally accepts year vectors.

extract_zip() was cleaned up: a dead
if (TRUE) branch and an unreachable
cli_abort() were removed, and the muffle on extraction warnings was
tightened from the broad erro|error pattern to the two
specific messages that motivated it (issue #6).

RoxygenNote bumped to 8.0.0 and man/*.Rd
regenerated; systemfonts and textshaping
declared in Suggests: to silence the cosmetic
R CMD check NOTE about packages pulled transitively by
pkgdown.

R CMD check warnings cleared: em-dashes in
cli_abort() message strings in
R/utils-download.R are now written with Unicode escapes (R
requires ASCII-only in code strings; comments are exempt).

available_years() now dynamically discovers available
years by querying data sources (HEAD requests for INEP, OData queries
for FNDE). Results are cached per session. Falls back to a hardcoded
list when offline.

available_years() now accepts
"fundeb_enrollment" as a separate dataset name. Previously,
"fundeb" was shared between distribution and
enrollment.

Code columns (CO_*, CD_*) are now read as
character instead of numeric across all datasets. This prevents loss of
leading zeros in codes such as municipality codes, course codes, and
institution codes.

Fixed get_enade() failing for 9 of 19 available years.
INEP uses inconsistent URLs for ENADE: _LGPD suffix for
2012-2019, .rar format for 2022. Added hardcoded URL map
(enade_urls) with all 19 correct URLs.

Fixed get_fundeb_enrollment() accepting years with no
data in the FNDE API. The API currently only has data for
2017-2018.

Fixed clear_cache() failing to delete files on Windows
when they were memory-mapped by readr. It now deletes entire directories
and warns about locked files.

Added .rar archive extraction support via 7-Zip.
find_7z() searches common Windows install paths when
7z is not in PATH.

Added strip_diacriticals() internal helper for
encoding-safe text matching.

read_inep_file() now auto-detects code columns
(CO_*, CD_*) from the file header and reads
them as character. No user action required.

read_ideb_excel() and read_excel_safe()
(CPC/IGC) now convert code columns to character after reading.

get_fundeb_distribution(): Download FUNDEB resource
distribution data (years 2007-2026). Reads all sheets from STN Excel
files and returns tidy long-format data with monthly transfer amounts by
state, funding source, destination (states/municipalities), and table
type (fundeb/adjustment).

get_fundeb_enrollment(): Download FUNDEB enrollment
data. Fetches from the FNDE OData API with automatic pagination. Results
are cached as CSV.

uf, source (FPE, FPM, ICMS, etc.), and destination
(“uf” or “municipio”).

STN (https://www.tesourotransparente.gov.br) and FNDE
(https://www.fnde.gov.br).

get_capes(): Download CAPES graduate education data
(years 2013-2024).

programs ("programas"), students
("discentes"), faculty ("docentes"), courses
("cursos"), and theses/dissertations catalog
("catalogo").

https://dadosabertos.capes.gov.br).

get_cpc(): Download CPC data (years 2007-2019,
2021-2023; no 2020 edition).

readxl package.

get_igc(): Download IGC data (years 2007-2019,
2021-2023; no 2020 edition).

read_excel_safe(): Internal helper to read Excel files
with error handling.

get_enem_escola(): Download ENEM results aggregated by
school (2005-2015).

get_idd(): Download IDD microdata (years 2014-2019,
2021-2023; no 2020 edition).

extract_archive() utility.

get_encceja(): Download ENCCEJA microdata (years
2014-2024).

get_enade(): Download ENADE microdata.

get_censo_superior(): Download Higher Education Census
microdata (years 2009-2024).

institutions ("ies"),
courses ("cursos"), students ("alunos"), and
faculty ("docentes").

list_censo_superior_files(): List available files in a
downloaded census.

uf parameter.

get_saeb(): Download SAEB microdata (years 2011, 2013,
2015, 2017, 2019, 2021, 2023).

student ("aluno"), school ("escola"), principal
("diretor"), and teacher ("professor")
questionnaires.

level parameter.

iconv()
instead of validEnc().

"Latin-1" encoding name to "latin1"
for Windows codepage compatibility.

type parameter for split
files ("participantes", "resultados").

dt_*).

vl_* columns from character to numeric,
handling "-", "ND", and comma decimals.

get_ideb_series() now shows per-year progress
indication (e.g., “processing IDEB 2017 (1/4)”) and propagates the
quiet parameter to inner get_ideb()
calls.

get_enem_itens() now has a keep_zip
parameter for consistency with get_enem() and
get_censo_escolar().

English README (README.md) as default; Portuguese
version renamed to README.pt-br.md with cross-links between
both.

@param year ranges in documentation to match
available_years():
get_enem() / get_enem_itens(): 2009-2023 -> 1998-2024
get_censo_escolar(): 2007-2024 -> 1995-2024

@family tags to group related functions in help
pages (ENEM, IDEB, School Census, cache).

getting-started.Rmd).

README.pt-br.md.

enem_summary(): statistics calculation,
NA handling, grouping by variable, and error on missing score
columns.

validate_data(): empty data, few
columns, missing expected columns per dataset.

Replaced \donttest with \dontrun in all
examples per CRAN request.

Fixed set_cache_dir() example that created a directory
in the user’s home (~/educabR_cache) during CRAN checks.
Now uses tempdir() in examples.

First public release.
get_ideb(): Download IDEB data (years 2017, 2019, 2021,
2023).

get_ideb_series(): Download IDEB historical series
across multiple years.

list_ideb_available(): List available year/stage/level
combinations.

get_enem(): Download ENEM microdata (years
1998-2024).

get_enem_itens(): Download ENEM item response
data.

enem_summary(): Calculate summary statistics for ENEM
scores.

get_censo_escolar(): Download School Census microdata
(years 1995-2024).

list_censo_files(): List available files in a
downloaded census.

set_cache_dir(): Set custom cache directory.

get_cache_dir(): Get current cache directory.

clear_cache(): Clear cached files.

list_cache(): List cached files with metadata.

available_years(): Get available years for each
dataset.
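
The three post-download checks described in the download_inep_file() entry above (size against Content-Length, HTML-masquerade detection, ZIP magic bytes) can be sketched in base R roughly as follows. This is an illustration only: the helper names has_zip_magic(), looks_like_html(), and size_matches() are hypothetical and are not taken from the package, whose internal implementation may differ.

```r
# Hypothetical helpers for illustration; not educabR's actual internals.

# TRUE when the file starts with the ZIP local-file-header signature PK\x03\x04.
has_zip_magic <- function(path) {
  bytes <- readBin(path, what = "raw", n = 4L)
  identical(bytes, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# TRUE when the first 64 bytes look like an HTML page rather than data.
looks_like_html <- function(path) {
  ints <- as.integer(readBin(path, what = "raw", n = 64L))
  # Keep printable ASCII only, so binary content cannot break string handling.
  ints <- ints[ints >= 32L & ints <= 126L]
  head_txt <- tolower(rawToChar(as.raw(ints)))
  grepl("<!doctype html|<html", head_txt)
}

# TRUE when the size on disk is within `tolerance` (default 1%) of the
# byte count the server announced.
size_matches <- function(path, expected_bytes, tolerance = 0.01) {
  abs(file.size(path) - expected_bytes) <= tolerance * expected_bytes
}
```

A caller would typically run these right after the download completes and delete the file (for example with unlink()) if any check fails, so that a corrupt file never reaches the cache.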