Transfer REDCap data to DuckDB with minimal memory overhead. Designed for large datasets that exceed available RAM.
From CRAN:

```r
install.packages("redquack")
```

Development version:

```r
# install.packages("pak")
pak::pak("dylanpieper/redquack")
```
Data from REDCap is transferred to DuckDB in configurable chunks of record IDs:
```r
library(redquack)

con <- redcap_to_duckdb(
  redcap_uri = "https://redcap.example.org/api/",
  token = "YOUR_API_TOKEN",
  record_id_name = "record_id",
  chunk_size = 1000
  # Increase chunk size on systems with ample memory (faster)
  # Decrease chunk size on memory-constrained systems (slower)
)
```
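After the transfer completes, a quick sanity check confirms the chunks all arrived. A minimal sketch using standard DBI calls against the connection returned above:

```r
# List the tables created in the DuckDB database
DBI::dbListTables(con)

# Count transferred records; should match the record count in REDCap
DBI::dbGetQuery(con, "SELECT COUNT(*) AS n_records FROM data")
```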
Query the data with dplyr:

```r
library(dplyr)

demographics <- tbl(con, "data") |>
  filter(demographics_complete == 2) |>
  select(record_id, age, race, gender) |>
  collect()

age_summary <- tbl(con, "data") |>
  group_by(gender) |>
  summarize(
    n = n(),
    mean_age = mean(age, na.rm = TRUE),
    median_age = median(age, na.rm = TRUE)
  ) |>
  collect()
```
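Note that `tbl()` builds a lazy query: the dplyr verbs are translated to SQL and executed inside DuckDB, and rows only enter R when you call `collect()`. If you're curious, `show_query()` (via dbplyr, which dplyr uses for database backends) prints the generated SQL:

```r
# Inspect the SQL that DuckDB will run; no data is pulled into R yet
tbl(con, "data") |>
  group_by(gender) |>
  summarize(n = n()) |>
  show_query()
```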
Create a Parquet file directly from DuckDB (efficient for sharing data):
```r
DBI::dbExecute(con, "COPY (SELECT * FROM data) TO 'redcap.parquet' (FORMAT PARQUET)")
```
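The Parquet file can then be read without DuckDB or a database connection at all, for example with the arrow package (one option among many Parquet readers):

```r
# Read the exported Parquet file back into an R data frame
redcap <- arrow::read_parquet("redcap.parquet")
```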
Remember to close the connection when finished:
```r
DBI::dbDisconnect(con, shutdown = TRUE)
```
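Because the data lives on disk, you can reconnect in a later session without re-running the transfer. A sketch, assuming the database file is named `redcap.duckdb` (substitute whatever path your database was written to):

```r
# Reopen the existing DuckDB database file in a new session
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "redcap.duckdb")
```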
The DuckDB database created by `redcap_to_duckdb()` contains two tables:

`data` contains all exported REDCap records with optimized column types:

```r
DBI::dbGetQuery(con, "SELECT * FROM data LIMIT 10")
```

`log` contains timestamped logs of the transfer process for troubleshooting:

```r
DBI::dbGetQuery(con, "SELECT timestamp, type, message FROM log ORDER BY timestamp")
```
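To spot problems without reading the whole log, you can aggregate entries by type; the specific `type` labels are defined by the package, so this is just an illustrative query:

```r
# Summarize log entries by type to surface errors or warnings quickly
DBI::dbGetQuery(con, "SELECT type, COUNT(*) AS n FROM log GROUP BY type ORDER BY n DESC")
```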