The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

db2pq: export PostgreSQL and WRDS data to Parquet

db2pq is an R package for moving data from PostgreSQL into Apache Parquet files. It is designed for both general PostgreSQL sources and the WRDS PostgreSQL service.

What it does

Installation

# install.packages("pak")
pak::pak("iangow/db2pqr")

Quickstart

Update a WRDS table

You can pass a WRDS username directly for a first call:

library(db2pq)

wrds_update_pq("dsi", "crsp", wrds_id = "your_wrds_id")

For repeated use, configure the WRDS username and PostgreSQL password outside the call. The authentication article documents the WRDS_ID, .pgpass, and wrds::wrds_set_credentials() paths. The remaining WRDS examples assume that setup is in place.

Force a re-download

wrds_update_pq("dsi", "crsp", force = TRUE)

Use SAS metadata to check for updates

wrds_update_pq("dsi", "crsp", use_sas = TRUE)

SSH setup is only needed for this SAS metadata path. See the WRDS SSH setup article for the key-based setup used by that option.

Update all tables in a schema

wrds_schema_to_pq("crsp")

Export a custom WRDS SQL query

wrds_sql_to_pq(
  "SELECT permno, date, ret FROM crsp.dsf WHERE date >= '2024-01-01'",
  table_name = "dsf_recent",
  schema = "crsp"
)

Export a local PostgreSQL table

db_to_pq(
  table_name = "company",
  schema = "comp",
  keep = c("gvkey", "conm"),
  rename = c(conm = "company_name")
)

Check when local Parquet files were last updated

pq_last_modified(schema = "crsp")

ADBC backend

The stable default transfer path uses DBI/RPostgres. The optional ADBC path can be selected with transfer_method = "adbc" when adbi and a PostgreSQL ADBC driver are installed:

adbc_diagnostics()
wrds_update_pq("dsi", "crsp", transfer_method = "adbc")

If ADBC reports an SSL/libpq error, use transfer_method = "dbi" or install a current SSL-capable adbcpostgresql build.

Parquet layout

Files are organized as:

<DATA_DIR>/<schema>/<table>.parquet

For example:

~/pq_data/crsp/dsi.parquet

The DATA_DIR environment variable sets the root directory. It can also be passed directly as data_dir to any function.

When archive = TRUE, replaced files are moved to:

<DATA_DIR>/<schema>/archive/<table>_<timestamp>.parquet

License

MIT License. See LICENSE.md.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.