Open-Access Computational Biology Datasets
Efficiently access a curated library of open-access computational biology datasets. Tables support predicate pushdown and projection to the cloud storage backend, enabling quick, iterative access to otherwise massive, unwieldy tables.
bedrockbio consists of five user-facing functions:

- list_namespaces(): returns a character vector of available namespace (data source) identifiers
- describe_namespace("<name>"): returns a namespace's name, citation, license, context, and its tables
- list_tables(namespace): returns a character vector of table identifiers, optionally filtered to one namespace
- describe_table("<name>"): returns a table's context, column definitions, and partition columns (with their allowed values)
- load_table("<name>"): returns a lazily evaluated data frame for a table

dplyr verbs (filter, select) can be used on the data frame returned by load_table to push row filters and column selections down to the storage backend. Filtering on the partition columns returned by describe_table gives the fastest reads.
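The discovery functions listed above can be combined into a short exploration session before loading any data. This is a sketch, not an official workflow: it assumes the namespace identifier is "ukb_ppp" (inferred from the table identifier "ukb_ppp.pqtls"), and it needs network access to the storage backend.

```r
library(bedrockbio)

# All available namespaces (data sources), as a character vector
namespaces <- list_namespaces()
print(namespaces)

# Name, citation, license, context, and tables of one namespace.
# "ukb_ppp" is an assumed identifier, inferred from "ukb_ppp.pqtls".
describe_namespace("ukb_ppp")

# Table identifiers restricted to that namespace
ukb_tables <- list_tables("ukb_ppp")
print(ukb_tables)

# Partition columns and their allowed values: filter on these for fastest reads
describe_table("ukb_ppp.pqtls")
```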
Install from CRAN:

install.packages("bedrockbio")

Or install the current development version from GitHub:

# install.packages("pak")
pak::pak("bedrock-bio/bedrock-bio-client/r")

The R package supports macOS and Linux only: the DuckDB iceberg extension has no MinGW build, so it cannot load on R for Windows. Windows users can use the Python client instead, which works on all platforms.
Load the package (and dplyr for downstream data frame manipulation):

library(bedrockbio)
library(dplyr)

List available tables:

list_tables()

Describe a table to see its metadata, citation, and columns:

describe_table("ukb_ppp.pqtls")

Lazily load a table, filter on partition columns (for fastest reads), select columns, and collect the relevant subset into an in-memory data frame:
df <- load_table("ukb_ppp.pqtls") |>
  filter(
    ancestry == "EUR",
    protein_id == "A0FGR8",
    panel == "Inflammation"
  ) |>
  select(
    chromosome,
    position,
    effect_allele,
    other_allele,
    beta,
    neg_log_10_p_value
  ) |>
  collect()

To request the addition of a new table to the library, open an issue.