
Title: Using a Common Data Model on 'Spark'
Version: 0.1.0
Description: Use health data in the Observational Medical Outcomes Partnership Common Data Model format in 'Spark'. Functionality includes creating all required tables and fields and creation of a single reference to the data. Native 'Spark' functionality is supported.
License: Apache License (≥ 2)
Encoding: UTF-8
RoxygenNote: 7.3.2
Depends: R (≥ 4.1.0)
Imports: cli, datasets, DBI, dbplyr, dplyr, glue, omopgenerics (≥ 1.3.1), purrr, rlang, stringr
Suggests: testthat (≥ 3.0.0), omock, knitr, rmarkdown, CDMConnector, OmopSketch, odbc, R6, crayon, sparklyr, DatabaseConnector
Config/testthat/edition: 3
Config/testthat/parallel: false
VignetteBuilder: knitr
URL: https://OHDSI.github.io/OmopOnSpark/
NeedsCompilation: no
Packaged: 2025-10-19 18:42:36 UTC; orms0426
Author: Edward Burn [aut, cre] (ORCID), Martí Català [aut] (ORCID)
Maintainer: Edward Burn <edward.burn@ndorms.ox.ac.uk>
Repository: CRAN
Date/Publication: 2025-10-22 19:20:02 UTC

OmopOnSpark: Using a Common Data Model on 'Spark'

Description

Use health data in the Observational Medical Outcomes Partnership Common Data Model format in 'Spark'. Functionality includes creating all required tables and fields and creation of a single reference to the data. Native 'Spark' functionality is supported.

Author(s)

Maintainer: Edward Burn <edward.burn@ndorms.ox.ac.uk> (ORCID)

Authors:

Martí Català (ORCID)
See Also

Useful links:

https://OHDSI.github.io/OmopOnSpark/

Disconnect the connection of the cdm object

Description

Disconnect the connection of the cdm object

Usage

## S3 method for class 'spark_cdm'
cdmDisconnect(cdm, dropWriteSchema = FALSE, ...)

Arguments

cdm

cdm reference

dropWriteSchema

Whether to drop tables in the writeSchema

...

Not used

Value

Disconnected cdm
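
Examples

A minimal usage sketch (not part of the package manual), assuming a local Spark installation is available so that mockSparkCdm() can run:

if (sparklyr::spark_installed_versions() |> nrow() > 0) {
  cdm <- mockSparkCdm(path = file.path(tempdir(), "temp_spark"))
  # Close the Spark connection, also dropping any tables created in the write schema
  cdmDisconnect(cdm, dropWriteSchema = TRUE)
}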


Create a cdm_reference object from a sparklyr connection.

Description

Create a cdm_reference object from a sparklyr connection.

Usage

cdmFromSpark(
  con,
  cdmSchema,
  writeSchema,
  cohortTables = NULL,
  cdmVersion = NULL,
  cdmName = NULL,
  achillesSchema = NULL,
  .softValidation = FALSE,
  writePrefix = NULL,
  cdmPrefix = NULL
)

Arguments

con

A Spark connection created with sparklyr::spark_connect().

cdmSchema

Schema where the standard OMOP tables are located. The schema is defined with a named character list/vector; allowed names are: 'catalog', 'schema' and 'prefix'.

writeSchema

Schema with writing permissions. The schema is defined with a named character list/vector; allowed names are: 'catalog', 'schema' and 'prefix'.

cohortTables

Names of cohort tables to be read from writeSchema.

cdmVersion

The version of the cdm (either "5.3" or "5.4"). If NULL, cdm_source$cdm_version will be used instead.

cdmName

The name of the cdm object. If NULL, cdm_source$cdm_source_name will be used instead.

achillesSchema

Schema where the Achilles tables are located. The schema is defined with a named character list/vector; allowed names are: 'catalog', 'schema' and 'prefix'.

.softValidation

Whether to use soft validation. This is not recommended, as analysis pipelines assume the cdm fulfils the validation criteria.

writePrefix

A prefix that will be added to all tables created in the writeSchema. This can be used to create a namespace for your tables in the database write schema.

cdmPrefix

A prefix used with the OMOP CDM tables.

Value

A cdm reference object
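
Examples

A hedged sketch, assuming a local Spark installation; the schema names 'omop' and 'results' are hypothetical and must already exist and contain the required tables:

con <- sparklyr::spark_connect(master = "local")
cdm <- cdmFromSpark(
  con,
  cdmSchema = c(schema = "omop"),      # hypothetical schema holding the OMOP tables
  writeSchema = c(schema = "results"), # hypothetical schema with write permissions
  cdmName = "my_spark_cdm"
)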


Create OMOP CDM tables

Description

Create OMOP CDM tables

Usage

createOmopTablesOnSpark(
  con,
  schemaName,
  cdmVersion = "5.4",
  overwrite = FALSE,
  bigInt = FALSE,
  cdmPrefix = NULL
)

Arguments

con

Connection to a Spark database.

schemaName

Schema in which to create tables.

cdmVersion

Which version of the OMOP CDM to create. Can be "5.3" or "5.4".

overwrite

Whether to overwrite existing tables.

bigInt

Whether to use big integers for person identifiers (person_id or subject_id).

cdmPrefix

An optional prefix to add to the OMOP CDM tables created (not generally recommended).

Value

OMOP CDM tables created in database
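
Examples

A sketch assuming a local Spark connection and that the target schema ('omop', a hypothetical name) already exists:

con <- sparklyr::spark_connect(master = "local")
# Create an empty set of OMOP CDM v5.4 tables, overwriting any existing ones
createOmopTablesOnSpark(
  con,
  schemaName = "omop",
  cdmVersion = "5.4",
  overwrite = TRUE
)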


Drop spark tables

Description

Drop Spark tables in the write schema of the connection behind the cdm reference.

Usage

## S3 method for class 'spark_cdm'
dropSourceTable(cdm, name)

Arguments

cdm

A cdm reference

name

The names of the tables to drop. Tidyselect statements can be used.

Value

Drops the Spark tables.
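
Examples

A minimal sketch (not part of the package manual), assuming a local Spark installation; "my_table" is a hypothetical table name:

if (sparklyr::spark_installed_versions() |> nrow() > 0) {
  cdm <- mockSparkCdm(path = file.path(tempdir(), "temp_spark"))
  cdm <- insertTable(cdm, name = "my_table", table = datasets::cars)
  # Drop a single table by name; tidyselect such as dplyr::starts_with() also works
  dropSourceTable(cdm, name = "my_table")
}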


Insert a table to a cdm object

Description

Insert a local dataframe into the cdm.

Usage

## S3 method for class 'spark_cdm'
insertTable(cdm, name, table, overwrite = TRUE, temporary = FALSE, ...)

Arguments

cdm

A cdm reference.

name

The name of the table to insert.

table

The table to insert.

overwrite

Whether to overwrite an existing table.

temporary

If TRUE, a Spark dataframe will be written, which will persist only to the end of the current session. If FALSE, a Spark table will be written, which will persist beyond the end of the current session.

...

For compatibility.

Value

The cdm reference with the table added.
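
Examples

A minimal sketch (not part of the package manual), assuming a local Spark installation; "cars_table" is a hypothetical table name:

if (sparklyr::spark_installed_versions() |> nrow() > 0) {
  cdm <- mockSparkCdm(path = file.path(tempdir(), "temp_spark"))
  # Write a local data frame as a persistent (non-temporary) Spark table
  cdm <- insertTable(
    cdm,
    name = "cars_table",
    table = datasets::cars,
    overwrite = TRUE,
    temporary = FALSE
  )
  cdm$cars_table
}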


Creates a cdm reference to local Spark OMOP CDM tables

Description

Creates a cdm reference to local Spark OMOP CDM tables.

Usage

mockSparkCdm(path)

Arguments

path

A directory where the local Spark files will be stored.

Value

A cdm reference with synthetic data in a local spark connection

Examples


if (sparklyr::spark_installed_versions() |> nrow() > 0) {
  folder <- file.path(tempdir(), "temp_spark")
  cdm <- mockSparkCdm(path = folder)
  cdm
}


Objects exported from other packages

Description

These objects are imported from other packages. Follow the links below to see their documentation.

dplyr

compute

omopgenerics

cdmDisconnect, cdmTableFromSource, dropSourceTable, insertCdmTo, insertTable, listSourceTables, readSourceTable
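
For instance, the re-exported compute() can materialise a query on a cdm table. A hedged sketch, assuming a local Spark installation (concept id 8507 is the standard OMOP concept for male sex):

if (sparklyr::spark_installed_versions() |> nrow() > 0) {
  cdm <- mockSparkCdm(path = file.path(tempdir(), "temp_spark"))
  cdm$person |>
    dplyr::filter(gender_concept_id == 8507) |>
    dplyr::compute()
}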
