(This package is NOT part of HADES.)
This package constructs new cohorts from cohorts that have already been instantiated in the cohort table. All cohorts in OHDSI share a standard definition: “A cohort is a set of persons who satisfy one or more inclusion criteria for a duration of time.”
This is represented in a cohort table as cohort_definition_id, subject_id, cohort_start_date and cohort_end_date. For more details about the concept of a cohort please review The Book of OHDSI.
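As a small illustration of this layout, the sketch below builds a hypothetical in-memory stand-in for a cohort table in base R. The data frame name `cohortTable` and the derived `durationDays` column are assumptions made for this example only; a real cohort table lives in a database.

```r
# Hypothetical in-memory stand-in for a cohort table (illustration only).
cohortTable <- data.frame(
  cohort_definition_id = c(1, 2),
  subject_id = c(1, 1),
  cohort_start_date = as.Date(c("2022-01-01", "2022-02-10")),
  cohort_end_date = as.Date(c("2022-03-01", "2022-05-10"))
)

# A cohort era should never end before it starts.
stopifnot(all(cohortTable$cohort_end_date >= cohortTable$cohort_start_date))

# Duration of each era in days, counting both the start and the end day.
cohortTable$durationDays <-
  as.integer(cohortTable$cohort_end_date - cohortTable$cohort_start_date) + 1L

print(cohortTable)
```

Each row describes one continuous period during which a subject belongs to a cohort.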
New cohorts are created from a previously instantiated cohort table using cohort algebra (similar to temporal set algebra). The output is one or more new cohorts.
remotes::install_github("OHDSI/CohortAlgebra")
cohort
#> # A tibble: 3 x 4
#>   cohortDefinitionId subjectId cohortStartDate cohortEndDate
#>                <dbl>     <dbl> <date>          <date>
#> 1                  1         1 2022-01-01      2022-03-01
#> 2                  2         1 2022-02-10      2022-05-10
#> 3                  2         1 2022-08-15      2022-12-30
The union of the two cohorts is expected to give:
cohortExpected
#> # A tibble: 2 x 4
#>   cohortDefinitionId subjectId cohortStartDate cohortEndDate
#>                <dbl>     <dbl> <date>          <date>
#> 1                  3         1 2022-01-01      2022-05-10
#> 2                  3         1 2022-08-15      2022-12-30
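The merge that produces this expected result can be sketched in base R. This is a minimal illustration, not the package's implementation (which runs in the database): `unionEras` is a hypothetical helper, and it assumes that only eras which overlap or share a day are merged, so eras that are merely adjacent stay separate.

```r
# Hypothetical helper: merge overlapping eras per subject into one new cohort.
unionEras <- function(eras, newCohortId) {
  if (nrow(eras) == 0) {
    return(eras)
  }
  # Sort so that overlapping eras of the same subject are neighbours.
  eras <- eras[order(eras$subjectId, eras$cohortStartDate), ]
  merged <- list()
  for (i in seq_len(nrow(eras))) {
    era <- eras[i, ]
    n <- length(merged)
    overlapsPrevious <- n > 0 &&
      merged[[n]]$subjectId == era$subjectId &&
      era$cohortStartDate <= merged[[n]]$cohortEndDate
    if (overlapsPrevious) {
      # Extend the current merged era to the later of the two end dates.
      merged[[n]]$cohortEndDate <- max(merged[[n]]$cohortEndDate, era$cohortEndDate)
    } else {
      # No overlap: start a new merged era.
      merged[[n + 1]] <- era
    }
  }
  result <- do.call(rbind, merged)
  result$cohortDefinitionId <- newCohortId
  rownames(result) <- NULL
  result
}

# Same rows as the input cohort table shown above.
cohort <- data.frame(
  cohortDefinitionId = c(1, 2, 2),
  subjectId = c(1, 1, 1),
  cohortStartDate = as.Date(c("2022-01-01", "2022-02-10", "2022-08-15")),
  cohortEndDate = as.Date(c("2022-03-01", "2022-05-10", "2022-12-30"))
)

unioned <- unionEras(cohort, newCohortId = 3)
print(unioned)
```

For this input the sketch yields the same two eras as cohortExpected: the first two eras overlap and are merged, while the third stands alone.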
To perform a cohort union, use the unionCohorts function. It requires a data.frame called oldToNewCohortId that specifies which cohorts to union: oldCohortId holds the IDs of cohorts that already exist in the cohort table, and newCohortId holds the ID of the resulting cohort.
# Map each existing cohort (oldCohortId) to the cohort it is unioned into (newCohortId).
oldToNewCohortId <-
  dplyr::tibble(
    oldCohortId = c(1, 2),
    newCohortId = c(3, 3)
  )
CohortAlgebra::unionCohorts(
  connection = connection,
  sourceCohortDatabaseSchema = cohortDatabaseSchema,
  sourceCohortTable = tableName,
  targetCohortDatabaseSchema = cohortDatabaseSchema,
  targetCohortTable = tableName,
  oldToNewCohortId = oldToNewCohortId
)
The cohort table now contains a new cohort with cohortId 3, which is the union of cohorts 1 and 2.
data
#> # A tibble: 2 x 4
#>   cohortDefinitionId subjectId cohortStartDate cohortEndDate
#>                <dbl>     <dbl> <date>          <date>
#> 1                  3         1 2022-01-01      2022-05-10
#> 2                  3         1 2022-08-15      2022-12-30
Note: if the target cohort table already contains a cohort with cohortId = 3 before the union is run, this causes a conflict and the union function will not run. To resolve it, all records for cohortId = 3 can be purged from the target cohort table: the purgeConflicts parameter deletes any records in the target cohort table whose cohortId equals the newCohortId.
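What such a purge amounts to can be simulated on a plain data frame. The sketch below is only an illustration under assumed data: `targetCohort`, `hasConflict` and `purged` are hypothetical names, and the package itself performs the deletion in the database rather than in memory.

```r
# Hypothetical target cohort table that already holds a cohort with id 3.
targetCohort <- data.frame(
  cohortDefinitionId = c(1, 2, 3),
  subjectId = c(1, 1, 7),
  cohortStartDate = as.Date(c("2022-01-01", "2022-02-10", "2021-06-01")),
  cohortEndDate = as.Date(c("2022-03-01", "2022-05-10", "2021-06-30"))
)

newCohortId <- 3

# A conflict exists when the new cohort id is already present in the target table.
hasConflict <- newCohortId %in% targetCohort$cohortDefinitionId

# Purging removes every existing record that carries the new cohort id.
purged <- targetCohort[targetCohort$cohortDefinitionId != newCohortId, ]

print(hasConflict)
print(purged)
```

After the purge only cohorts 1 and 2 remain, so writing a new cohort 3 no longer collides with older records.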
Input:
cohort
#> # A tibble: 2 x 4
#>   cohortDefinitionId subjectId cohortStartDate cohortEndDate
#>                <dbl>     <dbl> <date>          <date>
#> 1                  1         1 2022-01-01      2022-01-15
#> 2                  2         1 2021-12-15      2022-01-30
CohortAlgebra::intersectCohorts(
  connection = connection,
  sourceCohortDatabaseSchema = cohortDatabaseSchema,
  sourceCohortTable = tableName,
  targetCohortDatabaseSchema = cohortDatabaseSchema,
  targetCohortTable = tableName,
  cohortIds = c(1, 2),
  newCohortId = 3
)
Output
Input:
cohort
#> # A tibble: 3 x 4
#>   cohortDefinitionId subjectId cohortStartDate cohortEndDate
#>                <dbl>     <dbl> <date>          <date>
#> 1                  1         1 2022-01-01      2022-01-15
#> 2                  2         1 2021-12-15      2022-01-05
#> 3                  2         1 2022-01-10      2022-01-30
CohortAlgebra::intersectCohorts(
  connection = connection,
  sourceCohortDatabaseSchema = cohortDatabaseSchema,
  sourceCohortTable = tableName,
  targetCohortDatabaseSchema = cohortDatabaseSchema,
  targetCohortTable = tableName,
  cohortIds = c(1, 2),
  newCohortId = 3
)
Output
Input:
cohort
#> # A tibble: 3 x 4
#>   cohortDefinitionId subjectId cohortStartDate cohortEndDate
#>                <dbl>     <dbl> <date>          <date>
#> 1                  1         1 2022-01-01      2022-01-15
#> 2                  2         1 2021-12-15      2022-01-30
#> 3                  3         1 2022-03-01      2022-03-15
CohortAlgebra::intersectCohorts(
  connection = connection,
  sourceCohortDatabaseSchema = cohortDatabaseSchema,
  sourceCohortTable = tableName,
  targetCohortDatabaseSchema = cohortDatabaseSchema,
  targetCohortTable = tableName,
  cohortIds = c(1, 2, 3),
  newCohortId = 4
)
Output
Input:
cohort
#> # A tibble: 2 x 4
#>   cohortDefinitionId subjectId cohortStartDate cohortEndDate
#>                <dbl>     <dbl> <date>          <date>
#> 1                  1         1 2022-01-01      2022-01-01
#> 2                  2         1 2022-01-01      2022-01-02
Output
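The intersection used in the examples above can be sketched in base R by treating each cohort as the set of days it covers. This is an illustration under stated assumptions, not the package's implementation: `coveredDays`, `daysToEras` and `intersectEras` are hypothetical helpers, a single subject is assumed, and the day-level semantics are an assumption about how eras are compared. The data are the rows of the example in which cohort 2 has two eras.

```r
# Hypothetical helper: all days covered by one cohort (single subject assumed).
coveredDays <- function(eras, cohortId) {
  rows <- eras[eras$cohortDefinitionId == cohortId, ]
  days <- as.Date(character(0))
  for (i in seq_len(nrow(rows))) {
    days <- c(days, seq(rows$cohortStartDate[i], rows$cohortEndDate[i], by = "day"))
  }
  sort(unique(days))
}

# Hypothetical helper: collapse a set of days into eras of consecutive days.
daysToEras <- function(days, newCohortId, subjectId = 1) {
  days <- sort(unique(days))
  if (length(days) == 0) {
    return(data.frame(
      cohortDefinitionId = numeric(0),
      subjectId = numeric(0),
      cohortStartDate = as.Date(character(0)),
      cohortEndDate = as.Date(character(0))
    ))
  }
  gaps <- diff(as.numeric(days)) > 1
  data.frame(
    cohortDefinitionId = newCohortId,
    subjectId = subjectId,
    cohortStartDate = days[c(TRUE, gaps)], # first day, or first day after a gap
    cohortEndDate = days[c(gaps, TRUE)]    # last day, or last day before a gap
  )
}

# Hypothetical helper: keep only the days present in every listed cohort.
intersectEras <- function(eras, cohortIds, newCohortId) {
  common <- coveredDays(eras, cohortIds[1])
  for (id in cohortIds[-1]) {
    common <- common[common %in% coveredDays(eras, id)]
  }
  daysToEras(common, newCohortId)
}

cohort <- data.frame(
  cohortDefinitionId = c(1, 2, 2),
  subjectId = c(1, 1, 1),
  cohortStartDate = as.Date(c("2022-01-01", "2021-12-15", "2022-01-10")),
  cohortEndDate = as.Date(c("2022-01-15", "2022-01-05", "2022-01-30"))
)

intersected <- intersectEras(cohort, cohortIds = c(1, 2), newCohortId = 3)
print(intersected)
```

Under these assumptions the sketch returns two eras, 2022-01-01 to 2022-01-05 and 2022-01-10 to 2022-01-15, because cohort 2 has a gap inside cohort 1's era. When the listed cohorts share no day, as with three cohorts where one lies entirely in March, the sketch returns an empty data frame.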
Input:
cohort
#> # A tibble: 2 x 4
#>   cohortDefinitionId subjectId cohortStartDate cohortEndDate
#>                <dbl>     <dbl> <date>          <date>
#> 1                  1         1 2022-01-01      2022-03-01
#> 2                  2         1 2022-02-10      2022-05-10
CohortAlgebra::minusCohorts(
  connection = connection,
  sourceCohortDatabaseSchema = cohortDatabaseSchema,
  sourceCohortTable = tableName,
  targetCohortDatabaseSchema = cohortDatabaseSchema,
  targetCohortTable = tableName,
  firstCohortId = 1,
  secondCohortId = 2,
  newCohortId = 3
)
Output for example 1
If the cohorts are switched, i.e. cohort 1 is subtracted from cohort 2:
CohortAlgebra::minusCohorts(
  connection = connection,
  sourceCohortDatabaseSchema = cohortDatabaseSchema,
  sourceCohortTable = tableName,
  targetCohortDatabaseSchema = cohortDatabaseSchema,
  targetCohortTable = tableName,
  firstCohortId = 2,
  secondCohortId = 1,
  newCohortId = 4
)
Output
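The order of the cohorts matters for minusCohorts: subtracting cohort 2 from cohort 1 does not give the same result as subtracting cohort 1 from cohort 2.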
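Why the order matters can be seen with a small base R sketch that treats each cohort as a set of days. This is an illustration only, not the package's implementation: `eraDays` and `minusDays` are hypothetical helpers, a single subject is assumed, and the day-level semantics are an assumption. The dates are those of the minus example above.

```r
# Hypothetical helper: the days covered by one era.
eraDays <- function(start, end) {
  seq(as.Date(start), as.Date(end), by = "day")
}

# Hypothetical helper: days in `first` that are not in `second`, collapsed to eras.
minusDays <- function(first, second) {
  remaining <- sort(first[!(first %in% second)])
  if (length(remaining) == 0) {
    return(data.frame(
      cohortStartDate = as.Date(character(0)),
      cohortEndDate = as.Date(character(0))
    ))
  }
  gaps <- diff(as.numeric(remaining)) > 1
  data.frame(
    cohortStartDate = remaining[c(TRUE, gaps)], # first day, or first day after a gap
    cohortEndDate = remaining[c(gaps, TRUE)]    # last day, or last day before a gap
  )
}

cohort1 <- eraDays("2022-01-01", "2022-03-01")
cohort2 <- eraDays("2022-02-10", "2022-05-10")

oneMinusTwo <- minusDays(cohort1, cohort2) # cohort 1 without the days of cohort 2
twoMinusOne <- minusDays(cohort2, cohort1) # cohort 2 without the days of cohort 1

print(oneMinusTwo)
print(twoMinusOne)
```

Under these assumptions, cohort 1 minus cohort 2 leaves 2022-01-01 to 2022-02-09, while cohort 2 minus cohort 1 leaves 2022-03-02 to 2022-05-10.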