The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
subsetVocabularyTables() lets you reduce a CDM
vocabulary to a smaller set of concept IDs while keeping the vocabulary
tables internally consistent.
This is useful when you want:
We first create a simple mock CDM with vocabulary tables.
Now we subset the vocabulary to two concept IDs.
cdm_subset <- cdm |>
subsetVocabularyTables(conceptSet = c(8507L, 8532L))
cdm_subset$concept |>
select(concept_id, concept_name, domain_id, vocabulary_id)
#> # A tibble: 2,482 × 4
#> concept_id concept_name domain_id vocabulary_id
#> <int> <chr> <chr> <chr>
#> 1 8507 MALE Gender Gender
#> 2 8521 OTHER Gender Gender
#> 3 8532 FEMALE Gender Gender
#> 4 8551 UNKNOWN Gender Gender
#> 5 8570 AMBIGUOUS Gender Gender
#> 6 4214687 Gender unknown Gender SNOMED
#> 7 4215271 Gender unspecified Gender SNOMED
#> 8 4231242 Surgically transgendered transsexual, fem… Gender SNOMED
#> 9 4234363 Surgically transgendered transsexual Gender SNOMED
#> 10 4251434 Surgically transgendered transsexual, mal… Gender SNOMED
#> # ℹ 2,472 more rowsBy default, subsetVocabularyTables() also keeps:
conceptSetUnit, Visit, and
Gender domainsThis behaviour helps preserve a usable mock CDM after vocabulary subsetting.
You can override the default kept domains with
keepDomains.
cdm_no_defaults <- cdm |>
subsetVocabularyTables(
conceptSet = c(8507L, 8532L),
includeRelated = FALSE,
keepDomains = character(0)
)
cdm_no_defaults$concept |>
select(concept_id, concept_name, domain_id)
#> # A tibble: 2 × 3
#> concept_id concept_name domain_id
#> <int> <chr> <chr>
#> 1 8507 MALE Gender
#> 2 8532 FEMALE GenderThis is useful when you want the smallest possible vocabulary subset.
The function is also useful after creating a CDM with clinical tables. In that case, rows in other OMOP tables that reference removed concepts are also filtered.
cdm_clinical <- mockVocabularyTables() |>
mockPerson(nPerson = 10, seed = 1) |>
mockObservationPeriod(seed = 1) |>
mockConditionOccurrence(seed = 1)
cdm_clinical_small <- cdm_clinical |>
subsetVocabularyTables(conceptSet = c(8507L, 8532L))
cdm_clinical_small$concept |>
tally()
#> # A tibble: 1 × 1
#> n
#> <int>
#> 1 2482If your chosen concept set removes concepts used by clinical tables, the corresponding rows are dropped so the resulting CDM stays consistent.
For example, imagine a condition_occurrence row uses
condition_concept_id = 123, but after subsetting the
vocabulary, concept 123 is no longer present in
cdm$concept. In that case, that
condition_occurrence row is removed as well.
cdm_example <- mockVocabularyTables(
concept = dplyr::tibble(
concept_id = c(1L, 2L, 3L),
concept_name = c("condition a", "condition b", "gender"),
domain_id = c("Condition", "Condition", "Gender"),
vocabulary_id = c("SNOMED", "SNOMED", "Gender"),
standard_concept = "S",
concept_class_id = c("Clinical Finding", "Clinical Finding", "Gender"),
concept_code = "1",
valid_start_date = as.Date(NA),
valid_end_date = as.Date(NA),
invalid_reason = NA_character_
)
) |>
mockCdmFromTables(tables = list(
person = dplyr::tibble(
person_id = c(1L, 2L),
gender_concept_id = c(3L, 3L),
year_of_birth = c(1990L, 1991L)
),
condition_occurrence = dplyr::tibble(
condition_occurrence_id = c(1L, 2L),
person_id = c(1L, 2L),
condition_concept_id = c(1L, 2L),
condition_start_date = as.Date(c("2020-01-01", "2020-01-02")),
condition_end_date = as.Date(c("2020-01-01", "2020-01-02")),
condition_type_concept_id = c(0L, 0L)
)
))
cdm_example_small <- cdm_example |>
subsetVocabularyTables(
conceptSet = 1L,
includeRelated = FALSE,
keepDomains = "Gender"
)
cdm_example_small$concept |>
select(concept_id, domain_id)
#> # A tibble: 2 × 2
#> concept_id domain_id
#> <int> <chr>
#> 1 1 Condition
#> 2 3 Gender
cdm_example_small$condition_occurrence |>
select(person_id, condition_concept_id)
#> # A tibble: 1 × 2
#> person_id condition_concept_id
#> <int> <int>
#> 1 1 1In this example, the row using condition_concept_id = 2
is removed because concept 2 is no longer present after
subsetting.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.