
Creating synthetic clinical tables

The omock package provides functions to quickly create a CDM reference containing synthetic data, based on population settings specified by the user.

First, let’s load the packages required for this vignette.

library(omock)
library(dplyr)
library(ggplot2)

Now, in three lines of code, we can create a CDM reference with a person table and an observation period table for 1000 people.

cdm <- emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(nPerson = 1000) |>
  mockObservationPeriod()

cdm
#> 
#> ── # OMOP CDM reference (local) of synthetic cdm ───────────────────────────────
#> • omop tables: person, observation_period
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

cdm$person |> glimpse()
#> Rows: 1,000
#> Columns: 18
#> $ person_id                   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,…
#> $ gender_concept_id           <int> 8507, 8507, 8507, 8532, 8507, 8507, 8507, …
#> $ year_of_birth               <int> 1997, 1969, 1966, 1983, 1976, 1975, 1981, …
#> $ month_of_birth              <int> 8, 2, 10, 7, 6, 9, 6, 8, 10, 11, 4, 7, 4, …
#> $ day_of_birth                <int> 10, 9, 3, 26, 29, 10, 21, 23, 19, 25, 20, …
#> $ race_concept_id             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ethnicity_concept_id        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ birth_datetime              <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ location_id                 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ provider_id                 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ care_site_id                <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ person_source_value         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ gender_source_value         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ gender_source_concept_id    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ race_source_value           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ race_source_concept_id      <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ethnicity_source_value      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ethnicity_source_concept_id <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

cdm$observation_period |> glimpse()
#> Rows: 1,000
#> Columns: 5
#> $ observation_period_id         <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
#> $ person_id                     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
#> $ observation_period_start_date <date> 2007-10-01, 2015-02-14, 1992-10-13, 199…
#> $ observation_period_end_date   <date> 2008-03-17, 2016-11-28, 2011-10-08, 199…
#> $ period_type_concept_id        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
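
As a quick sanity check on the tables above, the sketch below summarises how long each synthetic person is observed. It only uses the calls already shown in this vignette plus standard dplyr verbs; the column name `days_observed` is introduced here purely for illustration, and the exact values will differ between runs because the data are randomly generated.

```r
library(omock)
library(dplyr)

# Rebuild the same cdm reference as above
cdm <- emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(nPerson = 1000) |>
  mockObservationPeriod()

# Bring the observation periods into memory and compute their length in days
# (days_observed is an illustrative column, not part of the OMOP CDM)
obs <- cdm$observation_period |>
  collect() |>
  mutate(
    days_observed = as.integer(difftime(
      observation_period_end_date,
      observation_period_start_date,
      units = "days"
    ))
  )

# One row per person is expected, so n_people should match the row count
obs |>
  summarise(
    n_rows = n(),
    n_people = n_distinct(person_id),
    min_days = min(days_observed),
    median_days = median(days_observed),
    max_days = max(days_observed)
  )
```

A summary like this is a cheap way to confirm that the mock population looks plausible before building further tables on top of it.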

We can add further requirements on the population we create. For example, we can require that everyone was born between 1960 and 1980 by setting the birthRange argument.

cdm <- emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(
    nPerson = 1000,
    birthRange = as.Date(c("1960-01-01", "1980-12-31"))
  ) |>
  mockObservationPeriod()

# Plot the distribution of birth years in the synthetic population
cdm$person |>
  collect() |>
  ggplot() +
  geom_histogram(aes(as.integer(year_of_birth)),
    binwidth = 1, colour = "grey"
  ) +
  theme_minimal() +
  xlab("Year of birth")
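
Besides inspecting the histogram, the restriction can be checked numerically. The following is a minimal sketch, assuming the birthRange bounds are applied to every generated person as the text describes; `birth_years` is a name introduced here for illustration.

```r
library(omock)
library(dplyr)

# Same population settings as in the example above
cdm <- emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(
    nPerson = 1000,
    birthRange = as.Date(c("1960-01-01", "1980-12-31"))
  ) |>
  mockObservationPeriod()

# Extract the birth years as a plain vector
birth_years <- cdm$person |>
  collect() |>
  pull(year_of_birth)

# Earliest and latest birth year in the synthetic population
range(birth_years)

# Number of people born in each year
table(birth_years)
```

If the range reported here falls outside 1960 to 1980, the population settings were not applied as intended.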
