To check the performance of the IncidencePrevalence package we can use the benchmarkIncidencePrevalence() function. It generates some hypothetical study cohorts, estimates incidence and prevalence using various settings, and times how long these analyses take.
We can start, for example, by benchmarking our example mock data, which uses duckdb.
library(IncidencePrevalence)
library(visOmopResults)
library(dplyr)
library(ggplot2)
cdm <- mockIncidencePrevalence(
  sampleSize = 100,
  earliestObservationStartDate = as.Date("2010-01-01"),
  latestObservationStartDate = as.Date("2010-01-01"),
  minDaysToObservationEnd = 364,
  maxDaysToObservationEnd = 364,
  outPre = 0.1
)
timings <- benchmarkIncidencePrevalence(cdm)
timings |>
  glimpse()
#> Rows: 4
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1
#> $ cdm_name <chr> "mock", "mock", "mock", "mock"
#> $ group_name <chr> "task", "task", "task", "task"
#> $ group_level <chr> "generating denominator (8 cohorts)", "yearly point p…
#> $ strata_name <chr> "overall", "overall", "overall", "overall"
#> $ strata_level <chr> "overall", "overall", "overall", "overall"
#> $ variable_name <chr> "overall", "overall", "overall", "overall"
#> $ variable_level <chr> "overall", "overall", "overall", "overall"
#> $ estimate_name <chr> "time_taken_minutes", "time_taken_minutes", "time_tak…
#> $ estimate_type <chr> "numeric", "numeric", "numeric", "numeric"
#> $ estimate_value <chr> "0.13", "0.06", "0.06", "0.17"
#> $ additional_name <chr> "dbms &&& person_n &&& min_observation_start &&& max_…
#> $ additional_level <chr> "duckdb &&& 100 &&& 2010-01-01 &&& 2010-12-31", "duck…
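Note that estimate_value is stored as character in the output above, so it has to be converted before doing any arithmetic on the timings. The following is a minimal sketch in base R, assuming we simply copy the four mock timings by hand into a small data frame. The object name mock_timings and the shortened task labels are illustrative and not part of the package.

```r
# Timings (in minutes) copied from the glimpse() output above.
# Task labels are shortened here for readability (illustrative only).
mock_timings <- data.frame(
  task = c(
    "generating denominator (8 cohorts)",
    "yearly point prevalence",
    "yearly period prevalence",
    "yearly incidence"
  ),
  estimate_value = c("0.13", "0.06", "0.06", "0.17"),
  stringsAsFactors = FALSE
)

# estimate_value is character, so convert it before computing with it.
mock_timings$minutes <- as.numeric(mock_timings$estimate_value)
mock_timings$seconds <- mock_timings$minutes * 60

# Total time taken across the four benchmark tasks, in minutes.
total_minutes <- sum(mock_timings$minutes)

print(mock_timings[, c("task", "minutes", "seconds")])
print(total_minutes)
```

The same conversion could be applied directly to the timings object, for example with dplyr::mutate(), before summarising or plotting.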
We can see our results like so:
visOmopTable(timings,
  hide = c(
    "variable_name", "variable_level",
    "strata_name", "strata_level"
  ),
  groupColumn = "task"
)
CDM name | Dbms | Person n | Min observation start | Max observation end | Estimate name | Estimate value |
---|---|---|---|---|---|---|
generating denominator (8 cohorts) | ||||||
mock | duckdb | 100 | 2010-01-01 | 2010-12-31 | time_taken_minutes | 0.13 |
yearly point prevalence for two outcomes with eight denominator cohorts | ||||||
mock | duckdb | 100 | 2010-01-01 | 2010-12-31 | time_taken_minutes | 0.06 |
yearly period prevalence for two outcomes with eight denominator cohorts | ||||||
mock | duckdb | 100 | 2010-01-01 | 2010-12-31 | time_taken_minutes | 0.06 |
yearly incidence for two outcomes with eight denominator cohorts | ||||||
mock | duckdb | 100 | 2010-01-01 | 2010-12-31 | time_taken_minutes | 0.17 |
Here we can see the results of running the benchmark on test datasets held in different database management systems. These benchmarks have already been run, so we’ll start by loading the results.
test_db <- IncidencePrevalenceBenchmarkResults
test_db |>
  glimpse()
#> Rows: 16
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name <chr> "ohdsi_postgres", "ohdsi_postgres", "ohdsi_postgres",…
#> $ group_name <chr> "task", "task", "task", "task", "task", "task", "task…
#> $ group_level <chr> "generating denominator (8 cohorts)", "yearly point p…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ estimate_name <chr> "time_taken_minutes", "time_taken_minutes", "time_tak…
#> $ estimate_type <chr> "numeric", "numeric", "numeric", "numeric", "numeric"…
#> $ estimate_value <chr> "0.81", "0.23", "0.23", "1.02", "1.2", "0.25", "0.24"…
#> $ additional_name <chr> "dbms &&& person_n &&& min_observation_start &&& max_…
#> $ additional_level <chr> "postgresql &&& 1000 &&& 2008-01-01 &&& 2010-12-31", …
visOmopTable(bind(timings, test_db),
  hide = c(
    "variable_name", "variable_level",
    "strata_name", "strata_level"
  ),
  groupColumn = "task"
)
CDM name | Dbms | Person n | Min observation start | Max observation end | Estimate name | Estimate value |
---|---|---|---|---|---|---|
generating denominator (8 cohorts) | ||||||
mock | duckdb | 100 | 2010-01-01 | 2010-12-31 | time_taken_minutes | 0.13 |
ohdsi_postgres | postgresql | 1000 | 2008-01-01 | 2010-12-31 | time_taken_minutes | 0.81 |
ohdsi_redshift | redshift | 1000 | 2007-12-15 | 2010-12-31 | time_taken_minutes | 1.20 |
ohdsi_sql_Server | sql server | 1000 | 2008-01-01 | 2010-12-31 | time_taken_minutes | 0.55 |
ohdsi_snowflake | snowflake | 116352 | 2007-11-27 | 2010-12-31 | time_taken_minutes | 2.03 |
yearly point prevalence for two outcomes with eight denominator cohorts | ||||||
mock | duckdb | 100 | 2010-01-01 | 2010-12-31 | time_taken_minutes | 0.06 |
ohdsi_postgres | postgresql | 1000 | 2008-01-01 | 2010-12-31 | time_taken_minutes | 0.23 |
ohdsi_redshift | redshift | 1000 | 2007-12-15 | 2010-12-31 | time_taken_minutes | 0.25 |
ohdsi_sql_Server | sql server | 1000 | 2008-01-01 | 2010-12-31 | time_taken_minutes | 0.18 |
ohdsi_snowflake | snowflake | 116352 | 2007-11-27 | 2010-12-31 | time_taken_minutes | 0.50 |
yearly period prevalence for two outcomes with eight denominator cohorts | ||||||
mock | duckdb | 100 | 2010-01-01 | 2010-12-31 | time_taken_minutes | 0.06 |
ohdsi_postgres | postgresql | 1000 | 2008-01-01 | 2010-12-31 | time_taken_minutes | 0.23 |
ohdsi_redshift | redshift | 1000 | 2007-12-15 | 2010-12-31 | time_taken_minutes | 0.24 |
ohdsi_sql_Server | sql server | 1000 | 2008-01-01 | 2010-12-31 | time_taken_minutes | 0.18 |
ohdsi_snowflake | snowflake | 116352 | 2007-11-27 | 2010-12-31 | time_taken_minutes | 0.37 |
yearly incidence for two outcomes with eight denominator cohorts | ||||||
mock | duckdb | 100 | 2010-01-01 | 2010-12-31 | time_taken_minutes | 0.17 |
ohdsi_postgres | postgresql | 1000 | 2008-01-01 | 2010-12-31 | time_taken_minutes | 1.02 |
ohdsi_redshift | redshift | 1000 | 2007-12-15 | 2010-12-31 | time_taken_minutes | 1.38 |
ohdsi_sql_Server | sql server | 1000 | 2008-01-01 | 2010-12-31 | time_taken_minutes | 0.70 |
ohdsi_snowflake | snowflake | 116352 | 2007-11-27 | 2010-12-31 | time_taken_minutes | 2.49 |
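To make the table above easier to compare across databases, the timings can be arranged as a task-by-database matrix. The sketch below is a rough illustration in base R, with the values copied by hand from the table. The object names and shortened row labels are assumptions made for this example rather than anything provided by the package.

```r
# Minutes per task (rows) and CDM (columns), copied from the table above.
# Row labels are shortened for readability (illustrative only).
minutes <- matrix(
  c(
    0.13, 0.81, 1.20, 0.55, 2.03,
    0.06, 0.23, 0.25, 0.18, 0.50,
    0.06, 0.23, 0.24, 0.18, 0.37,
    0.17, 1.02, 1.38, 0.70, 2.49
  ),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("denominator", "point prevalence", "period prevalence", "incidence"),
    c("mock", "ohdsi_postgres", "ohdsi_redshift",
      "ohdsi_sql_Server", "ohdsi_snowflake")
  )
)

# Total benchmark time per database, in minutes.
total_minutes <- colSums(minutes)

# Share of each database's total time spent on each task.
share_of_total <- round(sweep(minutes, 2, total_minutes, "/"), 2)

print(total_minutes)
print(share_of_total)
```

The totals are not a like-for-like comparison of the database systems, since the datasets differ in size (100 people in the mock data, 1000 in most test databases, and 116352 in the Snowflake one) and in observation period.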