The package provides functions to tidy a summarised result into a dataframe that is easier to use in subsequent calculations. To that end, the split functions, described in "split and unite functions", work with name-level columns. For the estimates there is the pivotEstimates function, and for the settings there is pivotSettings. Finally, the tidy method combines the split and pivot functionality in a single function.
First, let's load the relevant libraries and create a mock summarised result table.
library(visOmopResults)
library(dplyr)
result <- mockSummarisedResult()
result |> glimpse()
#> Rows: 126
#> Columns: 16
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ result_type <chr> "mock_summarised_result", "mock_summarised_result", "…
#> $ package_name <chr> "visOmopResults", "visOmopResults", "visOmopResults",…
#> $ package_version <chr> "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1",…
#> $ group_name <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value <chr> "4899667", "1557137", "230180", "4279376", "341416", …
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
The function pivotEstimates adds one column of estimate values for each combination of the columns given in pivotEstimatesBy. For instance, the following example uses the columns variable_name, variable_level, and estimate_name to pivot the estimates.
result |>
  pivotEstimates(pivotEstimatesBy = c("variable_name", "variable_level", "estimate_name")) |>
  glimpse()
#> Rows: 18
#> Columns: 18
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mo…
#> $ result_type <chr> "mock_summarised_result", "mock_sum…
#> $ package_name <chr> "visOmopResults", "visOmopResults",…
#> $ package_version <chr> "0.2.1", "0.2.1", "0.2.1", "0.2.1",…
#> $ group_name <chr> "cohort_name", "cohort_name", "coho…
#> $ group_level <chr> "cohort1", "cohort1", "cohort1", "c…
#> $ strata_name <chr> "overall", "age_group &&& sex", "ag…
#> $ strata_level <chr> "overall", "<40 &&& Male", ">=40 &&…
#> $ additional_name <chr> "overall", "overall", "overall", "o…
#> $ additional_level <chr> "overall", "overall", "overall", "o…
#> $ `number subjects_NA_count` <int> 4899667, 1557137, 230180, 4279376, …
#> $ age_NA_mean <dbl> 45.00919, 66.04876, 82.00216, 59.45…
#> $ age_NA_sd <dbl> 4.906876, 5.706327, 1.251627, 9.679…
#> $ Medications_Amoxiciline_count <int> 34243, 60972, 92885, 14238, 54725, …
#> $ Medications_Amoxiciline_percentage <dbl> 98.175316, 88.166081, 53.749952, 80…
#> $ Medications_Ibuprofen_count <int> 24412, 16138, 78225, 43763, 83941, …
#> $ Medications_Ibuprofen_percentage <dbl> 55.764227, 62.955280, 62.126432, 27…
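To make the reshaping concrete, here is a minimal sketch of a comparable long-to-wide pivot written directly with tidyr. It is an illustration only and not the package's implementation: the `toy` table and its values are made up, and the blanket conversion to numeric is a simplifying assumption (the output above shows that pivotEstimates gives integer and double columns their own types).

```r
library(dplyr)
library(tidyr)

# Toy long-format table (hypothetical values, not output of the package)
toy <- tibble(
  strata_level   = c("overall", "overall", "overall", "Male", "Male", "Male"),
  variable_name  = c("number subjects", "age", "age", "number subjects", "age", "age"),
  estimate_name  = c("count", "mean", "sd", "count", "mean", "sd"),
  estimate_value = c("100", "45.2", "4.9", "60", "47.1", "5.3")
)

# One new column per (variable_name, estimate_name) combination
wide <- toy |>
  pivot_wider(
    names_from  = c(variable_name, estimate_name),
    values_from = estimate_value,
    names_sep   = "_"
  ) |>
  # Simplification: treat every estimate as numeric
  mutate(across(-strata_level, as.numeric))

wide
```

Each stratum collapses to a single row, which is why the pivoted mock result above has far fewer rows than the original.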
The argument nameStyle customises the names of the new columns. It uses the glue package syntax. For instance:
result |>
  pivotEstimates(pivotEstimatesBy = "estimate_name",
                 nameStyle = "{toupper(estimate_name)}") |>
  glimpse()
#> Rows: 72
#> Columns: 17
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ result_type <chr> "mock_summarised_result", "mock_summarised_result", "…
#> $ package_name <chr> "visOmopResults", "visOmopResults", "visOmopResults",…
#> $ package_version <chr> "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1",…
#> $ group_name <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ COUNT <int> 4899667, 1557137, 230180, 4279376, 341416, 6125243, 1…
#> $ MEAN <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ SD <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ PERCENTAGE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
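For readers unfamiliar with glue, the short sketch below shows how such a template is evaluated on its own, outside the package. The vectors are made up for illustration, and the second template is only an example of the syntax, not a documented default of pivotEstimates.

```r
library(glue)

# Hypothetical values standing in for the columns of a summarised result
estimate_name <- c("count", "mean", "sd", "percentage")
variable_name <- "age"

# Expressions inside braces are evaluated as R code, element by element
upper_names <- as.character(glue("{toupper(estimate_name)}"))

# Several columns can be combined in one template
combined <- as.character(glue("{variable_name}_{estimate_name}"))

upper_names
combined
```

Any R expression that returns one string per input value can appear inside the braces.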
The function pivotSettings adds a new column for each of the settings in the summarised result, if any:
mockSummarisedResult(settings = TRUE) |>
  pivotSettings() |>
  glimpse()
#> Rows: 126
#> Columns: 17
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ result_type <chr> "mock_summarised_result", "mock_summarised_result", "…
#> $ package_name <chr> "visOmopResults", "visOmopResults", "visOmopResults",…
#> $ package_version <chr> "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1",…
#> $ group_name <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value <chr> "9715278", "8167184", "7595780", "8059089", "7683359"…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ mock_default <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
The function appendSettings does the inverse of pivotSettings. It takes the columns that hold settings and appends them to the end of the summarised result as rows whose variable_name is "settings":
table <- mockSummarisedResult() |>
  mutate(mockSummarisedResult = TRUE, vignette = "tidy")
result <- table |> appendSettings(colsSettings = c("mockSummarisedResult", "vignette"))
result |> filter(variable_name == "settings") |> glimpse()
#> Rows: 2
#> Columns: 16
#> $ result_id <int> 1, 1
#> $ cdm_name <chr> "mock", "mock"
#> $ result_type <chr> "mock_summarised_result", "mock_summarised_result"
#> $ package_name <chr> "visOmopResults", "visOmopResults"
#> $ package_version <chr> "0.2.1", "0.2.1"
#> $ group_name <chr> "overall", "overall"
#> $ group_level <chr> "overall", "overall"
#> $ strata_name <chr> "overall", "overall"
#> $ strata_level <chr> "overall", "overall"
#> $ variable_name <chr> "settings", "settings"
#> $ variable_level <chr> NA, NA
#> $ estimate_name <chr> "mockSummarisedResult", "vignette"
#> $ estimate_type <chr> "logical", "character"
#> $ estimate_value <chr> "TRUE", "tidy"
#> $ additional_name <chr> "overall", "overall"
#> $ additional_level <chr> "overall", "overall"
Finally, the method tidy combines the splitting of name-level columns with the pivoting of estimates and settings. By default, it splits group, strata and additional, pivots estimates by the column "estimate_name", and also pivots the settings.
result <- mockSummarisedResult(settings = TRUE)
result |>
  tidy() |>
  glimpse()
#> Rows: 72
#> Columns: 15
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock"…
#> $ result_type <chr> "mock_summarised_result", "mock_summarised_result", "m…
#> $ package_name <chr> "visOmopResults", "visOmopResults", "visOmopResults", …
#> $ package_version <chr> "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1", …
#> $ cohort_name <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1",…
#> $ age_group <chr> "overall", "<40", ">=40", "<40", ">=40", "overall", "o…
#> $ sex <chr> "overall", "Male", "Male", "Female", "Female", "Male",…
#> $ variable_name <chr> "number subjects", "number subjects", "number subjects…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ count <int> 819876, 9032951, 1567835, 937791, 1733838, 1055507, 53…
#> $ mean <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ sd <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ percentage <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ mock_default <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, …
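The split step is what turns the pair strata_name / strata_level into the age_group and sex columns shown above. A minimal base R sketch of that idea follows. The helper `split_pair` is hypothetical and not part of visOmopResults; it assumes the " &&& " separator seen in the mock data and ignores special cases such as the "overall" rows, which the package fills in for every split column.

```r
# Hypothetical helper: split one name-level pair into named values
split_pair <- function(name, level, sep = " &&& ") {
  keys   <- strsplit(name, sep, fixed = TRUE)[[1]]
  values <- strsplit(level, sep, fixed = TRUE)[[1]]
  # A name and its level must carry the same number of components
  stopifnot(length(keys) == length(values))
  setNames(as.list(values), keys)
}

parts <- split_pair("age_group &&& sex", "<40 &&& Male")
parts$age_group
parts$sex
```

Applied row by row, the keys become column names and the values become the cell contents.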
Which column pairs to split can be customised with the split arguments, while pivotEstimatesBy and nameStyle control the pivoting of estimates. If pivotEstimatesBy is NULL or character(), estimates will not be modified. Settings will always be pivoted if present.