The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.
library(PHEindicatormethods)
library(dplyr)
This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.
Function | Type | Description |
---|---|---|
phe_proportion | Non-aggregate | Performs a calculation on each row of data (unless data is grouped) |
phe_rate | Non-aggregate | Performs a calculation on each row of data (unless data is grouped) |
phe_mean | Aggregate | Performs a calculation on each grouping set |
phe_dsr | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs |
calculate_ISRatio | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs |
calculate_ISRate | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs |
The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:
<- data.frame(
df area = rep(c("Area1","Area2","Area3","Area4"), 2),
year = rep(2015:2016, each = 4),
obs = sample(100, 2 * 4, replace = TRUE),
pop = sample(100:200, 2 * 4, replace = TRUE))
df#> area year obs pop
#> 1 Area1 2015 77 196
#> 2 Area2 2015 3 140
#> 3 Area3 2015 12 173
#> 4 Area4 2015 5 183
#> 5 Area1 2016 70 195
#> 6 Area2 2016 65 148
#> 7 Area3 2016 54 168
#> 8 Area4 2016 16 188
INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.
OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.
Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified
# default proportion
phe_proportion(df, obs, pop)
#> area year obs pop value lowercl uppercl confidence
#> 1 Area1 2015 77 196 0.39285714 0.327173007 0.46266039 95%
#> 2 Area2 2015 3 140 0.02142857 0.007314003 0.06110479 95%
#> 3 Area3 2015 12 173 0.06936416 0.040121258 0.11731614 95%
#> 4 Area4 2015 5 183 0.02732240 0.011725747 0.06235556 95%
#> 5 Area1 2016 70 195 0.35897436 0.294968065 0.42842966 95%
#> 6 Area2 2016 65 148 0.43918919 0.361774515 0.51968079 95%
#> 7 Area3 2016 54 168 0.32142857 0.255479343 0.39536161 95%
#> 8 Area4 2016 16 188 0.08510638 0.053063732 0.13376480 95%
#> statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson
# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#> area year obs pop value lowercl uppercl confidence
#> 1 Area1 2015 77 196 0.39285714 0.292449398 0.50322029 99.8%
#> 2 Area2 2015 3 140 0.02142857 0.004313174 0.09966265 99.8%
#> 3 Area3 2015 12 173 0.06936416 0.029566868 0.15421632 99.8%
#> 4 Area4 2015 5 183 0.02732240 0.007549329 0.09398057 99.8%
#> 5 Area1 2016 70 195 0.35897436 0.261701252 0.46941522 99.8%
#> 6 Area2 2016 65 148 0.43918919 0.320634897 0.56511532 99.8%
#> 7 Area3 2016 54 168 0.32142857 0.222297357 0.43976878 99.8%
#> 8 Area4 2016 16 188 0.08510638 0.040616613 0.16970803 99.8%
#> statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson
# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 77 196 39.285714 32.7173007 46.266039 95% percentage
#> 2 Area2 2015 3 140 2.142857 0.7314003 6.110479 95% percentage
#> 3 Area3 2015 12 173 6.936416 4.0121258 11.731614 95% percentage
#> 4 Area4 2015 5 183 2.732240 1.1725747 6.235556 95% percentage
#> 5 Area1 2016 70 195 35.897436 29.4968065 42.842966 95% percentage
#> 6 Area2 2016 65 148 43.918919 36.1774515 51.968079 95% percentage
#> 7 Area3 2016 54 168 32.142857 25.5479343 39.536161 95% percentage
#> 8 Area4 2016 16 188 8.510638 5.3063732 13.376480 95% percentage
#> method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson
# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 77 196 39.285714 29.2449398 50.322029 99.8% percentage
#> 2 Area2 2015 3 140 2.142857 0.4313174 9.966265 99.8% percentage
#> 3 Area3 2015 12 173 6.936416 2.9566868 15.421632 99.8% percentage
#> 4 Area4 2015 5 183 2.732240 0.7549329 9.398057 99.8% percentage
#> 5 Area1 2016 70 195 35.897436 26.1701252 46.941522 99.8% percentage
#> 6 Area2 2016 65 148 43.918919 32.0634897 56.511532 99.8% percentage
#> 7 Area3 2016 54 168 32.142857 22.2297357 43.976878 99.8% percentage
#> 8 Area4 2016 16 188 8.510638 4.0616613 16.970803 99.8% percentage
#> method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson
# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#> area year obs pop value lowercl uppercl
#> 1 Area1 2015 77 196 39.285714 29.2449398 50.322029
#> 2 Area2 2015 3 140 2.142857 0.4313174 9.966265
#> 3 Area3 2015 12 173 6.936416 2.9566868 15.421632
#> 4 Area4 2015 5 183 2.732240 0.7549329 9.398057
#> 5 Area1 2016 70 195 35.897436 26.1701252 46.941522
#> 6 Area2 2016 65 148 43.918919 32.0634897 56.511532
#> 7 Area3 2016 54 168 32.142857 22.2297357 43.976878
#> 8 Area4 2016 16 188 8.510638 4.0616613 16.970803
# default rate
phe_rate(df, obs, pop)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 77 196 39285.714 31002.5473 49101.036 95% rate per 100000
#> 2 Area2 2015 3 140 2142.857 441.9087 6262.338 95% rate per 100000
#> 3 Area3 2015 12 173 6936.416 3580.0635 12117.259 95% rate per 100000
#> 4 Area4 2015 5 183 2732.240 887.1510 6376.138 95% rate per 100000
#> 5 Area1 2016 70 195 35897.436 27982.6159 45354.905 95% rate per 100000
#> 6 Area2 2016 65 148 43918.919 33894.0458 55979.148 95% rate per 100000
#> 7 Area3 2016 54 168 32142.857 24145.0777 41940.269 95% rate per 100000
#> 8 Area4 2016 16 188 8510.638 4861.4409 13821.510 95% rate per 100000
#> method
#> 1 Byars
#> 2 Exact
#> 3 Byars
#> 4 Exact
#> 5 Byars
#> 6 Byars
#> 7 Byars
#> 8 Byars
# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 77 196 39.285714 26.8789289 55.195935 99.8% rate per 100
#> 2 Area2 2015 3 140 2.142857 0.1360953 9.330172 99.8% rate per 100
#> 3 Area3 2015 12 173 6.936416 2.3123561 15.653734 99.8% rate per 100
#> 4 Area4 2015 5 183 2.732240 0.4040283 8.991664 99.8% rate per 100
#> 5 Area1 2016 70 195 35.897436 24.0728005 51.247396 99.8% rate per 100
#> 6 Area2 2016 65 148 43.918919 28.9731496 63.513710 99.8% rate per 100
#> 7 Area3 2016 54 168 32.142857 20.2857877 48.105200 99.8% rate per 100
#> 8 Area4 2016 16 188 8.510638 3.3866809 17.378534 99.8% rate per 100
#> method
#> 1 Byars
#> 2 Exact
#> 3 Byars
#> 4 Exact
#> 5 Byars
#> 6 Byars
#> 7 Byars
#> 8 Byars
# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#> area year obs pop value lowercl uppercl
#> 1 Area1 2015 77 196 39.285714 26.8789289 55.195935
#> 2 Area2 2015 3 140 2.142857 0.1360953 9.330172
#> 3 Area3 2015 12 173 6.936416 2.3123561 15.653734
#> 4 Area4 2015 5 183 2.732240 0.4040283 8.991664
#> 5 Area1 2016 70 195 35.897436 24.0728005 51.247396
#> 6 Area2 2016 65 148 43.918919 28.9731496 63.513710
#> 7 Area3 2016 54 168 32.142857 20.2857877 48.105200
#> 8 Area4 2016 16 188 8.510638 3.3866809 17.378534
These functions can also return aggregate data if the input dataframes are grouped:
# default proportion - grouped
%>%
df group_by(year) %>%
phe_proportion(obs, pop)
#> # A tibble: 2 × 9
#> # Groups: year [2]
#> year obs pop value lowercl uppercl confidence statistic method
#> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2015 97 692 0.140 0.116 0.168 95% proportion of 1 Wilson
#> 2 2016 205 699 0.293 0.261 0.328 95% proportion of 1 Wilson
# default rate - grouped
%>%
df group_by(year) %>%
phe_rate(obs, pop)
#> # A tibble: 2 × 9
#> # Groups: year [2]
#> year obs pop value lowercl uppercl confidence statistic method
#> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2015 97 692 14017. 11367. 17100. 95% rate per 100000 Byars
#> 2 2016 205 699 29328. 25450. 33629. 95% rate per 100000 Byars
The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.
The df test data generated earlier can be used to demonstrate phe_mean:
INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.
OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.
Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified
# default mean
phe_mean(df,obs)
#> value_sum value_count stdev value lowercl uppercl confidence statistic
#> 1 302 8 31.63068 37.75 11.30609 64.19391 95% mean
#> method
#> 1 Student's t-distribution
# multiple means in a single execution with 99.8% confidence
%>%
df group_by(year) %>%
phe_mean(obs, confidence=0.998)
#> # A tibble: 2 × 10
#> # Groups: year [2]
#> year value_sum value_count stdev value lowercl uppercl confidence statistic
#> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 2015 97 4 35.4 24.2 -156. 205. 99.8% mean
#> 2 2016 205 4 24.4 51.2 -73.5 176. 99.8% mean
#> # ℹ 1 more variable: method <chr>
# multiple means in a single execution with 99.8% confidence and data-only output
%>%
df group_by(year) %>%
phe_mean(obs, type = "standard", confidence=0.998)
#> # A tibble: 2 × 7
#> # Groups: year [2]
#> year value_sum value_count stdev value lowercl uppercl
#> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2015 97 4 35.4 24.2 -156. 205.
#> 2 2016 205 4 24.4 51.2 -73.5 176.
The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:
<- data.frame(
df_std area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
year = rep(2006:2010, each = 19 * 2),
sex = rep(rep(c("Male", "Female"), each = 19), 5),
ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
50,55,60,65,70,75,80,85,90), times = 10),
obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#> area year sex ageband obs pop
#> 1 Area1 2006 Male 0 44 15092
#> 2 Area1 2006 Male 5 15 10418
#> 3 Area1 2006 Male 10 187 13220
#> 4 Area1 2006 Male 15 89 11665
#> 5 Area1 2006 Male 20 21 15458
#> 6 Area1 2006 Male 25 139 14885
INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:
The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.
The function can also accept standard populations provided as a column within the input data frame.
standard populations provided as a vector - the vector and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order
standard populations provided as a column within the input data frame - the standard populations can be appended to the input data frame by the user prior to execution of the function - if the data is grouped to generate multiple dsrs then the standard populations will need to be repeated and appended to the data rows for every grouping set.
OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.
Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified
# calculate separate dsrs for each area, year and sex
%>%
df_std group_by(area, year, sex) %>%
phe_dsr(obs, pop)
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex total_count total_pop value lowercl uppercl confidence
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Female 1614 278486 632. 600. 666. 95%
#> 2 Area1 2006 Male 1637 263156 608. 577. 641. 95%
#> 3 Area1 2007 Female 1584 287686 554. 525. 584. 95%
#> 4 Area1 2007 Male 1975 286970 723. 688. 758. 95%
#> 5 Area1 2008 Female 2166 271994 827. 790. 864. 95%
#> 6 Area1 2008 Male 1690 297058 584. 554. 615. 95%
#> 7 Area1 2009 Female 1674 299153 653. 620. 687. 95%
#> 8 Area1 2009 Male 2049 279695 784. 748. 822. 95%
#> 9 Area1 2010 Female 2141 267544 782. 746. 819. 95%
#> 10 Area1 2010 Male 1435 280798 554. 524. 585. 95%
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>
# calculate separate dsrs for each area, year and sex and drop metadata fields from output
%>%
df_std group_by(area, year, sex) %>%
phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 × 8
#> # Groups: area, year, sex [40]
#> area year sex total_count total_pop value lowercl uppercl
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl>
#> 1 Area1 2006 Female 1614 278486 632. 600. 666.
#> 2 Area1 2006 Male 1637 263156 608. 577. 641.
#> 3 Area1 2007 Female 1584 287686 554. 525. 584.
#> 4 Area1 2007 Male 1975 286970 723. 688. 758.
#> 5 Area1 2008 Female 2166 271994 827. 790. 864.
#> 6 Area1 2008 Male 1690 297058 584. 554. 615.
#> 7 Area1 2009 Female 1674 299153 653. 620. 687.
#> 8 Area1 2009 Male 2049 279695 784. 748. 822.
#> 9 Area1 2010 Female 2141 267544 782. 746. 819.
#> 10 Area1 2010 Male 1435 280798 554. 524. 585.
#> # ℹ 30 more rows
# calculate same specifying standard population in vector form
%>%
df_std group_by(area, year, sex) %>%
phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex total_count total_pop value lowercl uppercl confidence
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Female 1614 278486 632. 600. 666. 95%
#> 2 Area1 2006 Male 1637 263156 608. 577. 641. 95%
#> 3 Area1 2007 Female 1584 287686 554. 525. 584. 95%
#> 4 Area1 2007 Male 1975 286970 723. 688. 758. 95%
#> 5 Area1 2008 Female 2166 271994 827. 790. 864. 95%
#> 6 Area1 2008 Male 1690 297058 584. 554. 615. 95%
#> 7 Area1 2009 Female 1674 299153 653. 620. 687. 95%
#> 8 Area1 2009 Male 2049 279695 784. 748. 822. 95%
#> 9 Area1 2010 Female 2141 267544 782. 746. 819. 95%
#> 10 Area1 2010 Male 1435 280798 554. 524. 585. 95%
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>
# calculate the same dsrs by appending the standard populations to the data frame
%>%
df_std mutate(refpop = rep(esp2013,40)) %>%
group_by(area, year, sex) %>%
phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex total_count total_pop value lowercl uppercl confidence
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Female 1614 278486 632. 600. 666. 95%
#> 2 Area1 2006 Male 1637 263156 608. 577. 641. 95%
#> 3 Area1 2007 Female 1584 287686 554. 525. 584. 95%
#> 4 Area1 2007 Male 1975 286970 723. 688. 758. 95%
#> 5 Area1 2008 Female 2166 271994 827. 790. 864. 95%
#> 6 Area1 2008 Male 1690 297058 584. 554. 615. 95%
#> 7 Area1 2009 Female 1674 299153 653. 620. 687. 95%
#> 8 Area1 2009 Male 2049 279695 784. 748. 822. 95%
#> 9 Area1 2010 Female 2141 267544 782. 746. 819. 95%
#> 10 Area1 2010 Male 1435 280798 554. 524. 585. 95%
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>
# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
%>%
df_std filter(ageband <= 70) %>%
group_by(area, year, sex) %>%
phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex total_count total_pop value lowercl uppercl confidence
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Female 1306 224154 618. 584. 654. 95%
#> 2 Area1 2006 Male 1182 207056 585. 552. 620. 95%
#> 3 Area1 2007 Female 1191 225394 556. 524. 588. 95%
#> 4 Area1 2007 Male 1590 222293 748. 710. 786. 95%
#> 5 Area1 2008 Female 1791 216099 848. 809. 889. 95%
#> 6 Area1 2008 Male 1287 226874 577. 545. 611. 95%
#> 7 Area1 2009 Female 1449 239270 672. 636. 709. 95%
#> 8 Area1 2009 Male 1515 226984 757. 718. 797. 95%
#> 9 Area1 2010 Female 1603 209767 790. 751. 830. 95%
#> 10 Area1 2010 Male 1294 220315 590. 558. 624. 95%
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>
# calculate separate dsrs for persons for each area and year)
%>%
df_std group_by(area, year, ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop),
.groups = "drop_last") %>%
phe_dsr(obs,pop)
#> # A tibble: 20 × 10
#> # Groups: area, year [20]
#> area year total_count total_pop value lowercl uppercl confidence statistic
#> <chr> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 3251 541642 602. 581. 624. 95% dsr per 1…
#> 2 Area1 2007 3559 574656 635. 612. 657. 95% dsr per 1…
#> 3 Area1 2008 3856 569052 698. 674. 721. 95% dsr per 1…
#> 4 Area1 2009 3723 578848 668. 646. 691. 95% dsr per 1…
#> 5 Area1 2010 3576 548342 665. 642. 689. 95% dsr per 1…
#> 6 Area2 2006 3359 580193 572. 552. 594. 95% dsr per 1…
#> 7 Area2 2007 4308 552969 759. 735. 784. 95% dsr per 1…
#> 8 Area2 2008 4134 556799 759. 734. 784. 95% dsr per 1…
#> 9 Area2 2009 3310 545631 651. 628. 674. 95% dsr per 1…
#> 10 Area2 2010 3935 562442 714. 690. 738. 95% dsr per 1…
#> 11 Area3 2006 4071 519900 789. 763. 815. 95% dsr per 1…
#> 12 Area3 2007 3673 555876 709. 685. 734. 95% dsr per 1…
#> 13 Area3 2008 4390 580990 756. 732. 781. 95% dsr per 1…
#> 14 Area3 2009 3604 554507 649. 627. 671. 95% dsr per 1…
#> 15 Area3 2010 3357 546450 669. 646. 693. 95% dsr per 1…
#> 16 Area4 2006 4103 607716 694. 672. 717. 95% dsr per 1…
#> 17 Area4 2007 3673 602685 632. 611. 654. 95% dsr per 1…
#> 18 Area4 2008 4003 564326 693. 671. 717. 95% dsr per 1…
#> 19 Area4 2009 3056 552974 563. 541. 584. 95% dsr per 1…
#> 20 Area4 2010 3525 578959 590. 570. 611. 95% dsr per 1…
#> # ℹ 1 more variable: method <chr>
INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the calculate_ISRatio and calculate_ISRate functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.
The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:
reference data provided as a data frame or as vectors - the data frame/vectors and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order.
reference data provided as columns within the input data frame - the reference numerators and denominators can be appended to the input data frame prior to execution of the function - if the data is grouped to generate multiple indirectly standardised rates or ratios then the reference data will need to be repeated and appended to the data rows for every grouping set.
OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (ISRate only), the indirectly standardised rate or ratio, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.
The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:
<- df_std %>%
df_ref filter(year == 2006) %>%
group_by(ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop),
.groups = "drop_last")
head(df_ref)
#> # A tibble: 6 × 3
#> ageband obs pop
#> <dbl> <int> <int>
#> 1 0 1008 130955
#> 2 5 703 125062
#> 3 10 758 115161
#> 4 15 848 113157
#> 5 20 656 119685
#> 6 25 844 105940
Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified
# calculate separate smrs for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
%>%
df_std group_by(area, year, sex) %>%
calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex observed expected value lowercl uppercl confidence
#> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Female 1614 1836. 0.879 0.837 0.923 95%
#> 2 Area1 2006 Male 1637 1761. 0.929 0.885 0.976 95%
#> 3 Area1 2007 Female 1584 1919. 0.825 0.785 0.867 95%
#> 4 Area1 2007 Male 1975 1911. 1.03 0.989 1.08 95%
#> 5 Area1 2008 Female 2166 1764. 1.23 1.18 1.28 95%
#> 6 Area1 2008 Male 1690 1968. 0.859 0.818 0.901 95%
#> 7 Area1 2009 Female 1674 1995. 0.839 0.800 0.880 95%
#> 8 Area1 2009 Male 2049 1848. 1.11 1.06 1.16 95%
#> 9 Area1 2010 Female 2141 1780. 1.20 1.15 1.25 95%
#> 10 Area1 2010 Male 1435 1857. 0.773 0.733 0.814 95%
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>
# calculate the same smrs by appending the reference data to the data frame
# and drop metadata columns from output
%>%
df_std mutate(refobs = rep(df_ref$obs,40),
refpop = rep(df_ref$pop,40)) %>%
group_by(area, year, sex) %>%
calculate_ISRatio(obs, pop, refobs, refpop, refpoptype="field",
type = "standard")
#> # A tibble: 40 × 8
#> # Groups: area, year, sex [40]
#> area year sex observed expected value lowercl uppercl
#> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 Area1 2006 Female 1614 1836. 0.879 0.837 0.923
#> 2 Area1 2006 Male 1637 1761. 0.929 0.885 0.976
#> 3 Area1 2007 Female 1584 1919. 0.825 0.785 0.867
#> 4 Area1 2007 Male 1975 1911. 1.03 0.989 1.08
#> 5 Area1 2008 Female 2166 1764. 1.23 1.18 1.28
#> 6 Area1 2008 Male 1690 1968. 0.859 0.818 0.901
#> 7 Area1 2009 Female 1674 1995. 0.839 0.800 0.880
#> 8 Area1 2009 Male 2049 1848. 1.11 1.06 1.16
#> 9 Area1 2010 Female 2141 1780. 1.20 1.15 1.25
#> 10 Area1 2010 Male 1435 1857. 0.773 0.733 0.814
#> # ℹ 30 more rows
The calculate_ISRate function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:
# calculate separate indirectly standardised rates for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
%>%
df_std group_by(area, year, sex) %>%
calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 12
#> # Groups: area, year, sex [40]
#> area year sex observed expected ref_rate value lowercl uppercl confidence
#> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Fema… 1614 1836. 657. 578. 550. 607. 95%
#> 2 Area1 2006 Male 1637 1761. 657. 611. 582. 641. 95%
#> 3 Area1 2007 Fema… 1584 1919. 657. 542. 516. 570. 95%
#> 4 Area1 2007 Male 1975 1911. 657. 679. 650. 710. 95%
#> 5 Area1 2008 Fema… 2166 1764. 657. 807. 773. 842. 95%
#> 6 Area1 2008 Male 1690 1968. 657. 564. 538. 592. 95%
#> 7 Area1 2009 Fema… 1674 1995. 657. 552. 525. 579. 95%
#> 8 Area1 2009 Male 2049 1848. 657. 729. 698. 761. 95%
#> 9 Area1 2010 Fema… 2141 1780. 657. 791. 757. 825. 95%
#> 10 Area1 2010 Male 1435 1857. 657. 508. 482. 535. 95%
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>
# calculate the same indirectly standardised rates by appending the reference data to the data frame
# and drop metadata columns from output
%>%
df_std mutate(refobs = rep(df_ref$obs,40),
refpop = rep(df_ref$pop,40)) %>%
group_by(area, year, sex) %>%
calculate_ISRate(obs, pop, refobs, refpop, refpoptype="field",
type = "standard")
#> # A tibble: 40 × 9
#> # Groups: area, year, sex [40]
#> area year sex observed expected ref_rate value lowercl uppercl
#> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Area1 2006 Female 1614 1836. 657. 578. 550. 607.
#> 2 Area1 2006 Male 1637 1761. 657. 611. 582. 641.
#> 3 Area1 2007 Female 1584 1919. 657. 542. 516. 570.
#> 4 Area1 2007 Male 1975 1911. 657. 679. 650. 710.
#> 5 Area1 2008 Female 2166 1764. 657. 807. 773. 842.
#> 6 Area1 2008 Male 1690 1968. 657. 564. 538. 592.
#> 7 Area1 2009 Female 1674 1995. 657. 552. 525. 579.
#> 8 Area1 2009 Male 2049 1848. 657. 729. 698. 761.
#> 9 Area1 2010 Female 2141 1780. 657. 791. 757. 825.
#> 10 Area1 2010 Male 1435 1857. 657. 508. 482. 535.
#> # ℹ 30 more rows
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.