The srvyr package adds dplyr-like syntax to Professor Thomas Lumley’s survey package. This vignette uses srvyr to recreate several examples from Lumley’s textbook, Complex Surveys: A Guide to Analysis Using R.

as_survey_design

The core of the srvyr package is the as_survey_design function. It is similar to the svydesign function from the survey package, but adds dplyr-like functionality. as_survey_design differs from svydesign in three ways: it supports magrittr pipes; it takes bare column names rather than requiring a “~” before each variable name, as the survey package does; and the resulting object works with dplyr verbs such as summarise, mutate, and group_by. Here are some basic examples using as_survey_design together with survey_total and survey_mean, the srvyr equivalents of svytotal and svymean, respectively:

library(srvyr)
library(survey)
library(pander)
data(api)

# simple random sample
srs_design <- apisrs %>% as_survey_design(ids = 1, fpc = fpc)
srs_design
## Independent Sampling design
## [1] "called via srvyr"
## Sampling variables:
##  - fpc: fpc
## Data variables: cds (chr), stype (fctr), name (chr), sname (chr), snum
##   (dbl), dname (chr), dnum (int), cname (chr), cnum (int), flag (int),
##   pcttest (int), api00 (int), api99 (int), target (int), growth (int),
##   sch.wide (fctr), comp.imp (fctr), both (fctr), awards (fctr), meals
##   (int), ell (int), yr.rnd (fctr), mobility (int), acs.k3 (int), acs.46
##   (int), acs.core (int), pct.resp (int), not.hsg (int), hsg (int),
##   some.col (int), col.grad (int), grad.sch (int), avg.ed (dbl), full
##   (int), emer (int), enroll (int), api.stu (int), pw (dbl), fpc (dbl)
srs_design %>% 
  summarise(Total = survey_total(enroll)) %>% 
  pander()
Total Total_se
3621074 169519.7
srs_design %>% 
  summarise(Mean = survey_mean(enroll)) %>%
  pander()
Mean Mean_se
584.61 27.36837
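For comparison, the simple random sample design above could be written with the survey package directly. This is a minimal sketch, assuming the survey package is installed; the object name svy_srs is introduced here for illustration only. Note the one-sided formulas (“~”) where srvyr takes bare column names.

```r
library(survey)
data(api)

# survey package: design variables are given as one-sided formulas
svy_srs <- svydesign(ids = ~1, fpc = ~fpc, data = apisrs)

# Estimates are requested with formulas as well, and each call
# returns a svystat object rather than a data frame
svy_total <- svytotal(~enroll, svy_srs)
svy_mean  <- svymean(~enroll, svy_srs)

svy_total
svy_mean
```

The point estimates and standard errors should agree with the srvyr results shown above; the main difference is that srvyr returns a data frame that can be piped into further dplyr verbs.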
# weighted sample
nofpc <- apisrs %>% as_survey_design(weights = pw)
nofpc
## Independent Sampling design (with replacement)
## [1] "called via srvyr"
## Sampling variables:
##  - weights: pw
## Data variables: cds (chr), stype (fctr), name (chr), sname (chr), snum
##   (dbl), dname (chr), dnum (int), cname (chr), cnum (int), flag (int),
##   pcttest (int), api00 (int), api99 (int), target (int), growth (int),
##   sch.wide (fctr), comp.imp (fctr), both (fctr), awards (fctr), meals
##   (int), ell (int), yr.rnd (fctr), mobility (int), acs.k3 (int), acs.46
##   (int), acs.core (int), pct.resp (int), not.hsg (int), hsg (int),
##   some.col (int), col.grad (int), grad.sch (int), avg.ed (dbl), full
##   (int), emer (int), enroll (int), api.stu (int), pw (dbl), fpc (dbl)
nofpc %>% 
  summarise(Total = survey_total(enroll)) %>% 
  pander()
Total Total_se
3621074 172324.6
nofpc %>% 
  summarise(Mean = survey_mean(enroll)) %>% 
  pander()
Mean Mean_se
584.61 27.82121
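The point estimates from srs_design and nofpc are identical, but the standard errors differ because only srs_design applies a finite population correction. For a simple random sample the correction multiplies the standard error by sqrt(1 - n/N). The sketch below checks this by hand, assuming n = 200 sampled schools and N = 6194 schools in the population (the values carried by apisrs; they are not printed in the output above, so treat them as assumptions).

```r
n <- 200    # assumed sample size of apisrs
N <- 6194   # assumed population size (the value stored in apisrs$fpc)

# Finite population correction factor for a simple random sample
fpc_factor <- sqrt(1 - n / N)

# Standard errors reported by the nofpc design above
se_total_nofpc <- 172324.6
se_mean_nofpc  <- 27.82121

# Applying the correction should roughly reproduce the srs_design
# standard errors (169519.7 for the total, 27.36837 for the mean)
se_total_fpc <- se_total_nofpc * fpc_factor
se_mean_fpc  <- se_mean_nofpc * fpc_factor

se_total_fpc
se_mean_fpc
```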
# stratified sample
strat_design <- apistrat %>% as_survey_design(strata = stype, fpc = fpc)
strat_design
## Stratified Independent Sampling design
## [1] "called via srvyr"
## Sampling variables:
##  - strata: stype
##  - fpc: fpc
## Data variables: cds (chr), stype (fctr), name (chr), sname (chr), snum
##   (dbl), dname (chr), dnum (int), cname (chr), cnum (int), flag (int),
##   pcttest (int), api00 (int), api99 (int), target (int), growth (int),
##   sch.wide (fctr), comp.imp (fctr), both (fctr), awards (fctr), meals
##   (int), ell (int), yr.rnd (fctr), mobility (int), acs.k3 (int), acs.46
##   (int), acs.core (int), pct.resp (int), not.hsg (int), hsg (int),
##   some.col (int), col.grad (int), grad.sch (int), avg.ed (dbl), full
##   (int), emer (int), enroll (int), api.stu (int), pw (dbl), fpc (dbl)
strat_design %>% 
  summarise(Total = survey_total(enroll)) %>% 
  pander()
Total Total_se
3687178 114641.7
strat_design %>% 
  summarise(Mean = survey_mean(enroll)) %>% 
  pander()
Mean Mean_se
595.2821 18.50851
# try with mutate
srs_design %>% 
  mutate(apidiff = api00 - api99) %>% 
  summarise(Mean = survey_mean(apidiff)) %>%
  pander()
Mean Mean_se
31.9 2.090493
srs_design %>% 
  mutate(apidiffpercent = (api00 - api99) / api99) %>% 
  summarise(Mean = survey_mean(apidiffpercent)) %>% 
  pander()
Mean Mean_se
0.05608716 0.004068624
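In the survey package, the usual counterpart of mutate is update(), which adds variables to an existing design object. The following is a rough sketch of the first mutate example under that assumption; svy_srs is an illustrative name, not something defined earlier in this vignette.

```r
library(survey)
data(api)

svy_srs <- svydesign(ids = ~1, fpc = ~fpc, data = apisrs)

# update() plays the role of mutate(): it evaluates the expression
# inside the design's data and stores the result as a new variable
svy_srs <- update(svy_srs, apidiff = api00 - api99)

apidiff_mean <- svymean(~apidiff, svy_srs)
apidiff_mean
```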
# try with group_by
strat_design %>% 
  group_by(stype) %>% 
  summarise(Totals = survey_total(enroll)) %>% 
  pander()
stype  Totals     Totals_se
E      1842584.4  72581.34
H      997128.5   69239.39
M      847464.6   55502.96
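The grouped summary above corresponds to svyby in the survey package, which applies an estimation function within each level of a grouping variable. A minimal sketch, assuming the survey package is installed; svy_strat and by_stype are names introduced here for illustration.

```r
library(survey)
data(api)

# Same stratified design as strat_design, written with formulas
svy_strat <- svydesign(ids = ~1, strata = ~stype, fpc = ~fpc, data = apistrat)

# svyby(formula, by, design, FUN): total enrollment within each school type
by_stype <- svyby(~enroll, ~stype, svy_strat, svytotal)
by_stype
```

The srvyr version expresses the same idea as group_by followed by summarise, which keeps the grouping step separate from the choice of estimator.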