It is often desirable to visualize student success data with the ability to disaggregate by multiple group variables to highlight equity gaps and disproportionate impact (DI) in an interactive dashboard (eg, Tableau or Power BI). It is certainly feasible to calculate disproportionate impact on the fly in standard dashboard tools, but doing so:
A suggested workflow is to:
Using this workflow, one could scale up DI calculations and rapidly develop dashboards with the ability to disaggregate and highlight equity gaps / disproportionate impact for many disaggregation variables, many outcomes, and many scenarios / student populations.
The DisImpact package offers the `di_iterate` function, which allows one to accomplish step 2 in the suggested workflow.
DisImpact and toy data set
First, load the necessary packages.
library(DisImpact)
library(dplyr) # Data frame manipulation (%>%, filter, select, group_by, ...)
Second, load a toy data set.
data(student_equity) # provided from DisImpact
dim(student_equity)
## [1] 20000 11
# head(student_equity)
Ethnicity | Gender | Cohort | Transfer | Cohort_Math | Math | Cohort_English | English | Ed_Goal | College_Status | Student_ID |
---|---|---|---|---|---|---|---|---|---|---|
Native American | Female | 2017 | 0 | 2017 | 1 | 2017 | 0 | Deg/Transfer | First-time College | 100001 |
Native American | Female | 2017 | 0 | 2019 | 1 | NA | NA | Deg/Transfer | First-time College | 100002 |
Native American | Female | 2017 | 0 | 2018 | 1 | 2017 | 0 | Deg/Transfer | First-time College | 100003 |
Native American | Male | 2017 | 1 | 2017 | 1 | 2018 | 1 | Other | First-time College | 100004 |
Native American | Male | 2017 | 0 | 2019 | 1 | 2019 | 0 | Deg/Transfer | Other | 100005 |
Native American | Male | 2017 | 1 | 2017 | 1 | 2018 | 1 | Other | First-time College | 100006 |
To get a description of each variable, type `?student_equity` in the R console.
`di_iterate` on a data set
Let's illustrate the `di_iterate` function with some key arguments:
- `data`: a data frame of unitary (student) level or summarized data.
- `success_vars`: all outcome variables of interest.
- `group_vars`: all variables to disaggregate by (for calculating equity gaps and disproportionate impact).
- `cohort_vars` (optional): variables defining cohorts, corresponding to those in `success_vars`.
- `scenario_repeat_by_vars` (optional): variables to repeat DI calculations for, across all combinations of these variables. Use this only if the user wants a DI analysis of the variables in `group_vars` for everyone in `data` and, separately, for each subpopulation defined by a combination of the `scenario_repeat_by_vars` values. Each combination (eg, full-time, first-time college students with an ed goal of degree/transfer) constitutes one iteration / sample for which DI is calculated for the outcomes listed in `success_vars` and the disaggregation variables listed in `group_vars`.

To see the details of these and other arguments, type `?di_iterate` in the R console.
df_di_summary <- di_iterate(data=student_equity
, success_vars=c('Math', 'English', 'Transfer')
, group_vars=c('Ethnicity', 'Gender')
, cohort_vars=c('Cohort_Math', 'Cohort_English', 'Cohort')
, scenario_repeat_by_vars=c('Ed_Goal', 'College_Status')
)
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = "Ed_Goal"
## Joining, by = "Ed_Goal"
dim(df_di_summary)
## [1] 898 21
df_di_summary %>% head %>% as.data.frame # first few rows
## Ed_Goal College_Status success_variable cohort_variable cohort
## 1 Deg/Transfer First-time College Math Cohort_Math 2017
## 2 Deg/Transfer First-time College Math Cohort_Math 2017
## 3 Deg/Transfer First-time College Math Cohort_Math 2017
## 4 Deg/Transfer First-time College Math Cohort_Math 2017
## 5 Deg/Transfer First-time College Math Cohort_Math 2017
## 6 Deg/Transfer First-time College Math Cohort_Math 2017
## disaggregation group n success pct ppg_reference
## 1 Ethnicity Asian 800 698 0.8725000 0.8150224
## 2 Ethnicity Black 260 199 0.7653846 0.8150224
## 3 Ethnicity Hispanic 549 390 0.7103825 0.8150224
## 4 Ethnicity Multi-Ethnicity 136 108 0.7941176 0.8150224
## 5 Ethnicity Native American 28 22 0.7857143 0.8150224
## 6 Ethnicity White 903 764 0.8460687 0.8150224
## ppg_reference_group moe pct_lo pct_hi di_indicator_ppg
## 1 overall 0.03464823 0.8378518 0.9071482 0
## 2 overall 0.06077702 0.7046076 0.8261616 0
## 3 overall 0.04182538 0.6685571 0.7522079 1
## 4 overall 0.08403431 0.7100833 0.8781520 0
## 5 overall 0.18520259 0.6005117 0.9709169 0
## 6 overall 0.03261236 0.8134563 0.8786810 0
## di_prop_index di_indicator_prop_index di_80_index_reference_group
## 1 1.0705227 0 Asian
## 2 0.9390964 0 Asian
## 3 0.8716110 0 Asian
## 4 0.9743507 0 Asian
## 5 0.9640401 0 Asian
## 6 1.0380925 0 Asian
## di_80_index di_indicator_80_index
## 1 1.0000000 0
## 2 0.8772317 0
## 3 0.8141920 0
## 4 0.9101635 0
## 5 0.9005321 0
## 6 0.9697062 0
The variables `di_indicator_ppg`, `di_indicator_prop_index`, and `di_indicator_80_index` are the DI flags (1 = DI, 0 = no DI) from the percentage point gap, proportionality index, and 80% index methods, respectively.
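To make the three flags concrete, the following base-R sketch re-derives them for the first six rows printed above (Deg/Transfer, First-time College, Math, 2017 cohort). It is not DisImpact's internal code. It assumes the default settings (`min_moe=0.03`, a 0.5 proportion in the margin of error, 0.8 cutoffs, the overall rate as the PPG reference, and the highest performing group as the 80% index reference), a 1.96 z-multiplier, and the comparison rules `<=` for the PPG and `<` for the two indices; these were inferred from the printed output.

```r
# Counts copied from the first six rows of df_di_summary above
group   <- c('Asian', 'Black', 'Hispanic', 'Multi-Ethnicity',
             'Native American', 'White')
n       <- c(800, 260, 549, 136, 28, 903)
success <- c(698, 199, 390, 108, 22, 764)
pct     <- success / n

# Percentage point gap: flag DI when a group's upper bound is at or below the reference
ppg_reference <- sum(success) / sum(n)           # overall rate, about 0.815
moe    <- pmax(0.03, 1.96 * sqrt(0.5 * 0.5 / n))  # floored at 0.03 (min_moe)
pct_hi <- pct + moe
di_indicator_ppg <- as.integer(pct_hi <= ppg_reference)

# Proportionality index: share of successes divided by share of students
di_prop_index <- (success / sum(success)) / (n / sum(n))
di_indicator_prop_index <- as.integer(di_prop_index < 0.8)

# 80% index: each group's rate relative to the highest performing group
di_80_index <- pct / max(pct)
di_indicator_80_index <- as.integer(di_80_index < 0.8)

data.frame(group, n, pct, moe, pct_hi, di_indicator_ppg,
           di_prop_index, di_indicator_prop_index,
           di_80_index, di_indicator_80_index)
```

Only Hispanic students are flagged, and only by the PPG: their upper bound (about 0.752) falls below the overall rate, while their proportionality index (about 0.872) and 80% index (about 0.814) both stay above 0.8.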
Next, note that by default the scenario `'- All'` (all students, regardless of that variable's value) is included for every variable passed to `scenario_repeat_by_vars`:
table(df_di_summary$Ed_Goal)
##
## - All Deg/Transfer Other
## 300 300 298
table(df_di_summary$College_Status)
##
## - All First-time College Other
## 300 300 298
Also note that `di_iterate` returns non-disaggregated results by default (the `'- None'` scenario):
table(df_di_summary$disaggregation)
##
## - None Ethnicity Gender
## 90 539 269
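These row counts follow from simple arithmetic. The sketch below assumes, based on the tables above, that `'- All'` is added to the observed levels of each scenario variable and that every scenario contains all ten outcome/cohort pairs shown in the non-disaggregated rows.

```r
# Scenarios: every combination of the scenario_repeat_by_vars levels,
# including the '- All' level added for each variable
scenarios <- expand.grid(
  Ed_Goal        = c('- All', 'Deg/Transfer', 'Other'),
  College_Status = c('- All', 'First-time College', 'Other'),
  stringsAsFactors = FALSE
)
n_scenarios <- nrow(scenarios)  # 3 x 3 = 9

# Outcome/cohort pairs in the '- None' rows: Math and English each have
# cohorts 2017-2020, Transfer has 2017-2018
n_outcome_cohorts <- 4 + 4 + 2  # 10

# One '- None' row per scenario and outcome/cohort pair
n_none <- n_scenarios * n_outcome_cohorts  # 90

# Upper bounds for the disaggregated rows (6 ethnic groups, 3 genders)
max_ethnicity <- n_none * 6  # 540
max_gender    <- n_none * 3  # 270
```

The observed Ethnicity and Gender counts (539 and 269) are each one short of these upper bounds, presumably because a group had no students in one scenario/cohort combination. The same shortfall explains the 298 rows (rather than 300) for the `'Other'` level of each scenario variable.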
Let's inspect the rows corresponding to non-disaggregated results.
# No Disaggregation
df_di_summary %>%
filter(Ed_Goal=='- All', College_Status=='- All', disaggregation=='- None') %>%
as.data.frame
Ed_Goal | College_Status | success_variable | cohort_variable | cohort | disaggregation | group | n | success | pct | ppg_reference | ppg_reference_group | moe | pct_lo | pct_hi | di_indicator_ppg | di_prop_index | di_indicator_prop_index | di_80_index_reference_group | di_80_index | di_indicator_80_index |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
- All | - All | Math | Cohort_Math | 2017 | - None | - All | 4691 | 3828 | 0.8160307 | 0.8160307 | overall | 0.0300000 | 0.7860307 | 0.8460307 | 0 | 1 | 0 | - All | 1 | 0 |
- All | - All | Math | Cohort_Math | 2018 | - None | - All | 7416 | 6108 | 0.8236246 | 0.8236246 | overall | 0.0300000 | 0.7936246 | 0.8536246 | 0 | 1 | 0 | - All | 1 | 0 |
- All | - All | Math | Cohort_Math | 2019 | - None | - All | 4622 | 3772 | 0.8160969 | 0.8160969 | overall | 0.0300000 | 0.7860969 | 0.8460969 | 0 | 1 | 0 | - All | 1 | 0 |
- All | - All | Math | Cohort_Math | 2020 | - None | - All | 1855 | 1573 | 0.8479784 | 0.8479784 | overall | 0.0300000 | 0.8179784 | 0.8779784 | 0 | 1 | 0 | - All | 1 | 0 |
- All | - All | English | Cohort_English | 2017 | - None | - All | 5520 | 4183 | 0.7577899 | 0.7577899 | overall | 0.0300000 | 0.7277899 | 0.7877899 | 0 | 1 | 0 | - All | 1 | 0 |
- All | - All | English | Cohort_English | 2018 | - None | - All | 8543 | 6532 | 0.7646026 | 0.7646026 | overall | 0.0300000 | 0.7346026 | 0.7946026 | 0 | 1 | 0 | - All | 1 | 0 |
- All | - All | English | Cohort_English | 2019 | - None | - All | 3866 | 2938 | 0.7599586 | 0.7599586 | overall | 0.0300000 | 0.7299586 | 0.7899586 | 0 | 1 | 0 | - All | 1 | 0 |
- All | - All | English | Cohort_English | 2020 | - None | - All | 913 | 678 | 0.7426068 | 0.7426068 | overall | 0.0324333 | 0.7101735 | 0.7750401 | 0 | 1 | 0 | - All | 1 | 0 |
- All | - All | Transfer | Cohort | 2017 | - None | - All | 10000 | 5140 | 0.5140000 | 0.5140000 | overall | 0.0300000 | 0.4840000 | 0.5440000 | 0 | 1 | 0 | - All | 1 | 0 |
- All | - All | Transfer | Cohort | 2018 | - None | - All | 10000 | 5388 | 0.5388000 | 0.5388000 | overall | 0.0300000 | 0.5088000 | 0.5688000 | 0 | 1 | 0 | - All | 1 | 0 |
In this section, we emulate what a dashboard could visualize.
Imagine a dashboard with the following dropdown menus and option values:
Each combination of selections from these dropdown menus could be visualized using a subset of rows in `df_di_summary`.
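A minimal sketch of the mapping from dropdown selections to rows, written as a hypothetical base-R helper `dashboard_view()` (not part of DisImpact). A dashboard tool would apply the same four filters through its own filter controls.

```r
# Hypothetical helper: return the rows of a di_iterate() result that match
# one set of dropdown selections (scenario, outcome, disaggregation)
dashboard_view <- function(df, ed_goal = '- All', college_status = '- All',
                           outcome = 'Math', disaggregation = '- None') {
  keep <- df$Ed_Goal == ed_goal &
    df$College_Status == college_status &
    df$success_variable == outcome &
    df$disaggregation == disaggregation
  df[which(keep), , drop = FALSE]  # which() drops rows where keep is NA
}

# Usage with the results above (requires df_di_summary in the session):
# dashboard_view(df_di_summary, outcome = 'Math', disaggregation = 'Ethnicity')
```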
For example, let's visualize non-disaggregated results for math (the dropdown selections are described at the top of the visualization):
# No Disaggregation
df_di_summary %>%
filter(Ed_Goal=='- All', College_Status=='- All', success_variable=='Math', disaggregation=='- None') %>%
as.data.frame
## Ed_Goal College_Status success_variable cohort_variable cohort
## 1 - All - All Math Cohort_Math 2017
## 2 - All - All Math Cohort_Math 2018
## 3 - All - All Math Cohort_Math 2019
## 4 - All - All Math Cohort_Math 2020
## disaggregation group n success pct ppg_reference
## 1 - None - All 4691 3828 0.8160307 0.8160307
## 2 - None - All 7416 6108 0.8236246 0.8236246
## 3 - None - All 4622 3772 0.8160969 0.8160969
## 4 - None - All 1855 1573 0.8479784 0.8479784
## ppg_reference_group moe pct_lo pct_hi di_indicator_ppg
## 1 overall 0.03 0.7860307 0.8460307 0
## 2 overall 0.03 0.7936246 0.8536246 0
## 3 overall 0.03 0.7860969 0.8460969 0
## 4 overall 0.03 0.8179784 0.8779784 0
## di_prop_index di_indicator_prop_index di_80_index_reference_group
## 1 1 0 - All
## 2 1 0 - All
## 3 1 0 - All
## 4 1 0 - All
## di_80_index di_indicator_80_index
## 1 1 0
## 2 1 0
## 3 1 0
## 4 1 0
In this dashboard, one could choose to disaggregate by ethnicity and highlight disproportionate impact (for simplicity, subsequent visualizations use the percentage point gap method, ie, the `di_indicator_ppg` flag):
# Disaggregation: Ethnicity
df_di_summary %>%
filter(Ed_Goal=='- All', College_Status=='- All', success_variable=='Math', disaggregation=='Ethnicity') %>%
select(cohort, group, n, pct, di_indicator_ppg, di_indicator_prop_index, di_indicator_80_index) %>%
as.data.frame
## cohort group n pct di_indicator_ppg
## 1 2017 Asian 1456 0.8804945 0
## 2 2017 Black 452 0.7190265 1
## 3 2017 Hispanic 901 0.6947836 1
## 4 2017 Multi-Ethnicity 245 0.8204082 0
## 5 2017 Native American 46 0.8260870 0
## 6 2017 White 1591 0.8522942 0
## 7 2018 Asian 2251 0.9009329 0
## 8 2018 Black 736 0.7160326 1
## 9 2018 Hispanic 1404 0.6972934 1
## 10 2018 Multi-Ethnicity 379 0.8179420 0
## 11 2018 Native American 77 0.7792208 0
## 12 2018 White 2569 0.8579214 0
## 13 2019 Asian 1450 0.8862069 0
## 14 2019 Black 435 0.7195402 1
## 15 2019 Hispanic 866 0.6755196 1
## 16 2019 Multi-Ethnicity 227 0.8193833 0
## 17 2019 Native American 41 0.8048780 0
## 18 2019 White 1603 0.8546475 0
## 19 2020 Asian 582 0.9278351 0
## 20 2020 Black 171 0.7543860 1
## 21 2020 Hispanic 345 0.6956522 1
## 22 2020 Multi-Ethnicity 81 0.8395062 0
## 23 2020 Native American 17 0.8235294 0
## 24 2020 White 659 0.8831563 0
## di_indicator_prop_index di_indicator_80_index
## 1 0 0
## 2 0 0
## 3 0 1
## 4 0 0
## 5 0 0
## 6 0 0
## 7 0 0
## 8 0 1
## 9 0 1
## 10 0 0
## 11 0 0
## 12 0 0
## 13 0 0
## 14 0 0
## 15 0 1
## 16 0 0
## 17 0 0
## 18 0 0
## 19 0 0
## 20 0 0
## 21 0 1
## 22 0 0
## 23 0 0
## 24 0 0
In a dashboard, the user might want to focus on degree/transfer students. We emulate this by filtering on `Ed_Goal=='Deg/Transfer'`:
# Disaggregation: Ethnicity; Deg/Transfer
df_di_summary %>%
filter(Ed_Goal=='Deg/Transfer', College_Status=='- All', success_variable=='Math', disaggregation=='Ethnicity') %>%
select(cohort, group, n, pct, di_indicator_ppg, di_indicator_prop_index, di_indicator_80_index) %>%
as.data.frame
## cohort group n pct di_indicator_ppg
## 1 2017 Asian 1000 0.8820000 0
## 2 2017 Black 320 0.7500000 1
## 3 2017 Hispanic 665 0.7037594 1
## 4 2017 Multi-Ethnicity 164 0.8109756 0
## 5 2017 Native American 32 0.8125000 0
## 6 2017 White 1108 0.8546931 0
## 7 2018 Asian 1564 0.8951407 0
## 8 2018 Black 515 0.6990291 1
## 9 2018 Hispanic 989 0.6966633 1
## 10 2018 Multi-Ethnicity 262 0.8511450 0
## 11 2018 Native American 62 0.7741935 0
## 12 2018 White 1763 0.8536585 0
## 13 2019 Asian 1019 0.8802748 0
## 14 2019 Black 310 0.6838710 1
## 15 2019 Hispanic 602 0.6843854 1
## 16 2019 Multi-Ethnicity 166 0.8012048 0
## 17 2019 Native American 26 0.7692308 0
## 18 2019 White 1160 0.8465517 0
## 19 2020 Asian 408 0.9117647 0
## 20 2020 Black 122 0.7459016 1
## 21 2020 Hispanic 244 0.7172131 1
## 22 2020 Multi-Ethnicity 57 0.8245614 0
## 23 2020 Native American 9 0.6666667 0
## 24 2020 White 458 0.8711790 0
## di_indicator_prop_index di_indicator_80_index
## 1 0 0
## 2 0 0
## 3 0 1
## 4 0 0
## 5 0 0
## 6 0 0
## 7 0 0
## 8 0 1
## 9 0 1
## 10 0 0
## 11 0 0
## 12 0 0
## 13 0 0
## 14 0 1
## 15 0 1
## 16 0 0
## 17 0 0
## 18 0 0
## 19 0 0
## 20 0 0
## 21 0 1
## 22 0 0
## 23 1 1
## 24 0 0
In a dashboard, the user could switch the outcome to English and disaggregate by Gender:
# Disaggregation: Gender; Deg/Transfer; English
df_di_summary %>%
filter(Ed_Goal=='Deg/Transfer', College_Status=='- All', success_variable=='English', disaggregation=='Gender') %>%
as.data.frame
## Ed_Goal College_Status success_variable cohort_variable cohort
## 1 Deg/Transfer - All English Cohort_English 2017
## 2 Deg/Transfer - All English Cohort_English 2017
## 3 Deg/Transfer - All English Cohort_English 2017
## 4 Deg/Transfer - All English Cohort_English 2018
## 5 Deg/Transfer - All English Cohort_English 2018
## 6 Deg/Transfer - All English Cohort_English 2018
## 7 Deg/Transfer - All English Cohort_English 2019
## 8 Deg/Transfer - All English Cohort_English 2019
## 9 Deg/Transfer - All English Cohort_English 2019
## 10 Deg/Transfer - All English Cohort_English 2020
## 11 Deg/Transfer - All English Cohort_English 2020
## 12 Deg/Transfer - All English Cohort_English 2020
## disaggregation group n success pct ppg_reference
## 1 Gender Female 1916 1424 0.7432150 0.7496751
## 2 Gender Male 1863 1411 0.7573806 0.7496751
## 3 Gender Other 68 49 0.7205882 0.7496751
## 4 Gender Female 2833 2151 0.7592658 0.7597185
## 5 Gender Male 3003 2296 0.7645688 0.7597185
## 6 Gender Other 132 87 0.6590909 0.7597185
## 7 Gender Female 1385 1032 0.7451264 0.7577753
## 8 Gender Male 1308 1003 0.7668196 0.7577753
## 9 Gender Other 40 36 0.9000000 0.7577753
## 10 Gender Female 307 213 0.6938111 0.7192429
## 11 Gender Male 315 234 0.7428571 0.7192429
## 12 Gender Other 12 9 0.7500000 0.7192429
## ppg_reference_group moe pct_lo pct_hi di_indicator_ppg
## 1 overall 0.03000000 0.7132150 0.7732150 0
## 2 overall 0.03000000 0.7273806 0.7873806 0
## 3 overall 0.11884246 0.6017458 0.8394307 0
## 4 overall 0.03000000 0.7292658 0.7892658 0
## 5 overall 0.03000000 0.7345688 0.7945688 0
## 6 overall 0.08529805 0.5737929 0.7443890 1
## 7 overall 0.03000000 0.7151264 0.7751264 0
## 8 overall 0.03000000 0.7368196 0.7968196 0
## 9 overall 0.15495161 0.7450484 1.0549516 0
## 10 overall 0.05593155 0.6378795 0.7497426 0
## 11 overall 0.05521674 0.6876404 0.7980739 0
## 12 overall 0.28290163 0.4670984 1.0329016 0
## di_prop_index di_indicator_prop_index di_80_index_reference_group
## 1 0.9913829 0 Male
## 2 1.0102784 0 Male
## 3 0.9612007 0 Male
## 4 0.9994041 0 Male
## 5 1.0063843 0 Male
## 6 0.8675462 0 Male
## 7 0.9833077 0 Other
## 8 1.0119352 0 Other
## 9 1.1876871 0 Other
## 10 0.9646408 0 Other
## 11 1.0328321 0 Other
## 12 1.0427632 0 Other
## di_80_index di_indicator_80_index
## 1 0.9812967 0
## 2 1.0000000 0
## 3 0.9514216 0
## 4 0.9930641 0
## 5 1.0000000 0
## 6 0.8620427 0
## 7 0.8279182 0
## 8 0.8520217 0
## 9 1.0000000 0
## 10 0.9250814 0
## 11 0.9904762 0
## 12 1.0000000 0
`group_vars` and `scenario_repeat_by_vars`?
For classification variables such as age group, full-time status, and education goal, the user might be unsure whether to pass them to the `group_vars` argument or the `scenario_repeat_by_vars` argument. The answer depends on what the user wants to analyze. If we think of a single student population of interest (eg, the data set passed to `di_iterate`, such as all students enrolled at the institution), then the user should pass to `group_vars` all variables they want to disaggregate on and perform a DI analysis for (eg, are there disparities among ethnic student groups? Among first-generation students?). The `group_vars` argument is required.
On the other hand, the `scenario_repeat_by_vars` argument is optional. When it is not specified, the DI analysis is performed on all outcomes in `success_vars` and all disaggregation variables in `group_vars`, treating all students passed to `data` as a single population. The user should pass variables to `scenario_repeat_by_vars` only to split the student population into multiple subpopulations, each receiving its own DI analysis. For example, if ethnicity, first-generation status, and age group were all specified in `group_vars`, then the user is trying to answer the following questions:
If, on the other hand, the user passes ethnicity and first-generation status to `group_vars`, and age group to `scenario_repeat_by_vars`, then the user is trying to answer the following questions:
`data`?
b. Among different subpopulations defined by age group? (eg, among each of these groups: 18-21, 22-25, 26-35, 36-50, 51+)
`data`?
b. Among different subpopulations defined by age group? (eg, among each of these groups: 18-21, 22-25, 26-35, 36-50, 51+)
`di_iterate`, and overriding them
The function `di_iterate` has been designed to be highly flexible through the use of function arguments / parameters, many of which have defaults:
args(di_iterate)
## function (data, success_vars, group_vars, cohort_vars = NULL,
## scenario_repeat_by_vars = NULL, exclude_scenario_df = NULL,
## weight_var = NULL, include_non_disagg_results = TRUE, ppg_reference_groups = "overall",
## min_moe = 0.03, use_prop_in_moe = FALSE, prop_sub_0 = 0.5,
## prop_sub_1 = 0.5, di_prop_index_cutoff = 0.8, di_80_index_cutoff = 0.8,
## di_80_index_reference_groups = NA, check_valid_reference = TRUE)
## NULL
In this section, we illustrate how each argument could be used. Type `?di_iterate` to read the description of each.
`data` and using `weight_var`
Instead of passing in a student-level data set, the user could pass in a summarized data set to reduce data size. In that case, the user should also specify `weight_var`, the column holding each row's group size (headcount). Let's illustrate with an example:
dim(student_equity)
## [1] 20000 11
## Example summarized data set
student_equity_summ <- student_equity %>%
group_by(Ethnicity, Gender, Cohort, Cohort_Math, Cohort_English, Ed_Goal, College_Status) %>%
summarize(N=n() %>% as.numeric # as.numeric() is not required; it only lets the all.equal() check below pass
, Math=sum(Math, na.rm=TRUE)
, English=sum(English, na.rm=TRUE)
, Transfer=sum(Transfer, na.rm=TRUE)
) %>%
ungroup
dim(student_equity_summ) # same number of columns, fewer rows
## [1] 1392 11
student_equity_summ %>% head %>% as.data.frame # first few rows
## Ethnicity Gender Cohort Cohort_Math Cohort_English Ed_Goal
## 1 Asian Female 2017 2017 2017 Deg/Transfer
## 2 Asian Female 2017 2017 2017 Deg/Transfer
## 3 Asian Female 2017 2017 2017 Other
## 4 Asian Female 2017 2017 2017 Other
## 5 Asian Female 2017 2017 2018 Deg/Transfer
## 6 Asian Female 2017 2017 2018 Deg/Transfer
## College_Status N Math English Transfer
## 1 First-time College 218 196 185 162
## 2 Other 60 55 50 40
## 3 First-time College 99 87 86 74
## 4 Other 22 20 18 16
## 5 First-time College 136 118 113 94
## 6 Other 37 34 35 28
## Run on summarized data set
df_di_summary_2 <- di_iterate(data=student_equity_summ
, success_vars=c('Math', 'English', 'Transfer')
, group_vars=c('Ethnicity', 'Gender')
, cohort_vars=c('Cohort_Math', 'Cohort_English', 'Cohort')
, scenario_repeat_by_vars=c('Ed_Goal', 'College_Status')
, weight_var='N' # SET THIS
)
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = "Ed_Goal"
## Joining, by = "Ed_Goal"
dim(df_di_summary)
## [1] 898 21
dim(df_di_summary_2) # more rows, due to rows with an NA cohort
## [1] 1075 21
dim(df_di_summary_2 %>% filter(!is.na(cohort)))
## [1] 898 21
## If the user wants to see the extra rows:
## extra_rows <- df_di_summary_2 %>%
## anti_join(df_di_summary %>% select(Ed_Goal, College_Status, success_variable, cohort_variable, cohort, disaggregation, group))
## extra_rows %>% head %>% as.data.frame
all.equal(df_di_summary
, df_di_summary_2 %>% filter(!is.na(cohort))
) # returned results are the same
## [1] TRUE
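Why the summarized data gives the same answer: every DI metric depends only on each group's headcount and success count, so collapsing students who share all grouping, cohort, and scenario values into one row with a count column (the column `weight_var` points to) loses nothing. A toy base-R check with made-up data:

```r
# Made-up student-level data: one row per student, 0/1 outcome
student <- data.frame(
  group   = rep(c('A', 'B', 'C'), times = c(5, 3, 4)),
  success = c(1, 1, 0, 1, 1,  0, 1, 0,  1, 1, 1, 0)
)
student$N <- 1  # each student contributes a headcount of 1

# Summarized data: one row per group; N is the column weight_var would name
summ <- aggregate(cbind(N, success) ~ group, data = student, FUN = sum)
# summ has one row per group: A (N = 5, success = 4), B (3, 1), C (4, 3)

# Success rates agree: mean over students vs. sum(success) / sum(N)
rate_unit <- tapply(student$success, student$group, mean)
rate_summ <- setNames(summ$success / summ$N, summ$group)
all.equal(as.numeric(rate_unit), as.numeric(rate_summ[names(rate_unit)]))
```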
`include_non_disagg_results`
By default, the non-disaggregated results are also returned. To suppress them, the user could set `include_non_disagg_results=FALSE`:
df_di_summary_2 <- di_iterate(data=student_equity
, success_vars=c('Math', 'English', 'Transfer')
, group_vars=c('Ethnicity', 'Gender')
, cohort_vars=c('Cohort_Math', 'Cohort_English', 'Cohort')
, scenario_repeat_by_vars=c('Ed_Goal', 'College_Status')
, include_non_disagg_results=FALSE ## SET THIS
)
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = "Ed_Goal"
## Joining, by = "Ed_Goal"
dim(df_di_summary)
## [1] 898 21
dim(df_di_summary_2) ## fewer rows because the non-disaggregated ('- None') rows are excluded
## [1] 808 21
table(df_di_summary_2$disaggregation) # No more '- None'
##
## Ethnicity Gender
## 539 269
For the percentage point gap (PPG) method, `di_iterate` defaults to using the overall success rate as the reference for comparison (`ppg_reference_groups='overall'`). The user could set `ppg_reference_groups='hpg'` to use the highest performing group as the comparison group, or `ppg_reference_groups='all but current'` to use the combined success rate of all other groups excluding the group of interest (eg, if studying Hispanic students, then the reference group would be all non-Hispanic students). The latter is sometimes referred to as “PPG minus 1.” The user could also specify specific groups as reference:
df_di_summary_2 <- di_iterate(data=student_equity
, success_vars=c('Math', 'English', 'Transfer')
, group_vars=c('Ethnicity', 'Gender')
, cohort_vars=c('Cohort_Math', 'Cohort_English', 'Cohort')
, scenario_repeat_by_vars=c('Ed_Goal', 'College_Status')
, ppg_reference_groups=c('White', 'Male') ## corresponds to each variable in group_vars
)
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = "Ed_Goal"
## Joining, by = "Ed_Goal"
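To see how the reference choice changes the comparison, the base-R sketch below computes the reference rate under each option for the six Math rows printed earlier (Deg/Transfer, First-time College, 2017 cohort). The formulas follow the descriptions above (overall rate, highest group rate, pooled rate of all other groups, a named group's rate); they are illustrations, not DisImpact's internal code.

```r
group   <- c('Asian', 'Black', 'Hispanic', 'Multi-Ethnicity',
             'Native American', 'White')
n       <- c(800, 260, 549, 136, 28, 903)
success <- c(698, 199, 390, 108, 22, 764)
pct     <- success / n

ref_overall <- sum(success) / sum(n)                   # 'overall'
ref_hpg     <- max(pct)                                # 'hpg'
ref_abc     <- (sum(success) - success) / (sum(n) - n) # 'all but current', one per group
ref_white   <- pct[group == 'White']                   # a named group, eg 'White'

# Percentage point gap under each choice (negative = below the reference)
data.frame(group,
           pct         = round(pct, 4),
           gap_overall = round(pct - ref_overall, 4),
           gap_hpg     = round(pct - ref_hpg, 4),
           gap_abc     = round(pct - ref_abc, 4),
           gap_white   = round(pct - ref_white, 4))
```

Note that 'all but current' gives each group its own reference. For Hispanic students it is the pooled rate of the other 2,127 students (1,791 successes, about 0.842), which is higher than the overall rate (about 0.815) because their own lower rate no longer pulls the reference down.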
The following arguments apply to the PPG: `min_moe`, `use_prop_in_moe`, `prop_sub_0`, and `prop_sub_1`. See `?di_ppg` for more details.
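A sketch of how these arguments plausibly fit together. The hypothetical `moe_sketch()` reproduces the printed `moe` column under the defaults (1.96 * sqrt(0.25 / n), floored at `min_moe=0.03`; eg, n = 28 gives about 0.1852, while large groups get 0.03). The handling of `use_prop_in_moe`, `prop_sub_0`, and `prop_sub_1` is an assumption based on the argument names, so check `?di_ppg` before relying on it.

```r
# Hypothetical helper (not DisImpact code): margin of error used by the PPG
moe_sketch <- function(n, p, min_moe = 0.03, use_prop_in_moe = FALSE,
                       prop_sub_0 = 0.5, prop_sub_1 = 0.5) {
  if (use_prop_in_moe) {
    # Observed rates of exactly 0 or 1 would give a zero MOE, so substitute
    # prop_sub_0 / prop_sub_1 in those cases (assumed behavior)
    p <- ifelse(p == 0, prop_sub_0, ifelse(p == 1, prop_sub_1, p))
  } else {
    p <- 0.5  # maximizes p * (1 - p), the most conservative choice
  }
  pmax(min_moe, 1.96 * sqrt(p * (1 - p) / n))
}

moe_sketch(n = 28, p = 22 / 28)   # about 0.1852 (Native American row above)
moe_sketch(n = 10000, p = 0.514)  # 0.03: the min_moe floor applies
moe_sketch(n = 100, p = 0.9, use_prop_in_moe = TRUE, min_moe = 0)  # about 0.0588
```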
For the proportionality index (PI) method, DI is determined using `di_prop_index_cutoff=0.8` by default. This could be changed:
df_di_summary_2 <- di_iterate(data=student_equity
, success_vars=c('Math', 'English', 'Transfer')
, group_vars=c('Ethnicity', 'Gender')
, cohort_vars=c('Cohort_Math', 'Cohort_English', 'Cohort')
, scenario_repeat_by_vars=c('Ed_Goal', 'College_Status')
, di_prop_index_cutoff=0.9 # Easier to declare DI using PI
)
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = "Ed_Goal"
## Joining, by = "Ed_Goal"
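Why a higher cutoff makes DI easier to declare: a group's proportionality index is its share of successes divided by its share of students, and DI is flagged when the index falls below the cutoff. A base-R sketch using the six Math rows printed earlier, with a hypothetical helper `flag_pi()` and an assumed strict `<` comparison:

```r
# Asian, Black, Hispanic, Multi-Ethnicity, Native American, White
n       <- c(800, 260, 549, 136, 28, 903)
success <- c(698, 199, 390, 108, 22, 764)

# Proportionality index: share of successes divided by share of students
di_prop_index <- (success / sum(success)) / (n / sum(n))

# Hypothetical helper: flag DI when the index falls below the cutoff
flag_pi <- function(index, cutoff = 0.8) as.integer(index < cutoff)

flag_pi(di_prop_index)       # default cutoff 0.8: no group flagged
flag_pi(di_prop_index, 0.9)  # cutoff 0.9: Hispanic (index about 0.872) flagged
```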
For the 80% index method, the highest performing group is used as reference by default (`di_80_index_reference_groups=NA`). Similar to the PPG, the user could specify custom reference groups:
df_di_summary_2 <- di_iterate(data=student_equity
, success_vars=c('Math', 'English', 'Transfer')
, group_vars=c('Ethnicity', 'Gender')
, cohort_vars=c('Cohort_Math', 'Cohort_English', 'Cohort')
, scenario_repeat_by_vars=c('Ed_Goal', 'College_Status')
, di_80_index_reference_groups=c('White', 'Male') ## corresponds to each variable in group_vars
)
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = "Ed_Goal"
## Joining, by = "Ed_Goal"
The 80% index uses 80% as the default threshold for declaring DI. The user could alter this with the `di_80_index_cutoff` argument:
df_di_summary_2 <- di_iterate(data=student_equity
, success_vars=c('Math', 'English', 'Transfer')
, group_vars=c('Ethnicity', 'Gender')
, cohort_vars=c('Cohort_Math', 'Cohort_English', 'Cohort')
, scenario_repeat_by_vars=c('Ed_Goal', 'College_Status')
, di_80_index_cutoff=0.5 # Harder to declare DI using 80% index
)
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = "Ed_Goal"
## Joining, by = "Ed_Goal"
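Both 80% index settings can be sketched in base R on the same six Math rows. The helper `index_80()` is hypothetical, not a DisImpact function, and the strict `<` comparison is an assumption consistent with the printed flags.

```r
group   <- c('Asian', 'Black', 'Hispanic', 'Multi-Ethnicity',
             'Native American', 'White')
n       <- c(800, 260, 549, 136, 28, 903)
success <- c(698, 199, 390, 108, 22, 764)
pct     <- success / n

# Hypothetical helper: 80% index relative to a reference rate, with a cutoff
index_80 <- function(pct, ref_pct, cutoff = 0.8) {
  idx <- pct / ref_pct
  data.frame(di_80_index = idx,
             di_indicator_80_index = as.integer(idx < cutoff))
}

# Default: highest performing group as reference (di_80_index_reference_groups=NA)
default_ref <- index_80(pct, max(pct))
# Custom reference group, as with di_80_index_reference_groups=c('White', ...)
white_ref <- index_80(pct, pct[group == 'White'])
# Lower cutoff, as with di_80_index_cutoff=0.5
lenient <- index_80(pct, max(pct), cutoff = 0.5)

cbind(group, default_ref, white_index = white_ref$di_80_index)
```

With White students as the reference, Asian students get an index above 1, since their rate exceeds the reference rate; with the default reference, the highest performing group always gets exactly 1.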
A single call of `di_iterate` returns the results of all three DI methods. If the user wants DI calculations under several variants of the same method (eg, using the overall rate as the PPG reference, and also using a pre-specified list of reference groups), then it is recommended to execute `di_iterate` multiple times and combine (stack) the results. In that case, it is a good idea to set `include_non_disagg_results=FALSE` in subsequent `di_iterate` runs so that the rows of non-disaggregated results are not duplicated.
# Multiple group variables and different reference groups
df_di_summary_long <- bind_rows(
di_iterate(data=student_equity
, success_vars=c('Math', 'English', 'Transfer')
, group_vars=c('Ethnicity', 'Gender')
, cohort_vars=c('Cohort_Math', 'Cohort_English', 'Cohort')
, scenario_repeat_by_vars=c('Ed_Goal', 'College_Status')
)
, di_iterate(data=student_equity
, success_vars=c('Math', 'English', 'Transfer')
, group_vars=c('Ethnicity', 'Gender')
, cohort_vars=c('Cohort_Math', 'Cohort_English', 'Cohort')
, scenario_repeat_by_vars=c('Ed_Goal', 'College_Status')
, ppg_reference_groups=c('White', 'Male') ## corresponds to each variable in group_vars
, include_non_disagg_results = FALSE # Already have non-disaggregated results in the first run
)
)
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = "Ed_Goal"
## Joining, by = "Ed_Goal"
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = c("Ed_Goal", "College_Status")
## Joining, by = "College_Status"
## Joining, by = "Ed_Goal"
## Joining, by = "Ed_Goal"
dim(df_di_summary_long)
## [1] 1706 21
Since `di_iterate` disaggregates on many variables and subpopulations, it is not uncommon for the returned results to contain rows summarizing small samples. As is common in education research, care should be taken not to unintentionally disclose the educational outcomes of individual students (results that could be linked to particular students; see FERPA regulations). The user might want to filter out rows with small samples (eg, `n < 10`):
## df_di_summary %>%
## mutate(FERPA_Block=ifelse(n < 10, 1, 0)) %>%
## filter(FERPA_Block == 0)
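The commented code above drops small groups entirely. An alternative design, sketched here with a hypothetical helper `mask_small_cells()`, keeps the rows (so dashboard menus still list every group) but blanks the columns that reveal their outcomes. Which columns need masking is a judgment call; derived columns such as `pct_lo`, `pct_hi`, and the DI indices can also reveal a rate, so the sketch masks those too when present.

```r
# Hypothetical helper (not part of DisImpact): flag small groups and blank
# the columns that could reveal their outcomes, keeping the rows themselves
mask_small_cells <- function(df, min_n = 10,
                             cols = c('success', 'pct', 'pct_lo', 'pct_hi',
                                      'di_prop_index', 'di_80_index')) {
  small <- !is.na(df$n) & df$n < min_n
  df$FERPA_Block <- as.integer(small)
  for (col in intersect(cols, names(df))) {
    df[[col]][small] <- NA
  }
  df
}

# Made-up example: group B has only 7 students
toy <- data.frame(group = c('A', 'B'), n = c(120, 7), success = c(90, 5))
toy$pct <- toy$success / toy$n
masked <- mask_small_cells(toy)
masked  # group B keeps its row and n, but success and pct are NA
```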