
DisImpact Tutorial

Vinh Nguyen

2022-10-10

Introduction

The DisImpact R package contains functions that help determine disproportionate impact (DI) based on the following methodologies:

  1. percentage point gap (PPG) method,
  2. proportionality index method (method #1 in reference), and
  3. 80% index method (method #2 in reference).

Install Package

# From CRAN (Official)
install.packages('DisImpact')

# From GitHub (Development)
devtools::install_github('vinhdizzo/DisImpact')

Load Packages

library(DisImpact)
library(dplyr) # For easier manipulation of data frames

Load toy student equity data

To illustrate the functionality of the package, let’s load a toy data set:

# Load fake data set
data(student_equity)

# Print first few observations
head(student_equity)
##         Ethnicity Gender Cohort Transfer Cohort_Math Math Cohort_English
## 1 Native American Female   2017        0        2017    1           2017
## 2 Native American Female   2017        0        2018    1             NA
## 3 Native American Female   2017        0        2018    1           2017
## 4 Native American   Male   2017        1        2017    1           2018
## 5 Native American   Male   2017        0        2017    1           2019
## 6 Native American   Male   2017        1        2019    1           2018
##   English      Ed_Goal     College_Status Student_ID EthnicityFlag_Asian
## 1       0 Deg/Transfer First-time College     100001                   0
## 2      NA Deg/Transfer First-time College     100002                   0
## 3       0 Deg/Transfer First-time College     100003                   0
## 4       1        Other First-time College     100004                   0
## 5       0 Deg/Transfer              Other     100005                   0
## 6       1        Other First-time College     100006                   0
##   EthnicityFlag_Black EthnicityFlag_Hispanic EthnicityFlag_NativeAmerican
## 1                   0                      0                            1
## 2                   0                      0                            1
## 3                   0                      0                            1
## 4                   0                      0                            1
## 5                   0                      0                            1
## 6                   0                      0                            1
##   EthnicityFlag_PacificIslander EthnicityFlag_White EthnicityFlag_Carribean
## 1                             0                   0                       0
## 2                             0                   0                       0
## 3                             0                   0                       0
## 4                             0                   0                       0
## 5                             0                   0                       0
## 6                             0                   0                       0
##   EthnicityFlag_EastAsian EthnicityFlag_SouthEastAsian
## 1                       0                            0
## 2                       0                            0
## 3                       0                            0
## 4                       0                            0
## 5                       0                            0
## 6                       0                            0
##   EthnicityFlag_SouthWestAsianNorthAfrican EthnicityFlag_AANAPI
## 1                                        0                    1
## 2                                        0                    1
## 3                                        0                    1
## 4                                        0                    1
## 5                                        0                    1
## 6                                        0                    1
##   EthnicityFlag_Unknown EthnicityFlag_TwoorMoreRaces
## 1                     0                            0
## 2                     0                            0
## 3                     0                            0
## 4                     0                            0
## 5                     0                            0
## 6                     0                            0
# For description of data set
## ?student_equity

For a description of the student_equity data set, type ?student_equity in the R console.

The toy data set can be summarized as follows:

# Summarize toy data
dim(student_equity)
## [1] 20000    24
dSumm <- student_equity %>%
  group_by(Cohort, Ethnicity) %>%
  summarize(n=n(), Transfer_Rate=mean(Transfer))
## `summarise()` has grouped output by 'Cohort'. You can override using the
## `.groups` argument.
dSumm ## This is a summarized version of the data set
## # A tibble: 12 x 4
## # Groups:   Cohort [2]
##    Cohort Ethnicity           n Transfer_Rate
##     <int> <chr>           <int>         <dbl>
##  1   2017 Asian            3000         0.687
##  2   2017 Black            1000         0.31 
##  3   2017 Hispanic         2000         0.205
##  4   2017 Multi-Ethnicity   500         0.524
##  5   2017 Native American   100         0.43 
##  6   2017 White            3400         0.604
##  7   2018 Asian            3000         0.743
##  8   2018 Black            1000         0.297
##  9   2018 Hispanic         2000         0.218
## 10   2018 Multi-Ethnicity   500         0.484
## 11   2018 Native American   100         0.35 
## 12   2018 White            3400         0.631

Percentage point gap (PPG) method

di_ppg is the main work function. It accepts either vectors or, the tidy way, unquoted column names together with a data argument:

# Vector
di_ppg(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333    0.5264         overall 0.03000000
## 2           Black 2000     607 0.3035000    0.5264         overall 0.03000000
## 3        Hispanic 4000     847 0.2117500    0.5264         overall 0.03000000
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall 0.03099032
## 5 Native American  200      78 0.3900000    0.5264         overall 0.06929646
## 6           White 6800    4200 0.6176471    0.5264         overall 0.03000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6853333 0.7453333            0                     0
## 2 0.2735000 0.3335000            1                   386
## 3 0.1817500 0.2417500            1                  1139
## 4 0.4730097 0.5349903            0                     0
## 5 0.3207035 0.4592965            1                    14
## 6 0.5876471 0.6476471            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        446
## 3                       1259
## 4                         23
## 5                         28
## 6                          0
# Tidy and column reference
di_ppg(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333    0.5264         overall 0.03000000
## 2           Black 2000     607 0.3035000    0.5264         overall 0.03000000
## 3        Hispanic 4000     847 0.2117500    0.5264         overall 0.03000000
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall 0.03099032
## 5 Native American  200      78 0.3900000    0.5264         overall 0.06929646
## 6           White 6800    4200 0.6176471    0.5264         overall 0.03000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6853333 0.7453333            0                     0
## 2 0.2735000 0.3335000            1                   386
## 3 0.1817500 0.2417500            1                  1139
## 4 0.4730097 0.5349903            0                     0
## 5 0.3207035 0.4592965            1                    14
## 6 0.5876471 0.6476471            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        446
## 3                       1259
## 4                         23
## 5                         28
## 6                          0

For a description of the di_ppg function, including both function arguments and returned results, type ?di_ppg in the R console.
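
To make the output columns concrete, the sketch below recomputes them by hand from the group counts printed above. It is a base R approximation, not the package's internal code, and the helper name ppg_sketch is hypothetical. It makes three assumptions:

1. It applies the default MOE settings described later in this vignette (proportion 0.50, minimum MOE 0.03).
2. It flags DI when pct_hi falls below the reference.
3. It uses formulas for the two success_needed_* columns that were inferred from the printed output.

```r
# Hypothetical helper: a by-hand approximation of di_ppg's default output
# (overall reference, MOE computed at a proportion of 0.5, minimum MOE 0.03).
ppg_sketch <- function(group, n, success, min_moe = 0.03) {
  pct <- success / n
  # Overall reference: pooled success rate across all groups
  reference <- sum(success) / sum(n)
  # MOE with 0.5 plugged in as the proportion, floored at min_moe
  moe <- pmax(1.96 * sqrt(0.25 / n), min_moe)
  pct_lo <- pct - moe
  pct_hi <- pct + moe
  # Assumed rule: DI when the upper bound falls short of the reference
  di_indicator <- as.integer(pct_hi < reference)
  # Inferred: successes needed for pct_hi to reach the reference (MOE held fixed)
  success_needed_not_di <- pmax(ceiling((reference - moe) * n - success), 0)
  # Inferred: successes needed for pct itself to reach the reference
  success_needed_full_parity <- pmax(ceiling(reference * n - success), 0)
  data.frame(group, n, success, pct, reference, moe, pct_lo, pct_hi,
             di_indicator, success_needed_not_di, success_needed_full_parity)
}

# Counts copied from the di_ppg output above
res <- ppg_sketch(
  group   = c("Asian", "Black", "Hispanic", "Multi-Ethnicity",
              "Native American", "White"),
  n       = c(6000, 2000, 4000, 1000, 200, 6800),
  success = c(4292, 607, 847, 504, 78, 4200)
)
res[, c("group", "di_indicator", "success_needed_not_di",
        "success_needed_full_parity")]
```

For these counts, the sketch matches the di_indicator and success_needed_* columns printed by di_ppg. Whether the package follows exactly these rules in every edge case (for example, a pct_hi exactly equal to the reference) is not confirmed here; see ?di_ppg.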

To calculate DI separately within each cohort, pass the cohort argument:

# Cohort
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333    0.5140         overall
## 2    2017           Black 1000     310 0.3100000    0.5140         overall
## 3    2017        Hispanic 2000     410 0.2050000    0.5140         overall
## 4    2017 Multi-Ethnicity  500     262 0.5240000    0.5140         overall
## 5    2017 Native American  100      43 0.4300000    0.5140         overall
## 6    2017           White 3400    2053 0.6038235    0.5140         overall
## 7    2018           Asian 3000    2230 0.7433333    0.5388         overall
## 8    2018           Black 1000     297 0.2970000    0.5388         overall
## 9    2018        Hispanic 2000     437 0.2185000    0.5388         overall
## 10   2018 Multi-Ethnicity  500     242 0.4840000    0.5388         overall
## 11   2018 Native American  100      35 0.3500000    0.5388         overall
## 12   2018           White 3400    2147 0.6314706    0.5388         overall
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   174
## 3  0.03000000 0.1750000 0.2350000            1                   558
## 4  0.04382693 0.4801731 0.5678269            0                     0
## 5  0.09800000 0.3320000 0.5280000            0                     0
## 6  0.03000000 0.5738235 0.6338235            0                     0
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   211
## 9  0.03000000 0.1885000 0.2485000            1                   581
## 10 0.04382693 0.4401731 0.5278269            1                     6
## 11 0.09800000 0.2520000 0.4480000            1                    10
## 12 0.03000000 0.6014706 0.6614706            0                     0
##    success_needed_full_parity
## 1                           0
## 2                         205
## 3                         619
## 4                           0
## 5                           9
## 6                           0
## 7                           0
## 8                         242
## 9                         641
## 10                         28
## 11                         19
## 12                          0

di_ppg also works on summarized data: pass the success counts to success and the group sizes to weight. For example, the following uses the summarized data set dSumm, reconstructing the success counts as Transfer_Rate*n and passing the group size n as weight:

di_ppg(success=Transfer_Rate*n, group=Ethnicity, cohort=Cohort, weight=n, data=dSumm) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333    0.5140         overall
## 2    2017           Black 1000     310 0.3100000    0.5140         overall
## 3    2017        Hispanic 2000     410 0.2050000    0.5140         overall
## 4    2017 Multi-Ethnicity  500     262 0.5240000    0.5140         overall
## 5    2017 Native American  100      43 0.4300000    0.5140         overall
## 6    2017           White 3400    2053 0.6038235    0.5140         overall
## 7    2018           Asian 3000    2230 0.7433333    0.5388         overall
## 8    2018           Black 1000     297 0.2970000    0.5388         overall
## 9    2018        Hispanic 2000     437 0.2185000    0.5388         overall
## 10   2018 Multi-Ethnicity  500     242 0.4840000    0.5388         overall
## 11   2018 Native American  100      35 0.3500000    0.5388         overall
## 12   2018           White 3400    2147 0.6314706    0.5388         overall
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   174
## 3  0.03000000 0.1750000 0.2350000            1                   558
## 4  0.04382693 0.4801731 0.5678269            0                     0
## 5  0.09800000 0.3320000 0.5280000            0                     0
## 6  0.03000000 0.5738235 0.6338235            0                     0
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   211
## 9  0.03000000 0.1885000 0.2485000            1                   581
## 10 0.04382693 0.4401731 0.5278269            1                     6
## 11 0.09800000 0.2520000 0.4480000            1                    10
## 12 0.03000000 0.6014706 0.6614706            0                     0
##    success_needed_full_parity
## 1                           0
## 2                         205
## 3                         619
## 4                           0
## 5                           9
## 6                           0
## 7                           0
## 8                         242
## 9                         641
## 10                         28
## 11                         19
## 12                          0
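
If dplyr is not available, unit-level data can be collapsed into the counts that di_ppg expects using base R's aggregate. The sketch below uses a small hypothetical data frame, students, in place of student_equity. Each row gets a constant 1 so that summing gives the group size.

```r
# Hypothetical unit-level data: one row per student, Transfer coded 0/1
students <- data.frame(
  Cohort    = c(2017, 2017, 2017, 2017, 2018, 2018, 2018),
  Ethnicity = c("Asian", "Asian", "Black", "Black", "Asian", "Black", "Black"),
  Transfer  = c(1, 0, 1, 1, 1, 0, 1)
)

# One row per cohort/group: number of successes (sum of Transfer)
# and group size (sum of a constant 1)
counts <- aggregate(
  x   = data.frame(success = students$Transfer, n = 1),
  by  = students[c("Cohort", "Ethnicity")],
  FUN = sum
)

# The success rate, if needed, is success / n
counts$rate <- counts$success / counts$n
counts
```

Following the dSumm example above, the result could then be passed as di_ppg(success=success, group=Ethnicity, cohort=Cohort, weight=n, data=counts).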

By default, di_ppg uses the overall success rate as the reference rate for comparison (reference='overall'). The reference argument also accepts:

- 'hpg' (the success rate of the highest performing group),
- 'all but current' (the combined success rate of all groups excluding the comparison group), or
- one of the values of group (that group's success rate).

# Reference: Highest performing group
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='hpg', data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6873333           Asian
## 2    2017           Black 1000     310 0.3100000 0.6873333           Asian
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333           Asian
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333           Asian
## 5    2017 Native American  100      43 0.4300000 0.6873333           Asian
## 6    2017           White 3400    2053 0.6038235 0.6873333           Asian
## 7    2018           Asian 3000    2230 0.7433333 0.7433333           Asian
## 8    2018           Black 1000     297 0.2970000 0.7433333           Asian
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333           Asian
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333           Asian
## 11   2018 Native American  100      35 0.3500000 0.7433333           Asian
## 12   2018           White 3400    2147 0.6314706 0.7433333           Asian
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   347
## 3  0.03000000 0.1750000 0.2350000            1                   905
## 4  0.04382693 0.4801731 0.5678269            1                    60
## 5  0.09800000 0.3320000 0.5280000            1                    16
## 6  0.03000000 0.5738235 0.6338235            1                   182
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   416
## 9  0.03000000 0.1885000 0.2485000            1                   990
## 10 0.04382693 0.4401731 0.5278269            1                   108
## 11 0.09800000 0.2520000 0.4480000            1                    30
## 12 0.03000000 0.6014706 0.6614706            1                   279
##    success_needed_full_parity
## 1                           0
## 2                         378
## 3                         965
## 4                          82
## 5                          26
## 6                         284
## 7                           0
## 8                         447
## 9                        1050
## 10                        130
## 11                         40
## 12                        381
# Reference: All but current (PPG minus 1)
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='all but current', data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.4397143 all but current
## 2    2017           Black 1000     310 0.3100000 0.5366667 all but current
## 3    2017        Hispanic 2000     410 0.2050000 0.5912500 all but current
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.5134737 all but current
## 5    2017 Native American  100      43 0.4300000 0.5148485 all but current
## 6    2017           White 3400    2053 0.6038235 0.4677273 all but current
## 7    2018           Asian 3000    2230 0.7433333 0.4511429 all but current
## 8    2018           Black 1000     297 0.2970000 0.5656667 all but current
## 9    2018        Hispanic 2000     437 0.2185000 0.6188750 all but current
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.5416842 all but current
## 11   2018 Native American  100      35 0.3500000 0.5407071 all but current
## 12   2018           White 3400    2147 0.6314706 0.4910606 all but current
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   196
## 3  0.03000000 0.1750000 0.2350000            1                   713
## 4  0.04382693 0.4801731 0.5678269            0                     0
## 5  0.09800000 0.3320000 0.5280000            0                     0
## 6  0.03000000 0.5738235 0.6338235            0                     0
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   238
## 9  0.03000000 0.1885000 0.2485000            1                   741
## 10 0.04382693 0.4401731 0.5278269            1                     7
## 11 0.09800000 0.2520000 0.4480000            1                    10
## 12 0.03000000 0.6014706 0.6614706            0                     0
##    success_needed_full_parity
## 1                           0
## 2                         227
## 3                         773
## 4                           0
## 5                           9
## 6                           0
## 7                           0
## 8                         269
## 9                         801
## 10                         29
## 11                         20
## 12                          0
# Reference: custom group
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='White', data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6038235           White
## 2    2017           Black 1000     310 0.3100000 0.6038235           White
## 3    2017        Hispanic 2000     410 0.2050000 0.6038235           White
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6038235           White
## 5    2017 Native American  100      43 0.4300000 0.6038235           White
## 6    2017           White 3400    2053 0.6038235 0.6038235           White
## 7    2018           Asian 3000    2230 0.7433333 0.6314706           White
## 8    2018           Black 1000     297 0.2970000 0.6314706           White
## 9    2018        Hispanic 2000     437 0.2185000 0.6314706           White
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.6314706           White
## 11   2018 Native American  100      35 0.3500000 0.6314706           White
## 12   2018           White 3400    2147 0.6314706 0.6314706           White
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   263
## 3  0.03000000 0.1750000 0.2350000            1                   738
## 4  0.04382693 0.4801731 0.5678269            1                    18
## 5  0.09800000 0.3320000 0.5280000            1                     8
## 6  0.03000000 0.5738235 0.6338235            0                     0
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   304
## 9  0.03000000 0.1885000 0.2485000            1                   766
## 10 0.04382693 0.4401731 0.5278269            1                    52
## 11 0.09800000 0.2520000 0.4480000            1                    19
## 12 0.03000000 0.6014706 0.6614706            0                     0
##    success_needed_full_parity
## 1                           0
## 2                         294
## 3                         798
## 4                          40
## 5                          18
## 6                           0
## 7                           0
## 8                         335
## 9                         826
## 10                         74
## 11                         29
## 12                          0
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='Asian', data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6873333           Asian
## 2    2017           Black 1000     310 0.3100000 0.6873333           Asian
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333           Asian
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333           Asian
## 5    2017 Native American  100      43 0.4300000 0.6873333           Asian
## 6    2017           White 3400    2053 0.6038235 0.6873333           Asian
## 7    2018           Asian 3000    2230 0.7433333 0.7433333           Asian
## 8    2018           Black 1000     297 0.2970000 0.7433333           Asian
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333           Asian
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333           Asian
## 11   2018 Native American  100      35 0.3500000 0.7433333           Asian
## 12   2018           White 3400    2147 0.6314706 0.7433333           Asian
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   347
## 3  0.03000000 0.1750000 0.2350000            1                   905
## 4  0.04382693 0.4801731 0.5678269            1                    60
## 5  0.09800000 0.3320000 0.5280000            1                    16
## 6  0.03000000 0.5738235 0.6338235            1                   182
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   416
## 9  0.03000000 0.1885000 0.2485000            1                   990
## 10 0.04382693 0.4401731 0.5278269            1                   108
## 11 0.09800000 0.2520000 0.4480000            1                    30
## 12 0.03000000 0.6014706 0.6614706            1                   279
##    success_needed_full_parity
## 1                           0
## 2                         378
## 3                         965
## 4                          82
## 5                          26
## 6                         284
## 7                           0
## 8                         447
## 9                        1050
## 10                        130
## 11                         40
## 12                        381
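
The reference column in the outputs above can be recomputed by hand for one cohort. The sketch below uses the 2017 counts printed above. It is a base R illustration of the four reference options, not the package's internal code.

```r
# Counts for the 2017 cohort, copied from the output above
grp     <- c("Asian", "Black", "Hispanic", "Multi-Ethnicity",
             "Native American", "White")
n       <- c(3000, 1000, 2000, 500, 100, 3400)
success <- c(2062, 310, 410, 262, 43, 2053)
pct     <- success / n

# reference='overall': pooled success rate of all groups
ref_overall <- sum(success) / sum(n)

# reference='hpg': success rate of the highest performing group
ref_hpg <- max(pct)

# reference='all but current': pooled rate of every other group
ref_abc <- (sum(success) - success) / (sum(n) - n)

# reference='White': success rate of the named group
ref_white <- pct[grp == "White"]

data.frame(group = grp, pct, ref_overall, ref_hpg, ref_abc, ref_white)
```

These values agree with the printed results: 0.5140 overall, 0.6873333 for 'hpg' (Asian), 0.4397143 as the 'all but current' reference for Asian, and 0.6038235 for White.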

The user can also pass in custom reference rates for comparison (e.g., a state-wide rate). di_ppg accepts either a single reference rate or a vector of reference rates, one for each cohort. In the latter case, the reference rates are matched to the values of the cohort variable in alphabetical order.

# With custom reference (single)
di_ppg(success=Transfer, group=Ethnicity, reference=0.54, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333      0.54         numeric 0.03000000
## 2           Black 2000     607 0.3035000      0.54         numeric 0.03000000
## 3        Hispanic 4000     847 0.2117500      0.54         numeric 0.03000000
## 4 Multi-Ethnicity 1000     504 0.5040000      0.54         numeric 0.03099032
## 5 Native American  200      78 0.3900000      0.54         numeric 0.06929646
## 6           White 6800    4200 0.6176471      0.54         numeric 0.03000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6853333 0.7453333            0                     0
## 2 0.2735000 0.3335000            1                   414
## 3 0.1817500 0.2417500            1                  1193
## 4 0.4730097 0.5349903            1                     6
## 5 0.3207035 0.4592965            1                    17
## 6 0.5876471 0.6476471            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        474
## 3                       1314
## 4                         37
## 5                         31
## 6                          0
# With custom reference (multiple)
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference=c(0.5, 0.55), data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333      0.50         numeric
## 2    2017           Black 1000     310 0.3100000      0.50         numeric
## 3    2017        Hispanic 2000     410 0.2050000      0.50         numeric
## 4    2017 Multi-Ethnicity  500     262 0.5240000      0.50         numeric
## 5    2017 Native American  100      43 0.4300000      0.50         numeric
## 6    2017           White 3400    2053 0.6038235      0.50         numeric
## 7    2018           Asian 3000    2230 0.7433333      0.55         numeric
## 8    2018           Black 1000     297 0.2970000      0.55         numeric
## 9    2018        Hispanic 2000     437 0.2185000      0.55         numeric
## 10   2018 Multi-Ethnicity  500     242 0.4840000      0.55         numeric
## 11   2018 Native American  100      35 0.3500000      0.55         numeric
## 12   2018           White 3400    2147 0.6314706      0.55         numeric
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   160
## 3  0.03000000 0.1750000 0.2350000            1                   530
## 4  0.04382693 0.4801731 0.5678269            0                     0
## 5  0.09800000 0.3320000 0.5280000            0                     0
## 6  0.03000000 0.5738235 0.6338235            0                     0
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   223
## 9  0.03000000 0.1885000 0.2485000            1                   604
## 10 0.04382693 0.4401731 0.5278269            1                    12
## 11 0.09800000 0.2520000 0.4480000            1                    11
## 12 0.03000000 0.6014706 0.6614706            0                     0
##    success_needed_full_parity
## 1                           0
## 2                         190
## 3                         591
## 4                           0
## 5                           8
## 6                           0
## 7                           0
## 8                         254
## 9                         663
## 10                         34
## 11                         21
## 12                          0
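
The matching between a reference vector and the cohort values can be illustrated in base R. The sketch below pairs each rate with the sorted unique cohort values and then looks up the rate for each observation. The cohort and reference vectors are hypothetical, and this shows the matching rule rather than the package's code. For four-digit years, alphabetical and numeric order coincide.

```r
# Hypothetical cohort values and one custom reference rate per cohort
cohort    <- c(2018, 2017, 2018, 2017, 2017)
reference <- c(0.5, 0.55)

# Pair the rates with the cohorts in sorted order: 2017 -> 0.5, 2018 -> 0.55
ref_by_cohort <- setNames(reference, sort(unique(cohort)))
ref_by_cohort

# Reference rate applying to each observation
ref_by_cohort[as.character(cohort)]
```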

Determining disproportionate impact with the PPG method relies on calculating a margin of error (MOE) around each group's success rate. The MOE calculated in di_ppg rests on 2 underlying assumptions (defaults):

  1. the minimum MOE returned is 0.03, and
  2. using 0.50 as the proportion in the margin of error formula, \(1.96 \times \sqrt{\hat{p} (1-\hat{p}) / n}\).

To override assumption 1, specify min_moe in di_ppg. To override assumption 2 and use the observed proportion \(\hat{p}\) instead of 0.50, specify use_prop_in_moe=TRUE in di_ppg.

# min_moe
di_ppg(success=Transfer, group=Ethnicity, data=student_equity, min_moe=0.02) %>%
  as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333    0.5264         overall 0.02000000
## 2           Black 2000     607 0.3035000    0.5264         overall 0.02191347
## 3        Hispanic 4000     847 0.2117500    0.5264         overall 0.02000000
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall 0.03099032
## 5 Native American  200      78 0.3900000    0.5264         overall 0.06929646
## 6           White 6800    4200 0.6176471    0.5264         overall 0.02000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6953333 0.7353333            0                     0
## 2 0.2815865 0.3254135            1                   402
## 3 0.1917500 0.2317500            1                  1179
## 4 0.4730097 0.5349903            0                     0
## 5 0.3207035 0.4592965            1                    14
## 6 0.5976471 0.6376471            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        446
## 3                       1259
## 4                         23
## 5                         28
## 6                          0
# use_prop_in_moe
di_ppg(success=Transfer, group=Ethnicity, data=student_equity, min_moe=0.02, use_prop_in_moe=TRUE) %>%
  as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333    0.5264         overall 0.02000000
## 2           Black 2000     607 0.3035000    0.5264         overall 0.02015028
## 3        Hispanic 4000     847 0.2117500    0.5264         overall 0.02000000
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall 0.03098933
## 5 Native American  200      78 0.3900000    0.5264         overall 0.06759869
## 6           White 6800    4200 0.6176471    0.5264         overall 0.02000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6953333 0.7353333            0                     0
## 2 0.2833497 0.3236503            1                   406
## 3 0.1917500 0.2317500            1                  1179
## 4 0.4730107 0.5349893            0                     0
## 5 0.3224013 0.4575987            1                    14
## 6 0.5976471 0.6376471            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        446
## 3                       1259
## 4                         23
## 5                         28
## 6                          0
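
The two MOE assumptions can be written as a short function. The sketch below is a base R illustration with a hypothetical name, moe_sketch. Its arguments mirror min_moe and use_prop_in_moe, but it is not the package's ppg_moe. With the counts from the examples above, it reproduces the printed moe columns.

```r
# Hypothetical helper mirroring the MOE rules described above
moe_sketch <- function(n, proportion, min_moe = 0.03, use_prop_in_moe = FALSE) {
  # By default 0.5 is plugged in, which maximizes p * (1 - p)
  p <- if (use_prop_in_moe) proportion else 0.5
  # 1.96 * sqrt(p (1 - p) / n), floored at min_moe
  pmax(1.96 * sqrt(p * (1 - p) / n), min_moe)
}

n   <- c(6000, 2000, 4000, 1000, 200, 6800)
pct <- c(4292, 607, 847, 504, 78, 4200) / n

moe_sketch(n, pct)                                          # defaults
moe_sketch(n, pct, min_moe = 0.02)                          # lower floor
moe_sketch(n, pct, min_moe = 0.02, use_prop_in_moe = TRUE)  # observed proportion
```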

When the observed proportion is used in calculating the MOE, a proportion of 0 or 1 leads to a zero MOE. To handle these cases, the prop_sub_0 and prop_sub_1 parameters in di_ppg and ppg_moe specify substitute proportions. Both default to 0.5, which maximizes the MOE (making it more difficult to declare disproportionate impact).

# Set Native American to have zero transfers and see the results
di_ppg(success=Transfer, group=Ethnicity, data=student_equity %>% mutate(Transfer=ifelse(Ethnicity=='Native American', 0, Transfer)), use_prop_in_moe=TRUE, prop_sub_0=0.1, prop_sub_1=0.9) %>%
  as.data.frame
## Warning in ppg_moe(n = n, proportion = pct, min_moe = min_moe, prop_sub_0 =
## prop_sub_0, : The vector `proportion` contains 0. This will lead to a zero MOE.
## `prop_sub_0=0.1` will be used in calculating the MOE for these cases.
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333    0.5225         overall 0.03000000
## 2           Black 2000     607 0.3035000    0.5225         overall 0.03000000
## 3        Hispanic 4000     847 0.2117500    0.5225         overall 0.03000000
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5225         overall 0.03098933
## 5 Native American  200       0 0.0000000    0.5225         overall 0.04157788
## 6           White 6800    4200 0.6176471    0.5225         overall 0.03000000
##        pct_lo     pct_hi di_indicator success_needed_not_di
## 1  0.68533333 0.74533333            0                     0
## 2  0.27350000 0.33350000            1                   378
## 3  0.18175000 0.24175000            1                  1123
## 4  0.47301067 0.53498933            0                     0
## 5 -0.04157788 0.04157788            1                    97
## 6  0.58764706 0.64764706            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        438
## 3                       1243
## 4                         19
## 5                        105
## 6                          0
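The moe columns above are consistent with a normal-approximation margin of error, 1.96 * sqrt(p * (1 - p) / n), floored at min_moe. Here p is 0.5 by default and the observed proportion when use_prop_in_moe=TRUE, with 0 and 1 replaced by prop_sub_0 and prop_sub_1. The base-R sketch below reconstructs this from the printed numbers; it is not DisImpact's own code, and moe_sketch is a hypothetical name. The 1.96 multiplier in particular is an assumption that happens to match the output.

```r
# Hypothetical reconstruction of the MOE reported by di_ppg / ppg_moe,
# inferred from the printed results (not taken from the package source).
moe_sketch <- function(n, proportion = 0.5, min_moe = 0.03,
                       use_prop_in_moe = FALSE,
                       prop_sub_0 = 0.5, prop_sub_1 = 0.5) {
  # Without use_prop_in_moe, the most conservative proportion (0.5) is used
  p <- if (use_prop_in_moe) proportion else rep(0.5, length(n))
  # Observed proportions of exactly 0 or 1 would give a zero MOE, so substitute
  p[p == 0] <- prop_sub_0
  p[p == 1] <- prop_sub_1
  # Normal-approximation MOE, never smaller than min_moe
  pmax(min_moe, 1.96 * sqrt(p * (1 - p) / n))
}

# Black and Native American rows of the use_prop_in_moe=TRUE run (min_moe=0.02);
# compare with the moe column above: 0.02015028 and 0.06759869
moe_sketch(n = c(2000, 200), proportion = c(0.3035, 0.39),
           min_moe = 0.02, use_prop_in_moe = TRUE)

# Native American with zero transfers and prop_sub_0 = 0.1 (0.04157788 above)
moe_sketch(n = 200, proportion = 0, use_prop_in_moe = TRUE, prop_sub_0 = 0.1)
```

Under these assumptions, when p is fixed at 0.5 the 0.03 floor takes over for any group larger than about 1,067 students, which is why the larger groups in the default runs all show exactly 0.03.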

Proportionality index method

di_prop_index is the main function for this method. It accepts either vectors or, the tidy way, unquoted column names of the data frame passed to data:

# Without cohort
## Vector
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
##             group    n success pct_success pct_group di_prop_index di_indicator
## 1           Asian 6000    4292 0.407674772      0.30     1.3589159            0
## 2           Black 2000     607 0.057655775      0.10     0.5765578            1
## 3        Hispanic 4000     847 0.080452128      0.20     0.4022606            1
## 4 Multi-Ethnicity 1000     504 0.047872340      0.05     0.9574468            0
## 5 Native American  200      78 0.007408815      0.01     0.7408815            1
## 6           White 6800    4200 0.398936170      0.34     1.1733417            0
##   success_needed_not_di success_needed_full_parity
## 1                     0                          0
## 2                   256                        496
## 3                   998                       1574
## 4                     0                         24
## 5                     7                         28
## 6                     0                          0
## Tidy and column reference
di_prop_index(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success pct_success pct_group di_prop_index di_indicator
## 1           Asian 6000    4292 0.407674772      0.30     1.3589159            0
## 2           Black 2000     607 0.057655775      0.10     0.5765578            1
## 3        Hispanic 4000     847 0.080452128      0.20     0.4022606            1
## 4 Multi-Ethnicity 1000     504 0.047872340      0.05     0.9574468            0
## 5 Native American  200      78 0.007408815      0.01     0.7408815            1
## 6           White 6800    4200 0.398936170      0.34     1.1733417            0
##   success_needed_not_di success_needed_full_parity
## 1                     0                          0
## 2                   256                        496
## 3                   998                       1574
## 4                     0                         24
## 5                     7                         28
## 6                     0                          0
# With cohort
## Vector
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort) %>% as.data.frame
##    cohort           group    n success pct_success pct_group di_prop_index
## 1    2017           Asian 3000    2062 0.401167315      0.30     1.3372244
## 2    2017           Black 1000     310 0.060311284      0.10     0.6031128
## 3    2017        Hispanic 2000     410 0.079766537      0.20     0.3988327
## 4    2017 Multi-Ethnicity  500     262 0.050972763      0.05     1.0194553
## 5    2017 Native American  100      43 0.008365759      0.01     0.8365759
## 6    2017           White 3400    2053 0.399416342      0.34     1.1747539
## 7    2018           Asian 3000    2230 0.413882702      0.30     1.3796090
## 8    2018           Black 1000     297 0.055122494      0.10     0.5512249
## 9    2018        Hispanic 2000     437 0.081106162      0.20     0.4055308
## 10   2018 Multi-Ethnicity  500     242 0.044914625      0.05     0.8982925
## 11   2018 Native American  100      35 0.006495917      0.01     0.6495917
## 12   2018           White 3400    2147 0.398478099      0.34     1.1719944
##    di_indicator success_needed_not_di success_needed_full_parity
## 1             0                     0                          0
## 2             1                   111                        227
## 3             1                   491                        773
## 4             0                     0                          0
## 5             0                     0                          9
## 6             0                     0                          0
## 7             0                     0                          0
## 8             1                   146                        269
## 9             1                   507                        801
## 10            0                     0                         29
## 11            1                     9                         20
## 12            0                     0                          0
## Tidy and column reference
di_prop_index(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
  as.data.frame
##    cohort           group    n success pct_success pct_group di_prop_index
## 1    2017           Asian 3000    2062 0.401167315      0.30     1.3372244
## 2    2017           Black 1000     310 0.060311284      0.10     0.6031128
## 3    2017        Hispanic 2000     410 0.079766537      0.20     0.3988327
## 4    2017 Multi-Ethnicity  500     262 0.050972763      0.05     1.0194553
## 5    2017 Native American  100      43 0.008365759      0.01     0.8365759
## 6    2017           White 3400    2053 0.399416342      0.34     1.1747539
## 7    2018           Asian 3000    2230 0.413882702      0.30     1.3796090
## 8    2018           Black 1000     297 0.055122494      0.10     0.5512249
## 9    2018        Hispanic 2000     437 0.081106162      0.20     0.4055308
## 10   2018 Multi-Ethnicity  500     242 0.044914625      0.05     0.8982925
## 11   2018 Native American  100      35 0.006495917      0.01     0.6495917
## 12   2018           White 3400    2147 0.398478099      0.34     1.1719944
##    di_indicator success_needed_not_di success_needed_full_parity
## 1             0                     0                          0
## 2             1                   111                        227
## 3             1                   491                        773
## 4             0                     0                          0
## 5             0                     0                          9
## 6             0                     0                          0
## 7             0                     0                          0
## 8             1                   146                        269
## 9             1                   507                        801
## 10            0                     0                         29
## 11            1                     9                         20
## 12            0                     0                          0

For a description of the di_prop_index function, including both function arguments and returned results, type ?di_prop_index in the R console.
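To make the computation concrete, the base-R sketch below reproduces the overall (no cohort) table above from its n and success columns. It assumes the index is each group's share of all successes divided by its share of all students, and that a group is flagged when the index falls strictly below the cutoff. prop_index_sketch is a hypothetical helper, not a DisImpact function.

```r
# Hypothetical sketch of the proportionality index (not the package source)
prop_index_sketch <- function(group, n, success, cutoff = 0.8) {
  pct_success <- success / sum(success)  # group's share of all successes
  pct_group   <- n / sum(n)              # group's share of all students
  index <- pct_success / pct_group
  data.frame(group, n, success, pct_success, pct_group,
             di_prop_index = index,
             di_indicator  = as.integer(index < cutoff))  # assumed strict "<"
}

# Counts copied from the Transfer-by-Ethnicity output above
res <- prop_index_sketch(
  group   = c('Asian', 'Black', 'Hispanic', 'Multi-Ethnicity', 'Native American', 'White'),
  n       = c(6000, 2000, 4000, 1000, 200, 6800),
  success = c(4292, 607, 847, 504, 78, 4200)
)
res
```

Because both shares sum to 1 across groups, an index of exactly 1 means a group's share of successes equals its share of the cohort.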

Note that the referenced document describing this method does not recommend a threshold on the proportionality index for declaring disproportionate impact. The di_prop_index function uses di_prop_index_cutoff=0.8 as the default threshold, which the user could change.

# Changing threshold for DI
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort, di_prop_index_cutoff=0.5) %>% as.data.frame
##    cohort           group    n success pct_success pct_group di_prop_index
## 1    2017           Asian 3000    2062 0.401167315      0.30     1.3372244
## 2    2017           Black 1000     310 0.060311284      0.10     0.6031128
## 3    2017        Hispanic 2000     410 0.079766537      0.20     0.3988327
## 4    2017 Multi-Ethnicity  500     262 0.050972763      0.05     1.0194553
## 5    2017 Native American  100      43 0.008365759      0.01     0.8365759
## 6    2017           White 3400    2053 0.399416342      0.34     1.1747539
## 7    2018           Asian 3000    2230 0.413882702      0.30     1.3796090
## 8    2018           Black 1000     297 0.055122494      0.10     0.5512249
## 9    2018        Hispanic 2000     437 0.081106162      0.20     0.4055308
## 10   2018 Multi-Ethnicity  500     242 0.044914625      0.05     0.8982925
## 11   2018 Native American  100      35 0.006495917      0.01     0.6495917
## 12   2018           White 3400    2147 0.398478099      0.34     1.1719944
##    di_indicator success_needed_not_di success_needed_full_parity
## 1             0                     0                          0
## 2             0                     0                        227
## 3             1                   116                        773
## 4             0                     0                          0
## 5             0                     0                          9
## 6             0                     0                          0
## 7             0                     0                          0
## 8             0                     0                        269
## 9             1                   114                        801
## 10            0                     0                         29
## 11            0                     0                         20
## 12            0                     0                          0
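The cutoff only enters through the final comparison, so its effect can be seen by applying several cutoffs to the same indices. The sketch below recomputes the 2017 cohort indices from the counts above and flags them at 0.8 and 0.5. At 0.8 both Black and Hispanic are flagged; at 0.5 only Hispanic is, which agrees with the two di_indicator columns. The strict < comparison and the variable names are illustrative assumptions.

```r
# 2017 cohort counts copied from the di_prop_index output above
grp     <- c('Asian', 'Black', 'Hispanic', 'Multi-Ethnicity', 'Native American', 'White')
n       <- c(3000, 1000, 2000, 500, 100, 3400)
success <- c(2062, 310, 410, 262, 43, 2053)

# Proportionality index: share of successes / share of students
idx <- (success / sum(success)) / (n / sum(n))

# Flag DI under two cutoffs (assumes a group is flagged when index < cutoff)
cutoffs <- c(0.8, 0.5)
flags <- sapply(cutoffs, function(k) as.integer(idx < k))
dimnames(flags) <- list(grp, paste0('cutoff_', cutoffs))
flags
```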

80% index method

di_80_index is the main function for this method. It accepts either vectors or, the tidy way, unquoted column names of the data frame passed to data:

# Without cohort
## Vector
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
##             group    n success       pct reference reference_group di_80_index
## 1           Asian 6000    4292 0.7153333 0.7153333           Asian   1.0000000
## 2           Black 2000     607 0.3035000 0.7153333           Asian   0.4242777
## 3        Hispanic 4000     847 0.2117500 0.7153333           Asian   0.2960158
## 4 Multi-Ethnicity 1000     504 0.5040000 0.7153333           Asian   0.7045666
## 5 Native American  200      78 0.3900000 0.7153333           Asian   0.5452004
## 6           White 6800    4200 0.6176471 0.7153333           Asian   0.8634395
##   di_indicator success_needed_not_di success_needed_full_parity
## 1            0                     0                          0
## 2            1                   538                        824
## 3            1                  1443                       2015
## 4            1                    69                        212
## 5            1                    37                         66
## 6            0                     0                        665
## Tidy and column reference
di_80_index(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference reference_group di_80_index
## 1           Asian 6000    4292 0.7153333 0.7153333           Asian   1.0000000
## 2           Black 2000     607 0.3035000 0.7153333           Asian   0.4242777
## 3        Hispanic 4000     847 0.2117500 0.7153333           Asian   0.2960158
## 4 Multi-Ethnicity 1000     504 0.5040000 0.7153333           Asian   0.7045666
## 5 Native American  200      78 0.3900000 0.7153333           Asian   0.5452004
## 6           White 6800    4200 0.6176471 0.7153333           Asian   0.8634395
##   di_indicator success_needed_not_di success_needed_full_parity
## 1            0                     0                          0
## 2            1                   538                        824
## 3            1                  1443                       2015
## 4            1                    69                        212
## 5            1                    37                         66
## 6            0                     0                        665
# With cohort
## Vector
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort) %>% as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6873333           Asian
## 2    2017           Black 1000     310 0.3100000 0.6873333           Asian
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333           Asian
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333           Asian
## 5    2017 Native American  100      43 0.4300000 0.6873333           Asian
## 6    2017           White 3400    2053 0.6038235 0.6873333           Asian
## 7    2018           Asian 3000    2230 0.7433333 0.7433333           Asian
## 8    2018           Black 1000     297 0.2970000 0.7433333           Asian
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333           Asian
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333           Asian
## 11   2018 Native American  100      35 0.3500000 0.7433333           Asian
## 12   2018           White 3400    2147 0.6314706 0.7433333           Asian
##    di_80_index di_indicator success_needed_not_di success_needed_full_parity
## 1    1.0000000            0                     0                          0
## 2    0.4510184            1                   240                        378
## 3    0.2982541            1                   690                        965
## 4    0.7623666            1                    13                         82
## 5    0.6256062            1                    12                         26
## 6    0.8785017            0                     0                        284
## 7    1.0000000            0                     0                          0
## 8    0.3995516            1                   298                        447
## 9    0.2939462            1                   753                       1050
## 10   0.6511211            1                    56                        130
## 11   0.4708520            1                    25                         40
## 12   0.8495120            0                     0                        381
## Tidy and column reference
di_80_index(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6873333           Asian
## 2    2017           Black 1000     310 0.3100000 0.6873333           Asian
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333           Asian
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333           Asian
## 5    2017 Native American  100      43 0.4300000 0.6873333           Asian
## 6    2017           White 3400    2053 0.6038235 0.6873333           Asian
## 7    2018           Asian 3000    2230 0.7433333 0.7433333           Asian
## 8    2018           Black 1000     297 0.2970000 0.7433333           Asian
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333           Asian
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333           Asian
## 11   2018 Native American  100      35 0.3500000 0.7433333           Asian
## 12   2018           White 3400    2147 0.6314706 0.7433333           Asian
##    di_80_index di_indicator success_needed_not_di success_needed_full_parity
## 1    1.0000000            0                     0                          0
## 2    0.4510184            1                   240                        378
## 3    0.2982541            1                   690                        965
## 4    0.7623666            1                    13                         82
## 5    0.6256062            1                    12                         26
## 6    0.8785017            0                     0                        284
## 7    1.0000000            0                     0                          0
## 8    0.3995516            1                   298                        447
## 9    0.2939462            1                   753                       1050
## 10   0.6511211            1                    56                        130
## 11   0.4708520            1                    25                         40
## 12   0.8495120            0                     0                        381

For a description of the di_80_index function, including both function arguments and returned results, type ?di_80_index in the R console.
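The 80% index is each group's success rate divided by the reference group's rate. The base-R sketch below reproduces the overall di_80_index and di_indicator columns above, assuming (as in the default output) that the reference is the group with the highest rate and that DI means an index strictly below the cutoff. eighty_index_sketch is a hypothetical helper, not part of DisImpact.

```r
# Hypothetical sketch of the 80% index (not the package source)
eighty_index_sketch <- function(group, n, success, cutoff = 0.8) {
  pct <- success / n
  ref <- which.max(pct)  # default reference: the highest success rate
  index <- pct / pct[ref]
  data.frame(group, n, success, pct,
             reference = pct[ref], reference_group = group[ref],
             di_80_index  = index,
             di_indicator = as.integer(index < cutoff))  # assumed strict "<"
}

# Counts copied from the Transfer-by-Ethnicity output above
res80 <- eighty_index_sketch(
  group   = c('Asian', 'Black', 'Hispanic', 'Multi-Ethnicity', 'Native American', 'White'),
  n       = c(6000, 2000, 4000, 1000, 200, 6800),
  success = c(4292, 607, 847, 504, 78, 4200)
)
res80
```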

By default, di_80_index uses the group with the highest success rate as the reference in calculating the index. One can specify a different comparison group with the reference_group argument (a value from group).

# Changing reference group
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort, reference_group='White') %>% as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6038235           White
## 2    2017           Black 1000     310 0.3100000 0.6038235           White
## 3    2017        Hispanic 2000     410 0.2050000 0.6038235           White
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6038235           White
## 5    2017 Native American  100      43 0.4300000 0.6038235           White
## 6    2017           White 3400    2053 0.6038235 0.6038235           White
## 7    2018           Asian 3000    2230 0.7433333 0.6314706           White
## 8    2018           Black 1000     297 0.2970000 0.6314706           White
## 9    2018        Hispanic 2000     437 0.2185000 0.6314706           White
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.6314706           White
## 11   2018 Native American  100      35 0.3500000 0.6314706           White
## 12   2018           White 3400    2147 0.6314706 0.6314706           White
##    di_80_index di_indicator success_needed_not_di success_needed_full_parity
## 1    1.1383017            0                     0                          0
## 2    0.5133950            1                   174                        294
## 3    0.3395032            1                   557                        798
## 4    0.8678032            0                     0                         40
## 5    0.7121286            1                     6                         18
## 6    1.0000000            0                     0                          0
## 7    1.1771464            0                     0                          0
## 8    0.4703307            1                   209                        335
## 9    0.3460177            1                   574                        826
## 10   0.7664648            1                    11                         74
## 11   0.5542618            1                    16                         29
## 12   1.0000000            0                     0                          0
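Swapping the reference group only changes the denominator of the index. The sketch below recomputes the 2017 rows of the reference_group='White' output from the cohort counts; the variable names are illustrative, and flagging at a strict < 0.8 is an assumption.

```r
# 2017 cohort counts copied from the output above
grp     <- c('Asian', 'Black', 'Hispanic', 'Multi-Ethnicity', 'Native American', 'White')
n       <- c(3000, 1000, 2000, 500, 100, 3400)
success <- c(2062, 310, 410, 262, 43, 2053)

pct <- success / n
# Divide every rate by the named reference group's rate
ref_rate    <- pct[grp == 'White']
di_80_white <- setNames(pct / ref_rate, grp)
round(di_80_white, 7)

# Groups below 80% of the White rate (assumed strict "<")
as.integer(di_80_white < 0.8)
```

Groups whose rate exceeds the reference rate, such as Asian here, simply get an index above 1.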

By default, di_80_index uses a threshold of 80% (di_80_index_cutoff=0.80) for declaring disproportionate impact. One can override it with another threshold via the di_80_index_cutoff argument.

# Changing threshold for DI
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort, di_80_index_cutoff=0.50) %>% as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6873333           Asian
## 2    2017           Black 1000     310 0.3100000 0.6873333           Asian
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333           Asian
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333           Asian
## 5    2017 Native American  100      43 0.4300000 0.6873333           Asian
## 6    2017           White 3400    2053 0.6038235 0.6873333           Asian
## 7    2018           Asian 3000    2230 0.7433333 0.7433333           Asian
## 8    2018           Black 1000     297 0.2970000 0.7433333           Asian
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333           Asian
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333           Asian
## 11   2018 Native American  100      35 0.3500000 0.7433333           Asian
## 12   2018           White 3400    2147 0.6314706 0.7433333           Asian
##    di_80_index di_indicator success_needed_not_di success_needed_full_parity
## 1    1.0000000            0                     0                          0
## 2    0.4510184            1                    34                        378
## 3    0.2982541            1                   278                        965
## 4    0.7623666            0                     0                         82
## 5    0.6256062            0                     0                         26
## 6    0.8785017            0                     0                        284
## 7    1.0000000            0                     0                          0
## 8    0.3995516            1                    75                        447
## 9    0.2939462            1                   307                       1050
## 10   0.6511211            0                     0                        130
## 11   0.4708520            1                     3                         40
## 12   0.8495120            0                     0                        381
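The cutoff also changes success_needed_not_di. Assuming that column is the smallest number of additional successes x for which (success + x) / n reaches cutoff × reference rate (an inference from the printed numbers, not documented behavior), it has a closed form. The reference rate stays fixed because only the non-reference group gains successes. The sketch below reproduces the 2017 Black and Hispanic values: 240 and 690 at a cutoff of 0.8, and 34 and 278 at 0.5. success_needed_sketch is a hypothetical name.

```r
# Hypothetical closed form for additional successes needed to escape DI
success_needed_sketch <- function(n, success, ref_rate, cutoff = 0.8) {
  # Smallest whole number of extra successes so the index reaches the cutoff;
  # groups already at or above the cutoff need 0
  pmax(0, ceiling(cutoff * ref_rate * n) - success)
}

# 2017 cohort, reference = Asian (2062 / 3000); Black and Hispanic rows
ref_2017 <- 2062 / 3000
success_needed_sketch(n = c(1000, 2000), success = c(310, 410),
                      ref_rate = ref_2017, cutoff = 0.8)
success_needed_sketch(n = c(1000, 2000), success = c(310, 410),
                      ref_rate = ref_2017, cutoff = 0.5)
```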

When dealing with a non-success variable like drop-out or probation

All methods and functions implemented in the DisImpact package treat outcomes as positive: 1 is preferred over 0 (a higher rate is better, and a lower rate indicates disparity). The argument name success was chosen intentionally as a reminder of this.

Suppose we have a variable that indicates something negative (e.g., a flag for students on academic probation). We can calculate DI on its complement (e.g., not on probation) by using the ! (logical negation) operator:

## di_ppg(success=!Probation, group=Ethnicity, data=student_equity) %>%
##   as.data.frame ## If there were a Probation variable
di_ppg(success=!Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame ## Illustrating the point with `!`
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    1708 0.2846667    0.4736         overall 0.03000000
## 2           Black 2000    1393 0.6965000    0.4736         overall 0.03000000
## 3        Hispanic 4000    3153 0.7882500    0.4736         overall 0.03000000
## 4 Multi-Ethnicity 1000     496 0.4960000    0.4736         overall 0.03099032
## 5 Native American  200     122 0.6100000    0.4736         overall 0.06929646
## 6           White 6800    2600 0.3823529    0.4736         overall 0.03000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.2546667 0.3146667            1                   954
## 2 0.6665000 0.7265000            0                     0
## 3 0.7582500 0.8182500            0                     0
## 4 0.4650097 0.5269903            0                     0
## 5 0.5407035 0.6792965            0                     0
## 6 0.3523529 0.4123529            1                   417
##   success_needed_full_parity
## 1                       1134
## 2                          0
## 3                          0
## 4                          0
## 5                          0
## 6                        621
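Applied to a 0/1 numeric vector, ! returns a logical vector in which each 0 becomes TRUE and each 1 becomes FALSE. The rate of the negated outcome is therefore one minus the original rate, which is why the Asian pct above (0.2846667) equals 1 - 0.7153333. Missing values stay missing, since !NA is NA. The toy vector below is made up for illustration.

```r
# Made-up 0/1 outcome with one missing value
transfer <- c(1, 0, 0, 1, 1, NA)

flipped <- !transfer  # 0 -> TRUE, 1 -> FALSE, NA stays NA
flipped

# Rates of the outcome and its complement sum to 1 over non-missing values
mean(transfer, na.rm = TRUE) + mean(flipped, na.rm = TRUE)
```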

Transformations on the fly

We can compute the success, group, and cohort variables on the fly:

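# Transform success: randomly flag about 5% of students as successes
# (no seed is set, so `a` and the results below vary from run to run)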
a <- sample(0:1, size=nrow(student_equity), replace=TRUE, prob=c(0.95, 0.05))
mean(a)
## [1] 0.05065
di_ppg(success=pmax(Transfer, a), group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4379 0.7298333    0.5504         overall 0.03000000
## 2           Black 2000     683 0.3415000    0.5504         overall 0.03000000
## 3        Hispanic 4000    1002 0.2505000    0.5504         overall 0.03000000
## 4 Multi-Ethnicity 1000     533 0.5330000    0.5504         overall 0.03099032
## 5 Native American  200      86 0.4300000    0.5504         overall 0.06929646
## 6           White 6800    4325 0.6360294    0.5504         overall 0.03000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6998333 0.7598333            0                     0
## 2 0.3115000 0.3715000            1                   358
## 3 0.2205000 0.2805000            1                  1080
## 4 0.5020097 0.5639903            0                     0
## 5 0.3607035 0.4992965            1                    11
## 6 0.6060294 0.6660294            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        418
## 3                       1200
## 4                         18
## 5                         25
## 6                          0
# Collapse Black and Hispanic
di_ppg(success=Transfer, group=ifelse(Ethnicity %in% c('Black', 'Hispanic'), 'Black/Hispanic', Ethnicity), data=student_equity) %>% as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333    0.5264         overall 0.03000000
## 2  Black/Hispanic 6000    1454 0.2423333    0.5264         overall 0.03000000
## 3 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall 0.03099032
## 4 Native American  200      78 0.3900000    0.5264         overall 0.06929646
## 5           White 6800    4200 0.6176471    0.5264         overall 0.03000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6853333 0.7453333            0                     0
## 2 0.2123333 0.2723333            1                  1525
## 3 0.4730097 0.5349903            0                     0
## 4 0.3207035 0.4592965            1                    14
## 5 0.5876471 0.6476471            0                     0
##   success_needed_full_parity
## 1                          0
## 2                       1705
## 3                         23
## 4                         28
## 5                          0
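Both transformations above rely on standard base-R building blocks. For 0/1 vectors, pmax() acts as an element-wise logical OR, so pmax(Transfer, a) counts a student as a success if either flag is 1. The collapse combines %in% with ifelse(). The toy vectors below are made up to show both pieces outside of di_ppg. Wrapping the group column in as.character() is a precaution in case it is a factor, for which ifelse() may return the underlying integer codes rather than the labels.

```r
# Made-up flags for five students
transfer <- c(1, 0, 0, 1, 0)
extra    <- c(0, 0, 1, 1, 0)
# pmax on 0/1 vectors: 1 if either flag is 1 (element-wise OR)
pmax(transfer, extra)

# Made-up group labels; collapse Black and Hispanic into one group
ethnicity <- c('Asian', 'Black', 'Hispanic', 'White', 'Black')
collapsed <- ifelse(ethnicity %in% c('Black', 'Hispanic'), 'Black/Hispanic',
                    as.character(ethnicity))
table(collapsed)
```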

Calculate DI for many variables and groups

Users often want to calculate disproportionate impact across many outcome variables and many disaggregation (group) variables. The di_iterate function lets the user specify a data set and the variables to iterate across:

# Multiple group variables
di_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), ppg_reference_groups='overall') %>% as.data.frame
##    success_variable cohort_variable cohort disaggregation           group     n
## 1          Transfer          Cohort   2017         - None           - All 10000
## 2          Transfer          Cohort   2017      Ethnicity           Asian  3000
## 3          Transfer          Cohort   2017      Ethnicity           Black  1000
## 4          Transfer          Cohort   2017      Ethnicity        Hispanic  2000
## 5          Transfer          Cohort   2017      Ethnicity Multi-Ethnicity   500
## 6          Transfer          Cohort   2017      Ethnicity Native American   100
## 7          Transfer          Cohort   2017      Ethnicity           White  3400
## 8          Transfer          Cohort   2017         Gender          Female  4930
## 9          Transfer          Cohort   2017         Gender            Male  4886
## 10         Transfer          Cohort   2017         Gender           Other   184
## 11         Transfer          Cohort   2018         - None           - All 10000
## 12         Transfer          Cohort   2018      Ethnicity           Asian  3000
## 13         Transfer          Cohort   2018      Ethnicity           Black  1000
## 14         Transfer          Cohort   2018      Ethnicity        Hispanic  2000
## 15         Transfer          Cohort   2018      Ethnicity Multi-Ethnicity   500
## 16         Transfer          Cohort   2018      Ethnicity Native American   100
## 17         Transfer          Cohort   2018      Ethnicity           White  3400
## 18         Transfer          Cohort   2018         Gender          Female  4928
## 19         Transfer          Cohort   2018         Gender            Male  4880
## 20         Transfer          Cohort   2018         Gender           Other   192
##    success       pct ppg_reference ppg_reference_group        moe    pct_lo
## 1     5140 0.5140000        0.5140             overall 0.03000000 0.4840000
## 2     2062 0.6873333        0.5140             overall 0.03000000 0.6573333
## 3      310 0.3100000        0.5140             overall 0.03099032 0.2790097
## 4      410 0.2050000        0.5140             overall 0.03000000 0.1750000
## 5      262 0.5240000        0.5140             overall 0.04382693 0.4801731
## 6       43 0.4300000        0.5140             overall 0.09800000 0.3320000
## 7     2053 0.6038235        0.5140             overall 0.03000000 0.5738235
## 8     2513 0.5097363        0.5140             overall 0.03000000 0.4797363
## 9     2548 0.5214900        0.5140             overall 0.03000000 0.4914900
## 10      79 0.4293478        0.5140             overall 0.07224656 0.3571013
## 11    5388 0.5388000        0.5388             overall 0.03000000 0.5088000
## 12    2230 0.7433333        0.5388             overall 0.03000000 0.7133333
## 13     297 0.2970000        0.5388             overall 0.03099032 0.2660097
## 14     437 0.2185000        0.5388             overall 0.03000000 0.1885000
## 15     242 0.4840000        0.5388             overall 0.04382693 0.4401731
## 16      35 0.3500000        0.5388             overall 0.09800000 0.2520000
## 17    2147 0.6314706        0.5388             overall 0.03000000 0.6014706
## 18    2638 0.5353084        0.5388             overall 0.03000000 0.5053084
## 19    2642 0.5413934        0.5388             overall 0.03000000 0.5113934
## 20     108 0.5625000        0.5388             overall 0.07072541 0.4917746
##       pct_hi di_indicator_ppg success_needed_not_di_ppg
## 1  0.5440000                0                         0
## 2  0.7173333                0                         0
## 3  0.3409903                1                       174
## 4  0.2350000                1                       558
## 5  0.5678269                0                         0
## 6  0.5280000                0                         0
## 7  0.6338235                0                         0
## 8  0.5397363                0                         0
## 9  0.5514900                0                         0
## 10 0.5015944                1                         3
## 11 0.5688000                0                         0
## 12 0.7733333                0                         0
## 13 0.3279903                1                       211
## 14 0.2485000                1                       581
## 15 0.5278269                1                         6
## 16 0.4480000                1                        10
## 17 0.6614706                0                         0
## 18 0.5653084                0                         0
## 19 0.5713934                0                         0
## 20 0.6332254                0                         0
##    success_needed_full_parity_ppg di_prop_index di_indicator_prop_index
## 1                               0     1.0000000                       0
## 2                               0     1.3372244                       0
## 3                             205     0.6031128                       1
## 4                             619     0.3988327                       1
## 5                               0     1.0194553                       0
## 6                               9     0.8365759                       0
## 7                               0     1.1747539                       0
## 8                              22     0.9917049                       0
## 9                               0     1.0145719                       0
## 10                             16     0.8353071                       0
## 11                              0     1.0000000                       0
## 12                              0     1.3796090                       0
## 13                            242     0.5512249                       1
## 14                            641     0.4055308                       1
## 15                             28     0.8982925                       0
## 16                             19     0.6495917                       1
## 17                              0     1.1719944                       0
## 18                             18     0.9935198                       0
## 19                              0     1.0048134                       0
## 20                              0     1.0439866                       0
##    success_needed_not_di_prop_index success_needed_full_parity_prop_index
## 1                                 0                                     0
## 2                                 0                                     0
## 3                               111                                   227
## 4                               491                                   773
## 5                                 0                                     0
## 6                                 0                                     9
## 7                                 0                                     0
## 8                                 0                                    42
## 9                                 0                                     0
## 10                                0                                    16
## 11                                0                                     0
## 12                                0                                     0
## 13                              146                                   269
## 14                              507                                   801
## 15                                0                                    29
## 16                                9                                    20
## 17                                0                                     0
## 18                                0                                    34
## 19                                0                                     0
## 20                                0                                     0
##    di_80_index_reference_group di_80_index di_indicator_80_index
## 1                        - All   1.0000000                     0
## 2                        Asian   1.0000000                     0
## 3                        Asian   0.4510184                     1
## 4                        Asian   0.2982541                     1
## 5                        Asian   0.7623666                     1
## 6                        Asian   0.6256062                     1
## 7                        Asian   0.8785017                     0
## 8                         Male   0.9774614                     0
## 9                         Male   1.0000000                     0
## 10                        Male   0.8233098                     0
## 11                       - All   1.0000000                     0
## 12                       Asian   1.0000000                     0
## 13                       Asian   0.3995516                     1
## 14                       Asian   0.2939462                     1
## 15                       Asian   0.6511211                     1
## 16                       Asian   0.4708520                     1
## 17                       Asian   0.8495120                     0
## 18                       Other   0.9516595                     0
## 19                       Other   0.9624772                     0
## 20                       Other   1.0000000                     0
##    success_needed_not_di_80_index success_needed_full_parity_80_index
## 1                               0                                   0
## 2                               0                                   0
## 3                             240                                 378
## 4                             690                                 965
## 5                              13                                  82
## 6                              12                                  26
## 7                               0                                 284
## 8                               0                                  58
## 9                               0                                   0
## 10                              0                                  17
## 11                              0                                   0
## 12                              0                                   0
## 13                            298                                 447
## 14                            753                                1050
## 15                             56                                 130
## 16                             25                                  40
## 17                              0                                 381
## 18                              0                                 134
## 19                              0                                 103
## 20                              0                                   0
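To make the PPG columns above concrete, the sketch below recomputes them by hand for the Gender = Other row of the 2017 cohort (row 10: 79 transfers among 184 students, with an overall reference rate of 0.514). It assumes a 95% margin of error computed with a proportion of 0.5 and floored at 0.03. These assumptions reproduce the printed moe, pct_lo and pct_hi values, but they are inferred from the output rather than taken from the package source. ppg_check is a hypothetical helper written for illustration; it is not a DisImpact function and not necessarily how the package computes these columns internally.

```r
# Hypothetical helper (not part of DisImpact): recompute the PPG columns for one group.
# Assumes a 95% margin of error using p = 0.5, floored at min_moe = 0.03.
ppg_check <- function(n, success, reference, min_moe = 0.03) {
  pct <- success / n
  moe <- max(1.96 * sqrt(0.5 * (1 - 0.5) / n), min_moe)
  pct_lo <- pct - moe
  pct_hi <- pct + moe
  is_di <- pct_hi < reference  # DI when even the upper bound falls short of the reference
  list(
    pct = pct, moe = moe, pct_lo = pct_lo, pct_hi = pct_hi,
    di_indicator_ppg = as.numeric(is_di),
    # fewest extra successes that lift pct_hi up to the reference (moe depends only on n)
    success_needed_not_di_ppg = if (is_di) ceiling((reference - moe) * n - success) else 0,
    # fewest extra successes that lift pct itself up to the reference
    success_needed_full_parity_ppg = max(ceiling(reference * n - success), 0)
  )
}

# Gender = Other, 2017 cohort: 79 transfers among 184 students; overall rate 0.514
str(ppg_check(n = 184, success = 79, reference = 0.514))
```

Passing n = 100 and success = 43 (the 2017 Native American row) gives a margin of error of 1.96 × 0.05 = 0.098 and no PPG flag, consistent with row 6.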
# Multiple group variables and different reference groups
bind_rows(
  # Run 1: compare every group against the overall rate
  di_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'),
             cohort_vars=c('Cohort'), ppg_reference_groups='overall'),
  # Run 2: compare against White (Ethnicity) and Male (Gender); include_non_disagg_results=FALSE
  # drops the non-disaggregated ('- None') rows, which run 1 already contains
  di_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'),
             cohort_vars=c('Cohort'), ppg_reference_groups=c('White', 'Male'),
             include_non_disagg_results=FALSE)
)
## # A tibble: 38 x 25
##    success_variable cohort_variable cohort disaggregation group        n success
##    <chr>            <chr>            <int> <chr>          <chr>    <dbl>   <int>
##  1 Transfer         Cohort            2017 - None         - All    10000    5140
##  2 Transfer         Cohort            2017 Ethnicity      Asian     3000    2062
##  3 Transfer         Cohort            2017 Ethnicity      Black     1000     310
##  4 Transfer         Cohort            2017 Ethnicity      Hispanic  2000     410
##  5 Transfer         Cohort            2017 Ethnicity      Multi-E~   500     262
##  6 Transfer         Cohort            2017 Ethnicity      Native ~   100      43
##  7 Transfer         Cohort            2017 Ethnicity      White     3400    2053
##  8 Transfer         Cohort            2017 Gender         Female    4930    2513
##  9 Transfer         Cohort            2017 Gender         Male      4886    2548
## 10 Transfer         Cohort            2017 Gender         Other      184      79
## # ... with 28 more rows, and 18 more variables: pct <dbl>, ppg_reference <dbl>,
## #   ppg_reference_group <chr>, moe <dbl>, pct_lo <dbl>, pct_hi <dbl>,
## #   di_indicator_ppg <dbl>, success_needed_not_di_ppg <dbl>,
## #   success_needed_full_parity_ppg <dbl>, di_prop_index <dbl>,
## #   di_indicator_prop_index <dbl>, success_needed_not_di_prop_index <dbl>,
## #   success_needed_full_parity_prop_index <dbl>,
## #   di_80_index_reference_group <chr>, di_80_index <dbl>, ...
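The proportionality and 80% indices can be checked by hand from the 2017 Ethnicity counts (n and success) printed in the tibble above. The sketch below assumes that the proportionality index is a group's share of all successes divided by its share of the cohort, that the 80% index divides each group's rate by the highest group rate (Asian here, matching di_80_index_reference_group), and that both flag DI below a cutoff of 0.8. These definitions reproduce the 2017 Ethnicity rows of the earlier data frame, but they are inferred from the output rather than taken from the package source. "Multi-E~" is kept exactly as the tibble truncates it.

```r
# 2017 Transfer counts by Ethnicity, copied from the tibble printed above
# ("Multi-E~" is the label as truncated in that printout)
eth <- data.frame(
  group   = c('Asian', 'Black', 'Hispanic', 'Multi-E~', 'Native American', 'White'),
  n       = c(3000, 1000, 2000, 500, 100, 3400),
  success = c(2062, 310, 410, 262, 43, 2053)
)
eth$pct <- eth$success / eth$n

# Proportionality index: share of all successes divided by share of the cohort
eth$di_prop_index <- (eth$success / sum(eth$success)) / (eth$n / sum(eth$n))

# 80% index: each group's rate relative to the highest-rate group
eth$di_80_index <- eth$pct / max(eth$pct)
eth$di_80_index_reference_group <- eth$group[which.max(eth$pct)]

# Flag DI when an index falls below the assumed 0.8 cutoff
eth$di_indicator_prop_index <- as.numeric(eth$di_prop_index < 0.8)
eth$di_indicator_80_index <- as.numeric(eth$di_80_index < 0.8)

print(eth[, c('group', 'di_prop_index', 'di_indicator_prop_index',
              'di_80_index', 'di_indicator_80_index')], digits = 7)
```

The two indices can disagree: the Multi-E~ and Native American groups are not flagged by the proportionality index but are flagged by the 80% index, because the latter compares against the best-performing group rather than the cohort as a whole.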

A separate vignette explains how di_iterate can be used for rapid development and deployment of dashboards with disaggregation and disproportionate impact features.

Appendix: R and R Package Versions

This vignette was generated in an R session with the following packages. Readers who replicate the code with different package versions may see some discrepancies.

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=C                          
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] forcats_0.5.0    scales_1.1.1     ggplot2_3.3.2    stringr_1.4.0   
## [5] knitr_1.39       dplyr_1.0.8      DisImpact_0.0.21
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.8.3      highr_0.9         pillar_1.7.0      bslib_0.3.1      
##  [5] compiler_4.0.2    jquerylib_0.1.4   sets_1.0-21       prettydoc_0.4.1  
##  [9] tools_4.0.2       digest_0.6.25     gtable_0.3.0      jsonlite_1.7.0   
## [13] evaluate_0.15     lifecycle_1.0.1   tibble_3.1.6      fstcore_0.9.12   
## [17] pkgconfig_2.0.3   rlang_1.0.1       DBI_1.1.0         cli_3.2.0        
## [21] parallel_4.0.2    yaml_2.3.5        xfun_0.30         fastmap_1.1.0    
## [25] withr_2.5.0       duckdb_0.5.0      generics_0.1.2    vctrs_0.3.8      
## [29] sass_0.4.1        grid_4.0.2        tidyselect_1.1.2  data.table_1.14.3
## [33] glue_1.6.1        R6_2.3.0          fansi_1.0.2       rmarkdown_2.14   
## [37] farver_2.0.3      tidyr_1.2.0       purrr_0.3.4       blob_1.2.1       
## [41] magrittr_2.0.2    htmltools_0.5.2   ellipsis_0.3.2    fst_0.9.8        
## [45] assertthat_0.2.1  colorspace_1.4-1  collapse_1.8.8    labeling_0.3     
## [49] utf8_1.2.2        stringi_1.4.6     munsell_0.5.0     crayon_1.5.0
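Because results may differ across versions, a quick check of the installed versions against those listed above can help explain discrepancies. The sketch below uses base R's utils::packageVersion(); check_versions is a hypothetical helper written for illustration, not part of DisImpact, and the expected versions are simply copied from the session info.

```r
# Hypothetical helper (not part of DisImpact): compare installed package versions
# against the versions listed in the session info above.
check_versions <- function(expected) {
  installed <- vapply(names(expected), function(pkg) {
    # packageVersion() errors for packages that are not installed; record NA instead
    tryCatch(as.character(utils::packageVersion(pkg)), error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)
  data.frame(
    package   = names(expected),
    expected  = unname(expected),
    installed = installed,
    matches   = !is.na(installed) & installed == unname(expected),
    row.names = NULL
  )
}

# Versions copied from the sessionInfo() output above
check_versions(c(DisImpact = '0.0.21', dplyr = '1.0.8'))
```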
