Testing for DIF with mstDIF


Rudolf Debelak

2024-08-30

Introduction

mstDIF provides a collection of methods for the detection of differential item functioning (DIF) in multistage tests using an item response theory framework. It contains three types of methods. The first type is based on logistic regression, the second type is based on the mstSIB method, and the third type consists of a family of score-based DIF tests. In this brief tutorial, we illustrate the application of each method.

The first steps

After the mstDIF package has been installed, we load it by the following command.

library(mstDIF)

To illustrate the functions of this package, we use an artificial dataset that is part of mstDIF. This dataset consists of 1000 respondents who responded to a multistage test. This test used a (1,2,2) design: All test takers first worked on a module of 7 items. Based on their estimated ability parameter after completing this module, they worked on an easier or a more difficult module next. After this second module, their ability parameter was estimated again and they were assigned either an easy or a difficult module. We load this toy example using the following code:

data("toydata")

This dataset is a list with seven elements. We will use six of them:

resp <- toydata$resp
group_categ <- toydata$group_categ
group_cont <- toydata$group_cont
it <- toydata$it
theta_est <- toydata$theta_est
see_est <- toydata$see_est

The matrix resp contains the response matrix, with 0 corresponding to incorrect and 1 corresponding to correct responses. Missing responses are denoted by NA. group_categ is a vector that indicates an artificial person covariate. 0 indicates that a respondent is a member of the reference group, and 1 that a respondent is a member of the focal group. group_cont is a continuous person covariate, which takes on integer values between 20 and 60; this variable aims at simulating an age variable. it contains a matrix with the item parameters, where the first column corresponds to the discrimination parameters and the second column to the difficulty parameters of the 35 items used in this test. theta_est and see_est are the estimated ability parameters and their standard errors for the individual test takers, respectively.
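The design described above implies a characteristic missing-data pattern in resp: every respondent sees the first module, but only one module at each later stage. The following base R sketch simulates such a pattern. It is only an illustration under stated assumptions: the module layout (five modules of seven items each), the item difficulties, and the number-correct routing rule are hypothetical choices. It is not the procedure used to generate toydata, which routed on estimated ability parameters.

```r
set.seed(4)
n <- 1000
theta_true <- rnorm(n)                       # simulated abilities (hypothetical)

# Assumed layout: module 1 = routing module, modules 2/3 = easy/hard stage 2,
# modules 4/5 = easy/hard stage 3; seven items each.
modules <- split(1:35, rep(1:5, each = 7))
b_sim <- c(rnorm(7), rnorm(7, -0.7), rnorm(7, 0.7), rnorm(7, -0.7), rnorm(7, 0.7))

# Simulate Rasch-type responses of the given persons to the given items
sim_module <- function(items, theta) {
  p <- plogis(outer(theta, b_sim[items], "-"))
  matrix(rbinom(length(p), 1, p), nrow = length(theta))
}

resp_sim <- matrix(NA_integer_, n, 35,
                   dimnames = list(NULL, paste0("Item_", 1:35)))
resp_sim[, modules[[1]]] <- sim_module(modules[[1]], theta_true)

# Stage 2: route on the number-correct score of module 1 (simplifying assumption)
hard2 <- rowSums(resp_sim[, modules[[1]]]) >= 4
resp_sim[!hard2, modules[[2]]] <- sim_module(modules[[2]], theta_true[!hard2])
resp_sim[hard2, modules[[3]]] <- sim_module(modules[[3]], theta_true[hard2])

# Stage 3: route on the number-correct score over the 14 items answered so far
hard3 <- rowSums(resp_sim, na.rm = TRUE) >= 7
resp_sim[!hard3, modules[[4]]] <- sim_module(modules[[4]], theta_true[!hard3])
resp_sim[hard3, modules[[5]]] <- sim_module(modules[[5]], theta_true[hard3])

# Number of respondents per item; unadministered items stay NA
n_per_item <- colSums(!is.na(resp_sim))
print(n_per_item)
```

In the simulated data, the first seven items are answered by all 1000 respondents, while the counts of the two modules within each later stage add up to 1000. The N column of the summary tables in this vignette shows the same kind of structure.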

We want to check whether the item parameters are stable between the focal and reference groups. We use the various methods of mstDIF for this purpose. We are now ready to apply our first method in the next section.

The logistic regression DIF test

Using the results from the previous section, we are now able to apply the logistic regression DIF test. We do this with the following command, where we also transform group_categ into a categorical variable. The command uses three arguments: resp is a data frame which contains the response matrix (where rows correspond to respondents and columns to items), DIF_covariate is a factor which determines membership in the focal and reference groups, and theta is a vector of ability parameter estimates for the respondents.

log_reg_DIF <- mstDIF(resp, DIF_covariate = factor(group_categ), method = "logreg",
                theta = theta_est)

This results in an mstDIF-object. Printing the object gives us information about the test and the data.

log_reg_DIF
#>     Differential Item Functioning (DIF) Detection Test 
#>      Method:      DIF-test using Logistic Regression 
#>      Test:        Likelihood Ratio Test 
#>      DIF covariate:   factor(group_categ) 
#>      Data:        resp 
#>      Items:       35 
#>      Persons:     1000

Using the summary method returns a data frame with item-wise test information. In the logistic regression method, three tests are computed per item: a test to detect uniform DIF, a test to detect non-uniform DIF, and a global test that is sensitive to both uniform and non-uniform DIF. By default, only the results of the global tests are returned. Using the DIF_type argument, one or more tests can be selected per item. Check ?"mstDIF-Methods" for more information.
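To make the three tests more concrete, the following base R sketch fits nested logistic regression models for a single simulated item and compares them with likelihood ratio tests. This is a generic illustration of logistic regression DIF testing, not a reproduction of the internal code of mstDIF; the simulated data, the variable names and the helper lr() are our own assumptions.

```r
set.seed(1)
n <- 1000
theta <- rnorm(n)                        # stand-in for ability estimates
group <- factor(rbinom(n, 1, 0.5))       # 0 = reference group, 1 = focal group

# One simulated item with uniform DIF: it is harder for the focal group
eta <- 1.2 * theta - 0.5 * (group == "1")
y <- rbinom(n, 1, plogis(eta))

m0 <- glm(y ~ theta, family = binomial)          # no DIF
m1 <- glm(y ~ theta + group, family = binomial)  # uniform DIF
m2 <- glm(y ~ theta * group, family = binomial)  # uniform and non-uniform DIF

# Likelihood ratio test between two nested models (hypothetical helper)
lr <- function(small, large) {
  stat <- deviance(small) - deviance(large)
  df <- df.residual(small) - df.residual(large)
  c(stat = stat, df = df, p_value = pchisq(stat, df, lower.tail = FALSE))
}

res <- rbind(overall     = lr(m0, m2),
             uniform     = lr(m0, m1),
             non_uniform = lr(m1, m2))
print(res)
```

In this construction, the overall statistic is the sum of the uniform and the non-uniform statistic, and it is compared to a chi-squared distribution with two degrees of freedom instead of one.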

For instance, when we want the information from all the tests, we can use:

summary(log_reg_DIF, DIF_type = "all")
#>         overall_stat overall_p_value overall_eff_size uniform_stat
#> Item_8   12.16875795     0.002278179     0.0241827373 1.161337e+01
#> Item_33   9.72315310     0.007738274     0.0237115902 8.855648e+00
#> Item_30   9.53189957     0.008514797     0.0229798361 3.440068e-01
#> Item_32   6.80924287     0.033219393     0.0223673719 6.267057e+00
#> Item_5    6.67415852     0.035540611     0.0067044801 6.452145e+00
#> Item_25   6.37952893     0.041181570     0.0144723882 4.259595e+00
#> Item_7    5.71821721     0.057319832     0.0073594481 1.978559e+00
#> Item_21   5.66930097     0.058739052     0.0160079717 5.200306e+00
#> Item_24   4.99009162     0.082492672     0.0114843336 4.380913e+00
#> Item_3    3.18153017     0.203769651     0.0035582868 3.165964e+00
#> Item_28   2.99557723     0.223624133     0.0069983109 2.809478e+00
#> Item_1    2.83699541     0.242077416     0.0028920528 7.697821e-01
#> Item_16   2.81621943     0.244605220     0.0072479034 1.498498e-01
#> Item_27   1.88748532     0.389168581     0.0041230220 1.642642e+00
#> Item_14   1.77810651     0.411044723     0.0038642187 5.441687e-01
#> Item_23   1.48314314     0.476364686     0.0030051788 4.000434e-01
#> Item_15   1.39785298     0.497118680     0.0034687560 9.509201e-01
#> Item_19   1.37728431     0.502257593     0.0036391944 3.203909e-01
#> Item_22   1.19139830     0.551177077     0.0025834418 9.250174e-01
#> Item_10   1.09360706     0.578796963     0.0024268684 1.073037e+00
#> Item_6    1.04667169     0.592540624     0.0011398236 1.031821e+00
#> Item_13   0.73290590     0.693188750     0.0016357837 9.430860e-02
#> Item_12   0.65517407     0.720660564     0.0011895057 3.424121e-03
#> Item_4    0.62353169     0.732152944     0.0008186510 4.791145e-01
#> Item_17   0.56848892     0.752582647     0.0015896194 4.729859e-01
#> Item_2    0.56122811     0.755319792     0.0005672282 5.556867e-01
#> Item_29   0.55470127     0.757788745     0.0013967037 5.448385e-01
#> Item_9    0.44689166     0.799758214     0.0008250358 2.355315e-01
#> Item_26   0.29721920     0.861905536     0.0005740969 1.966462e-01
#> Item_35   0.29198078     0.864166007     0.0007965819 2.910217e-01
#> Item_11   0.26293458     0.876807955     0.0005536073 2.136855e-01
#> Item_31   0.15422487     0.925785757     0.0004047319 1.487255e-01
#> Item_18   0.12917702     0.937453137     0.0005307723 7.395971e-02
#> Item_20   0.09413638     0.954022340     0.0002471444 9.981562e-04
#> Item_34   0.04655953     0.976989117     0.0001236534 2.952516e-02
#>         uniform_p_value uniform_eff_size non-uniform_stat non-uniform_p_value
#> Item_8     0.0006547933     2.309012e-02      0.555386043         0.456125273
#> Item_33    0.0029218344     2.161678e-02      0.867505551         0.351646804
#> Item_30    0.5575254755     8.378380e-04      9.187892728         0.002436212
#> Item_32    0.0123003302     2.059874e-02      0.542186170         0.461528116
#> Item_5     0.0110818446     6.482177e-03      0.222013133         0.637510275
#> Item_25    0.0390292238     9.681787e-03      2.119933861         0.145392823
#> Item_7     0.1595423285     2.551203e-03      3.739658523         0.053135303
#> Item_21    0.0225829096     1.469181e-02      0.468994760         0.493449990
#> Item_24    0.0363435304     1.008793e-02      0.609178867         0.435097149
#> Item_3     0.0751880488     3.540905e-03      0.015566464         0.900709092
#> Item_28    0.0937088854     6.564652e-03      0.186099614         0.666182897
#> Item_1     0.3802844848     7.855324e-04      2.067213356         0.150496286
#> Item_16    0.6986789051     3.868717e-04      2.666369601         0.102489567
#> Item_27    0.1999634020     3.588984e-03      0.244843623         0.620729475
#> Item_14    0.4607102056     1.183866e-03      1.233937830         0.266642588
#> Item_23    0.5270668604     8.113754e-04      1.083099774         0.298005149
#> Item_15    0.3294852126     2.360941e-03      0.446932911         0.503795076
#> Item_19    0.5713727937     8.476235e-04      1.056893364         0.303924808
#> Item_22    0.3361610042     2.006304e-03      0.266380887         0.605769901
#> Item_10    0.3002604570     2.381263e-03      0.020570025         0.885956482
#> Item_6     0.3097314014     1.123659e-03      0.014850906         0.903006496
#> Item_13    0.7587695051     2.106055e-04      0.638597303         0.424219191
#> Item_12    0.9533376271     6.220204e-06      0.651749953         0.419487720
#> Item_4     0.4888236954     6.290873e-04      0.144417214         0.703928602
#> Item_17    0.4916167151     1.322721e-03      0.095503024         0.757294420
#> Item_2     0.4560033932     5.616290e-04      0.005541451         0.940659546
#> Item_29    0.4604343634     1.371885e-03      0.009862717         0.920891175
#> Item_9     0.6274522746     4.349099e-04      0.211360132         0.645703429
#> Item_26    0.6574416815     3.798687e-04      0.100573023         0.751143065
#> Item_35    0.5895665701     7.939661e-04      0.000959122         0.975293708
#> Item_11    0.6438939599     4.499329e-04      0.049249086         0.824375110
#> Item_31    0.6997563210     3.903023e-04      0.005499388         0.940884776
#> Item_18    0.7856563491     3.039111e-04      0.055217304         0.814221442
#> Item_20    0.9747961394     2.620834e-06      0.093138224         0.760224832
#> Item_34    0.8635720757     7.841480e-05      0.017034374         0.896158354
#>         non-uniform_eff_size    N
#> Item_8          1.092617e-03  576
#> Item_33         2.094808e-03  450
#> Item_30         2.214200e-02  450
#> Item_32         1.768627e-03  450
#> Item_5          2.223030e-04 1000
#> Item_25         4.790601e-03  550
#> Item_7          4.808245e-03 1000
#> Item_21         1.316159e-03  424
#> Item_24         1.396404e-03  550
#> Item_3          1.738230e-05 1000
#> Item_28         4.336591e-04  550
#> Item_1          2.106520e-03 1000
#> Item_16         6.861032e-03  424
#> Item_27         5.340379e-04  550
#> Item_14         2.680353e-03  576
#> Item_23         2.193803e-03  550
#> Item_15         1.107815e-03  424
#> Item_19         2.791571e-03  424
#> Item_22         5.771377e-04  550
#> Item_10         4.560528e-05  576
#> Item_6          1.616427e-05 1000
#> Item_13         1.425178e-03  576
#> Item_12         1.183285e-03  576
#> Item_4          1.895637e-04 1000
#> Item_17         2.668984e-04  424
#> Item_2          5.599138e-06 1000
#> Item_29         2.481869e-05  450
#> Item_9          3.901259e-04  576
#> Item_26         1.942282e-04  550
#> Item_35         2.615830e-06  450
#> Item_11         1.036744e-04  576
#> Item_31         1.442965e-05  450
#> Item_18         2.268612e-04  424
#> Item_20         2.445235e-04  424
#> Item_34         4.523864e-05  450

This output can be read as follows: Each row corresponds to an item, and each column to information on this item. Items with a lower p-value are presented first. Focusing on the global DIF tests, the following information is given:

overall_stat the test statistic
overall_p_value the \(p\)-value
overall_eff_size the effect size (Nagelkerke’s R squared)
N the number of respondents answering this item

Note that most DIF tests only contain a global test per item, and effect sizes are only available for the logistic regression method in the current version of mstDIF.

By inspecting the p-values in the second column, we see that there is an indication for an overall DIF effect in six items, which are labeled as Item_8, Item_33, Item_30, Item_32, Item_5 and Item_25. In these six items, the p-values are below 0.05; for the first three, they are even below 0.01. However, the effect sizes are very small. An inspection of the columns uniform_p_value and non-uniform_p_value would indicate that the DIF effect of items 8 and 33 is overall uniform, while it is rather non-uniform for item 30. However, given the large size of the item set, these effects could also be random fluctuations in the sample and therefore false positives. We could either a) correct for multiple testing or b) form hypotheses about which items we would like to test for DIF.
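Option a) can be sketched with the base R function p.adjust. The six smallest overall p-values are copied (rounded) from the summary output above, and n = 35 tells p.adjust that 35 tests were carried out in total. The Holm method is used here only as one common choice; which correction is appropriate depends on the application.

```r
# Six smallest overall p-values, transcribed from the summary output
p_overall <- c(Item_8 = 0.002278, Item_33 = 0.007738, Item_30 = 0.008515,
               Item_32 = 0.033219, Item_5 = 0.035541, Item_25 = 0.041182)

# Holm correction for all 35 item-wise tests; passing n is valid here
# because the omitted p-values are all larger than the ones listed
p_holm <- p.adjust(p_overall, method = "holm", n = 35)
print(round(p_holm, 3))

# Is any item still flagged at the 0.05 level?
print(any(p_holm < 0.05))
```

With these values, no adjusted p-value remains below 0.05; the smallest one, for Item_8, is about 0.08.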

The mstSIB test

We carry out the second DIF test, which is the mstSIB procedure. The respective command requires four arguments. The first argument is the response matrix resp, the second argument DIF_covariate is a factor that indicates membership in the focal and reference groups, and the final two arguments are theta and see. Whereas theta contains estimates of the ability parameters, see contains the standard errors of the ability parameters. We run the second DIF test by running:

mstSIB_DIF <- mstDIF(resp, DIF_covariate = factor(group_categ), method = "mstsib",
                theta = theta_est, see = see_est)
mstSIB_DIF
#>     Differential Item Functioning (DIF) Detection Test 
#>      Method:      SIB test for DIF in MST 
#>      Test:        SIB-test 
#>      DIF covariate:   factor(group_categ) 
#>      Data:        resp 
#>      Items:       35 
#>      Persons:     1000

As in the first test, printing the object gives detailed information on the test and the underlying data set. By applying summary, we get the individual p-values:

summary(mstSIB_DIF)
#>                 stat      p_value    N
#> Item_32  0.053326967 2.024588e-36  450
#> Item_25 -0.104163154 1.151095e-14  550
#> Item_5  -0.068129925 3.448207e-07 1000
#> Item_33 -0.122530904 3.505046e-05  450
#> Item_21 -0.097249381 1.557694e-04  424
#> Item_7   0.042461528 1.786444e-04 1000
#> Item_8   0.129955506 1.461819e-02  576
#> Item_14 -0.046968712 2.163451e-02  576
#> Item_11 -0.016487268 3.537402e-02  576
#> Item_4  -0.004426816 9.884723e-02 1000
#> Item_3   0.047124831 1.076104e-01 1000
#> Item_2  -0.021864399 1.478652e-01 1000
#> Item_28  0.055640837 1.706851e-01  550
#> Item_19  0.028802851 1.750301e-01  424
#> Item_18 -0.002701141 1.930745e-01  424
#> Item_6   0.033373013 2.163196e-01 1000
#> Item_15  0.061048096 3.200998e-01  424
#> Item_30  0.041791752 3.604481e-01  450
#> Item_1   0.021816674 3.815204e-01 1000
#> Item_26 -0.041927073 4.175809e-01  550
#> Item_24 -0.072836077 4.431133e-01  550
#> Item_35 -0.028356221 4.709391e-01  450
#> Item_16 -0.018072989 4.999245e-01  424
#> Item_9  -0.017760091 6.030239e-01  576
#> Item_12 -0.015659338 6.079199e-01  576
#> Item_10 -0.025964475 6.327634e-01  576
#> Item_27  0.042820257 6.487289e-01  550
#> Item_31  0.034512576 6.993907e-01  450
#> Item_22  0.034949483 7.053106e-01  550
#> Item_29 -0.019122147 8.394662e-01  450
#> Item_13  0.010666372 8.423829e-01  576
#> Item_20  0.006413806 8.977554e-01  424
#> Item_17 -0.006816036 9.120287e-01  424
#> Item_23  0.006519237 9.143417e-01  550
#> Item_34  0.003871662 9.626978e-01  450

We see that the p-values of 9 items (5, 7, 8, 11, 14, 21, 25, 32 and 33) are below 0.05, indicating a DIF effect for these items. N again indicates the number of respondents responding to the respective item. As can be seen, the DIF tests of mstSIB and logistic regression do not always agree in their results. We move on to the third DIF test, which is a score-based DIF test.
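To see where the two methods agree and disagree, the item numbers flagged at the 0.05 level can be compared with base R set operations. The two vectors below are transcribed by hand from the summary outputs above (using the overall p-values for the logistic regression method); the variable names are our own.

```r
# Items with p < 0.05, transcribed from the two summary outputs
flag_logreg <- c(5, 8, 25, 30, 32, 33)             # logistic regression (overall test)
flag_mstsib <- c(5, 7, 8, 11, 14, 21, 25, 32, 33)  # mstSIB

both        <- intersect(flag_logreg, flag_mstsib) # flagged by both methods
only_logreg <- setdiff(flag_logreg, flag_mstsib)   # flagged by logistic regression only
only_mstsib <- setdiff(flag_mstsib, flag_logreg)   # flagged by mstSIB only

print(both)
print(only_logreg)
print(only_mstsib)
```

Five items (5, 8, 25, 32 and 33) are flagged by both methods, item 30 only by logistic regression, and items 7, 11, 14 and 21 only by mstSIB.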

The Score-Based DIF test

The third test is an analytical score-based DIF test. This test uses the mstDIF command and can be applied to dRm objects, which are generated by the RM command of eRm, as well as to SingleGroupClass and MultipleGroupClass objects, which can be generated with the mirt package. In its simplest version, it requires three arguments. The first argument is object, which is the object obtained from eRm or mirt. The second is DIF_covariate, which is again the person covariate that is tested for DIF. In contrast to the logistic regression test and mstSIB, this argument can also be a metric variable. Finally, setting the third argument, method, to "analytical" determines that an analytical test is used. To apply this test, we first estimate a 2PL model with the mirt package:

library(mirt)
#> Loading required package: stats4
#> Loading required package: lattice

mirt_model <- mirt(as.data.frame(resp), model = 1, verbose = FALSE)

We now apply the analytical score-based DIF test:

sc_DIF <- mstDIF(mirt_model, DIF_covariate = factor(group_categ), method = "analytical")
sc_DIF
#>     Differential Item Functioning (DIF) Detection Test 
#>      Method:      Asymptotic score-based DIF test 
#>      Test:        Lagrange Multiplier Test for Unordered Groups 
#>      DIF covariate:   factor(group_categ) 
#>      Data:        NULL 
#>      Items:       35 
#>      Persons:     1000

As with the other tests, printing the object returns information on the test and the underlying dataset. Since we applied the test to a mirt object, the Data are given as NULL. The test statistic depends on the type of covariate that is used in the DIF test. In the case of a discrete, unordered person covariate, the test statistic leads to a Lagrange Multiplier test for unordered groups. As with the other tests, we get p-values via the summary command:

summary(sc_DIF)
#>                stat     p_value    N
#> Item_33 9.906090082 0.007061872  450
#> Item_8  9.896468123 0.007095929  576
#> Item_30 9.032859053 0.010927972  450
#> Item_32 6.992653625 0.030308508  450
#> Item_5  6.275120280 0.043388531 1000
#> Item_7  5.965298885 0.050658439 1000
#> Item_21 5.847563724 0.053730103  424
#> Item_24 4.909636758 0.085878791  550
#> Item_16 4.514169336 0.104655145  424
#> Item_25 4.003240644 0.135116174  550
#> Item_28 3.054264732 0.217157504  550
#> Item_1  2.858982161 0.239430742 1000
#> Item_3  2.506000875 0.285646446 1000
#> Item_27 2.487716089 0.288269912  550
#> Item_14 2.075146170 0.354313528  576
#> Item_23 1.964683191 0.374433300  550
#> Item_15 1.091405215 0.579434525  424
#> Item_17 1.056119576 0.589748097  424
#> Item_19 0.897735949 0.638350372  424
#> Item_6  0.897261524 0.638501814 1000
#> Item_4  0.854106018 0.652428967 1000
#> Item_10 0.822730948 0.662744671  576
#> Item_13 0.806095883 0.668280060  576
#> Item_9  0.743360294 0.689574770  576
#> Item_35 0.741449239 0.690233992  450
#> Item_2  0.674499242 0.713730655 1000
#> Item_18 0.600214048 0.740738940  424
#> Item_29 0.582602393 0.747290563  450
#> Item_22 0.571962850 0.751276571  550
#> Item_12 0.513515239 0.773555686  576
#> Item_31 0.414089485 0.812983274  450
#> Item_34 0.337143285 0.844870733  450
#> Item_20 0.283572933 0.867806542  424
#> Item_26 0.255243514 0.880186240  550
#> Item_11 0.009199239 0.995410943  576

Similar to the logistic regression test, we obtain p-values below 0.05 for the five items 5, 8, 30, 32 and 33. To prevent an increased rate of false positive results, we could again a) correct for multiple testing or b) define hypotheses about which items we want to test for DIF before we carry out the tests. From a technical perspective, these analytical DIF tests assume that all other items are DIF free. It is possible to explicitly define a set of anchor items to weaken this assumption, but this goes beyond the scope of this vignette.

In contrast to the logistic regression and mstSIB DIF tests, score-based tests also allow testing continuous and ordinal person covariates for DIF effects. We will demonstrate this feature with the group_cont covariate:

sc_DIF_2 <- mstDIF(mirt_model, DIF_covariate = group_cont, method = "analytical")
sc_DIF_2
#>     Differential Item Functioning (DIF) Detection Test 
#>      Method:      Asymptotic score-based DIF test 
#>      Test:        Double Maximum Test 
#>      DIF covariate:   group_cont 
#>      Data:        NULL 
#>      Items:       35 
#>      Persons:     1000

As usual, we can investigate the results for the individual items with:

summary(sc_DIF_2)
#>              stat    p_value    N
#> Item_5  1.3659237 0.09353094 1000
#> Item_2  1.3287680 0.11364405 1000
#> Item_12 1.2791132 0.14592279  576
#> Item_18 1.2504890 0.16762084  424
#> Item_7  1.1718791 0.24006446 1000
#> Item_32 1.1674586 0.24472636  450
#> Item_21 1.0864634 0.34151641  424
#> Item_35 1.0344161 0.41465048  450
#> Item_8  1.0124496 0.44779182  576
#> Item_23 1.0101888 0.45127109  550
#> Item_31 1.0066371 0.45676157  450
#> Item_3  0.9674889 0.51904522 1000
#> Item_9  0.9667843 0.52019204  576
#> Item_11 0.9454645 0.55522588  576
#> Item_10 0.9383952 0.56696119  576
#> Item_26 0.9248651 0.58953776  550
#> Item_14 0.9213109 0.59548694  576
#> Item_27 0.9179728 0.60107926  550
#> Item_24 0.9025400 0.62696387  550
#> Item_17 0.8997145 0.63170271  424
#> Item_20 0.8818305 0.66162553  424
#> Item_1  0.8493214 0.71521842 1000
#> Item_30 0.8349971 0.73825373  450
#> Item_29 0.8286536 0.74830002  450
#> Item_33 0.8172372 0.76610118  450
#> Item_28 0.7851220 0.81383178  550
#> Item_34 0.7831228 0.81667025  450
#> Item_16 0.7775574 0.82448038  424
#> Item_19 0.7612028 0.84660497  424
#> Item_15 0.7529646 0.85725090  424
#> Item_6  0.7415213 0.87144470 1000
#> Item_13 0.7336175 0.88082673  576
#> Item_25 0.7150150 0.90147702  550
#> Item_22 0.7047585 0.91196892  550
#> Item_4  0.6031870 0.98040977 1000

As can be seen, there are no significant DIF effects.
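The idea behind a score-based test for a continuous covariate can be sketched in base R. The individual score contributions of a fitted model are ordered by the covariate and summed up; under parameter invariance, the scaled cumulative sum behaves like a Brownian bridge, and a large maximum indicates DIF. The sketch below uses a single simulated item with one estimated parameter and abilities treated as known. These are simplifying assumptions of ours, and the code is not the implementation used by mstDIF.

```r
set.seed(2)
n <- 500
age <- sample(20:60, n, replace = TRUE)   # continuous covariate (simulated)
theta <- rnorm(n)                         # abilities, treated as known here
y <- rbinom(n, 1, plogis(theta - 0.3))    # one Rasch-type item without DIF

# Estimate the item intercept, with the abilities entering as an offset
fit <- glm(y ~ 1 + offset(theta), family = binomial)
p_hat <- fitted(fit)
score <- y - p_hat                        # individual score contributions
info <- sum(p_hat * (1 - p_hat))          # estimated Fisher information

# Cumulative score process along the ordered covariate, evaluated at the
# last observation of each distinct covariate value to handle ties
ord <- order(age)
process <- cumsum(score[ord]) / sqrt(info)
at_breaks <- !duplicated(age[ord], fromLast = TRUE)
stat <- max(abs(process[at_breaks]))

# p-value from the distribution of the supremum of a Brownian bridge
sup_bb_p <- function(x, terms = 100) {
  k <- seq_len(terms)
  min(1, max(0, 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * x^2))))
}
p_value <- sup_bb_p(stat)
print(c(stat = stat, p_value = p_value))

# Taking the maximum over two independent processes (double maximum),
# applied to the statistic reported for Item_5 in the output above
p_two <- 1 - (1 - sup_bb_p(1.3659237))^2
print(p_two)
```

The last value is approximately 0.0935, which agrees with the p-value reported for Item_5 above. We read this as the maximum being taken over the two item parameters of the 2PL model, but this interpretation is ours and is not stated in the output.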

Permutation and Bootstrap DIF tests

Finally, we apply permutation and bootstrap DIF tests. In contrast to the other DIF tests presented in this vignette, these tests make use of the item parameters used during the presentation of the adaptive tests. Technically, these tests aim at testing the hypothesis that the true item parameters are invariant and correspond to the values used in the presentation of the adaptive test. These item parameters are stored in the it matrix. We start our application of these tests by explicitly storing the discrimination and difficulty parameters in separate vectors:

discr <- it[,1]
diff <- it[,2]

We can now apply the bootstrap DIF test by the following command:

bootstrap_DIF <- mstDIF(resp = resp, DIF_covariate = group_categ, method = "bootstrap",
                a = discr, b = diff, decorrelate = FALSE)
#> Estimating:  4pl model ... 
#> type = wle 
#> Estimation finished!

After starting this command, the person parameters are calculated again using the PP package. We get notified that the estimation was finished. Printing the resulting object again gives details on the underlying data and test:

bootstrap_DIF
#>     Differential Item Functioning (DIF) Detection Test 
#>      Method:      Bootstrap score-based DIF test with 1000 samples 
#>      Test:        Double Maximum Test 
#>      DIF covariate:   group_categ 
#>      Data:        resp 
#>      Items:       35 
#>      Persons:     1000

Using the summary command, we get the p-values:

summary(bootstrap_DIF)
#>             stat p_value    N
#> Item_8  41.43535   0.006  576
#> Item_33 35.32980   0.013  450
#> Item_32 30.50668   0.036  450
#> Item_30 31.35700   0.052  450
#> Item_21 28.94987   0.058  424
#> Item_5  43.40601   0.082 1000
#> Item_16 27.69800   0.088  424
#> Item_25 28.84286   0.116  550
#> Item_24 28.25936   0.155  550
#> Item_22 27.19333   0.176  550
#> Item_7  37.26782   0.187 1000
#> Item_9  27.92269   0.218  576
#> Item_29 23.71674   0.223  450
#> Item_20 22.45761   0.254  424
#> Item_26 25.50687   0.320  550
#> Item_1  31.06844   0.446 1000
#> Item_28 22.58680   0.459  550
#> Item_23 22.57854   0.473  550
#> Item_11 21.29707   0.483  576
#> Item_14 23.01867   0.497  576
#> Item_15 18.65121   0.544  424
#> Item_3  28.14860   0.573 1000
#> Item_27 19.50961   0.609  550
#> Item_19 16.99457   0.683  424
#> Item_6  25.41009   0.714 1000
#> Item_31 16.70285   0.715  450
#> Item_35 16.22799   0.768  450
#> Item_10 16.43764   0.835  576
#> Item_18 13.13709   0.878  424
#> Item_13 15.50530   0.887  576
#> Item_12 16.06358   0.924  576
#> Item_17 13.01244   0.936  424
#> Item_2  20.48589   0.946 1000
#> Item_34 12.05401   0.970  450
#> Item_4  15.74184   0.992 1000

We see that items 8, 32 and 33 show p-values below 0.05, while item 30 narrowly misses this threshold (p = 0.052). This is similar to the results of the analytical score-based test. As with the other tests, we could either correct for multiple testing or define hypotheses beforehand to prevent an increased rate of false positive results. As was the case with the analytical score-based tests, we can also test continuous and ordinal person covariates for DIF. We demonstrate this type of analysis with the group_cont covariate:

bootstrap_DIF_2 <- mstDIF(resp = resp, DIF_covariate = group_cont, method = "bootstrap",
                a = discr, b = diff, decorrelate = FALSE)
#> Estimating:  4pl model ... 
#> type = wle 
#> Estimation finished!

bootstrap_DIF_2
#>     Differential Item Functioning (DIF) Detection Test 
#>      Method:      Bootstrap score-based DIF test with 1000 samples 
#>      Test:        Double Maximum Test 
#>      DIF covariate:   group_cont 
#>      Data:        resp 
#>      Items:       35 
#>      Persons:     1000

The results of this analysis are:

summary(bootstrap_DIF_2)
#>             stat p_value    N
#> Item_18 34.00477   0.007  424
#> Item_5  47.12704   0.046 1000
#> Item_2  44.09327   0.053 1000
#> Item_35 28.95484   0.074  450
#> Item_32 25.54950   0.117  450
#> Item_7  39.03102   0.155 1000
#> Item_23 25.68833   0.314  550
#> Item_12 26.29353   0.319  576
#> Item_20 21.53959   0.322  424
#> Item_16 21.58565   0.340  424
#> Item_26 24.04213   0.405  550
#> Item_10 23.00194   0.409  576
#> Item_1  31.87110   0.417 1000
#> Item_27 22.10265   0.433  550
#> Item_3  29.74656   0.493 1000
#> Item_9  22.30023   0.523  576
#> Item_30 19.56794   0.548  450
#> Item_31 18.43406   0.586  450
#> Item_14 21.12367   0.601  576
#> Item_29 18.01397   0.603  450
#> Item_21 17.53189   0.646  424
#> Item_24 18.52007   0.672  550
#> Item_33 17.51145   0.674  450
#> Item_11 18.35649   0.735  576
#> Item_17 15.68125   0.742  424
#> Item_34 16.06359   0.762  450
#> Item_19 15.93857   0.765  424
#> Item_8  18.00935   0.776  576
#> Item_13 17.69414   0.783  576
#> Item_25 16.05803   0.797  550
#> Item_6  24.09979   0.809 1000
#> Item_15 15.36422   0.817  424
#> Item_28 17.04354   0.850  550
#> Item_22 16.08331   0.856  550
#> Item_4  16.80331   0.983 1000

We find significant DIF effects for items 5 and 18.

The permutation-based DIF test works analogously. We therefore just demonstrate the commands and their output:

permutation_DIF <- mstDIF(resp = resp, DIF_covariate = group_categ, method = "permutation",
                a = discr, b = diff, decorrelate = FALSE)
#> Estimating:  4pl model ... 
#> type = wle 
#> Estimation finished!

permutation_DIF_2 <- mstDIF(resp = resp, DIF_covariate = group_cont, method = "permutation",
                a = discr, b = diff, decorrelate = FALSE)
#> Estimating:  4pl model ... 
#> type = wle 
#> Estimation finished!

The results for the categorical covariate are:

summary(permutation_DIF)
#>             stat p_value    N
#> Item_8  41.43535   0.005  576
#> Item_33 35.32980   0.019  450
#> Item_30 31.35700   0.040  450
#> Item_32 30.50668   0.041  450
#> Item_21 28.94987   0.059  424
#> Item_5  43.40601   0.077 1000
#> Item_16 27.69800   0.089  424
#> Item_25 28.84286   0.107  550
#> Item_24 28.25936   0.147  550
#> Item_22 27.19333   0.195  550
#> Item_9  27.92269   0.215  576
#> Item_7  37.26782   0.216 1000
#> Item_29 23.71674   0.234  450
#> Item_20 22.45761   0.253  424
#> Item_26 25.50687   0.307  550
#> Item_1  31.06844   0.436 1000
#> Item_23 22.57854   0.468  550
#> Item_14 23.01867   0.500  576
#> Item_28 22.58680   0.502  550
#> Item_11 21.29707   0.503  576
#> Item_15 18.65121   0.522  424
#> Item_27 19.50961   0.603  550
#> Item_3  28.14860   0.610 1000
#> Item_31 16.70285   0.710  450
#> Item_19 16.99457   0.714  424
#> Item_6  25.41009   0.746 1000
#> Item_35 16.22799   0.778  450
#> Item_10 16.43764   0.854  576
#> Item_18 13.13709   0.874  424
#> Item_13 15.50530   0.913  576
#> Item_12 16.06358   0.920  576
#> Item_17 13.01244   0.927  424
#> Item_2  20.48589   0.938 1000
#> Item_34 12.05401   0.966  450
#> Item_4  15.74184   0.994 1000

The results for the continuous covariate are:

summary(permutation_DIF_2)
#>             stat p_value    N
#> Item_18 34.00477   0.016  424
#> Item_5  47.12704   0.038 1000
#> Item_2  44.09327   0.056 1000
#> Item_35 28.95484   0.071  450
#> Item_32 25.54950   0.141  450
#> Item_7  39.03102   0.151 1000
#> Item_12 26.29353   0.294  576
#> Item_20 21.53959   0.295  424
#> Item_23 25.68833   0.309  550
#> Item_16 21.58565   0.350  424
#> Item_26 24.04213   0.371  550
#> Item_10 23.00194   0.403  576
#> Item_1  31.87110   0.404 1000
#> Item_27 22.10265   0.470  550
#> Item_3  29.74656   0.500 1000
#> Item_9  22.30023   0.520  576
#> Item_30 19.56794   0.527  450
#> Item_31 18.43406   0.596  450
#> Item_14 21.12367   0.621  576
#> Item_29 18.01397   0.621  450
#> Item_21 17.53189   0.643  424
#> Item_11 18.35649   0.682  576
#> Item_24 18.52007   0.717  550
#> Item_33 17.51145   0.722  450
#> Item_17 15.68125   0.736  424
#> Item_19 15.93857   0.762  424
#> Item_34 16.06359   0.767  450
#> Item_6  24.09979   0.787 1000
#> Item_8  18.00935   0.788  576
#> Item_13 17.69414   0.789  576
#> Item_15 15.36422   0.802  424
#> Item_25 16.05803   0.816  550
#> Item_22 16.08331   0.847  550
#> Item_28 17.04354   0.859  550
#> Item_4  16.80331   0.986 1000

The results are very similar to those of the bootstrap DIF test.
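The resampling logic of a permutation test can be sketched in a few lines of base R. The example treats the item parameters as known, computes residuals of the observed responses under these parameters, and compares a group difference in these residuals with its distribution under random reassignment of the group labels. The simulated data, the simple test statistic and all names are assumptions made for illustration; mstDIF uses score-based statistics and its own implementation.

```r
set.seed(3)
n <- 400
theta <- rnorm(n)                          # abilities, treated as known here
group <- rbinom(n, 1, 0.5)                 # 0 = reference group, 1 = focal group
a <- 1.1                                   # known discrimination (hypothetical)
b <- 0.2                                   # known difficulty (hypothetical)
y <- rbinom(n, 1, plogis(a * (theta - b))) # responses to one item without DIF

# Residuals of the responses under the known item parameters
resid <- y - plogis(a * (theta - b))

# Test statistic: absolute difference of the mean residuals between groups
stat_fun <- function(g) abs(mean(resid[g == 1]) - mean(resid[g == 0]))
observed <- stat_fun(group)

# Reference distribution: the same statistic under permuted group labels
perm <- replicate(999, stat_fun(sample(group)))

# Permutation p-value, counting the observed statistic as one of the samples
p_value <- (1 + sum(perm >= observed)) / (1 + length(perm))
print(p_value)
```

With 999 permutations plus the observed statistic, the p-value is a multiple of 1/1000, which is the same granularity as in the p_value columns shown above.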

Conclusion

In this vignette, we illustrated the use of the various tests included in the mstDIF package. The available tests include logistic regression, the mstSIB test, analytical score-based tests, bootstrap score-based tests and permutation score-based tests. For the three types of score-based tests, we further demonstrated their application to test a continuous covariate for DIF.
