The Implication of Wald Test Anchor-All-Test-All Procedure for Anchor Selection

Authors

  • Yahia Alsmadi, School of Educational Sciences, The University of Jordan, Jordan.

Keywords:

Wald χ² test, DIF testing, anchor items, anchor-all-test-all, test validity

Abstract

This study investigates whether the anchor-all-test-all (AATA) procedure, a more practical and time-saving strategy for anchor selection, performs comparably to, or better than, the well-known all-others-as-anchors (AOAA) procedure under certain conditions.

flexMIRT 3 and IRTLRDIF 2 were used to run the Wald χ² AATA procedure and the IRT-LR AOAA procedure, respectively. First, all item parameters were constrained to be equal across groups to estimate the focal group distribution. The item parameters were then estimated in a full model in which the focal group mean and SD were fixed at the values obtained from this baseline model. Finally, Wald χ² statistics were used to evaluate the differences between the two groups' sets of item parameters.
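
The final step compares two sets of item parameters with a Wald χ² statistic. The sketch below illustrates that comparison for a single item under a two-parameter logistic (2PL) model in slope-intercept form. It assumes the reference and focal group estimates and their 2×2 sampling covariance matrices are already available (for example, from the full model with the focal mean and SD fixed) and uses Lord's classic form, which adds the two groups' covariance matrices as if the estimates were independent; a joint multigroup calibration can carry cross-group covariance terms that this sketch ignores. The 2PL choice, the function names, and the example numbers are hypothetical illustrations, not the study's code or data.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X >= x); this closed form holds only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def wald_dif_2pl(params_ref, cov_ref, params_foc, cov_foc):
    """Lord-style Wald chi-square for one 2PL item (slope, intercept); df = 2.

    params_*: (slope, intercept) estimates for the reference / focal group.
    cov_*:    2x2 sampling covariance matrix of those estimates.
    """
    # Difference between the two groups' item parameter vectors.
    d = [params_ref[k] - params_foc[k] for k in range(2)]
    # Covariance of the difference, assuming independent group estimates.
    s = [[cov_ref[i][j] + cov_foc[i][j] for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("covariance of the difference is singular")
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    # Quadratic form d' S^-1 d.
    stat = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return stat, chi2_sf_even_df(stat, 2)


if __name__ == "__main__":
    # Hypothetical estimates, not data from the study.
    cov = [[0.01, 0.0], [0.0, 0.01]]
    stat, p = wald_dif_2pl((1.2, 0.5), cov, (1.0, -0.1), cov)
    print(f"Wald chi2 = {stat:.2f}, df = 2, p = {p:.2g}")
```

A large statistic (small p) flags the item as functioning differently across groups; with two free parameters per item the reference distribution has 2 degrees of freedom, which is why the closed-form even-df tail probability suffices here.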

DIF test results depended on the anchor-selection strategy employed. The results revealed that appropriate anchor selection depends not only on sample size but also on the proportion of DIF items, the direction of DIF, and the length of the anchor. The results also demonstrate the efficacy of the Wald method for DIF detection.
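
The abstract does not spell out how anchors are chosen from the AATA results. A minimal sketch, assuming a rank-based rule (keep the items with the smallest AATA Wald statistics as the anchor and test the remaining items), shows where the anchor length enters the procedure. The rule, the function names, and the example statistics are hypothetical illustrations, not taken from this study.

```python
def select_anchors(wald_stats, anchor_length):
    """Return the `anchor_length` items with the smallest AATA Wald statistics.

    wald_stats: dict mapping item name -> Wald chi-square from the AATA run.
    Ties are broken by item name so the result is deterministic.
    """
    if not 0 < anchor_length <= len(wald_stats):
        raise ValueError("anchor_length must be between 1 and the number of items")
    ranked = sorted(wald_stats, key=lambda item: (wald_stats[item], item))
    return ranked[:anchor_length]


def items_to_test(wald_stats, anchors):
    """Items not chosen as anchors become the DIF candidates for the final tests."""
    anchor_set = set(anchors)
    return [item for item in wald_stats if item not in anchor_set]


if __name__ == "__main__":
    # Hypothetical AATA statistics for a five-item test.
    stats = {"item1": 0.8, "item2": 12.4, "item3": 0.3, "item4": 5.1, "item5": 1.9}
    anchors = select_anchors(stats, anchor_length=2)
    print("anchors:", anchors)                       # anchors: ['item3', 'item1']
    print("tested:", items_to_test(stats, anchors))  # tested: ['item2', 'item4', 'item5']
```

Keeping the anchor length as an explicit parameter reflects the finding that the appropriate anchor length varies with the study conditions rather than being fixed in advance.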

Future research should compare different modeling approaches to DIF detection, such as multiple-indicators multiple-causes (MIMIC) modeling, under varied simulation conditions. Future studies are also encouraged to examine the efficiency of anchor-selection strategies across different groups and time points.

Published

2021-09-01

How to Cite

Alsmadi, Y. (2021). The Implication of Wald Test Anchor-All-Test-All Procedure for Anchor Selection. Dirasat: Educational Sciences, 48(3), 428–436. Retrieved from http://dsr.ju.edu.jo/djournals/index.php/Edu/article/view/2885

Section

Articles