The Effectiveness of the Mantel-Haenszel Log Odds Ratio Method in Detecting Differential Item Functioning Across Different Sample Sizes and Test Lengths Using Real Data Analysis
DOI:
https://doi.org/10.35516/edu.v51i3.6755

Keywords:
Mantel-Haenszel, Log Odds Ratio, DIF, Real Data, PISA test, Tenth-grade

Abstract
Objectives: This study aims to determine the effectiveness of the Mantel-Haenszel Log Odds Ratio method in detecting Differential Item Functioning (DIF) across gender, while considering variations in sample size and test length. Using real data, the study draws on a sample of tenth-grade students in Jordan who participated in the 2018 PISA International Mathematics Test.
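For context, the statistic evaluated in this study is the Mantel-Haenszel common odds ratio and its logarithm. The formulation below follows Holland and Thayer (1986) and Dorans and Holland (1992), both cited in the reference list; the notation is the standard one from that literature, not taken from this article:

$$\hat{\alpha}_{MH} = \frac{\sum_{k} A_k D_k / N_k}{\sum_{k} B_k C_k / N_k}, \qquad \hat{\beta}_{MH} = \ln \hat{\alpha}_{MH},$$

where, at each level $k$ of the matching total score, $A_k$ and $B_k$ count reference-group examinees who answered the studied item correctly and incorrectly, $C_k$ and $D_k$ are the corresponding focal-group counts, and $N_k = A_k + B_k + C_k + D_k$. Under no DIF, $\hat{\beta}_{MH} \approx 0$; ETS practice rescales the statistic to $\Delta_{MH} = -2.35\,\hat{\beta}_{MH}$ for flagging items.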
Methods: The study employed an experimental methodology with three levels of sample size (342, 200, and 100) and three levels of test length (30, 20, and 10 items). The DDFS program was run nine times, once for each of the nine scenarios formed by crossing the sample-size and test-length levels.
Results: The results indicate that variations in sample size and test length significantly affect the Mantel-Haenszel (MH) method. Specifically, the MH method's ability to detect DIF items improved as sample size increased while test length was held constant. Conversely, its efficacy declined as test length increased while sample size was held fixed.
Conclusion: The study recommends a large sample size and a short test length for effective detection of DIF items with the MH method.
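To make the statistic concrete, here is a minimal Python sketch that computes the MH log odds ratio for a single dichotomous item. It is an expository re-implementation under standard assumptions, not the DDFS program used in the study; the function and variable names (mh_log_odds_ratio, item, group, matching) are illustrative.

import numpy as np

def mh_log_odds_ratio(item, group, matching):
    """Mantel-Haenszel log odds ratio (beta_MH) for one 0/1-scored item.

    item     : array of 0/1 responses to the studied item
    group    : array with 1 = reference group, 0 = focal group
    matching : matching variable, typically the total (or rest) test score
    """
    num = 0.0  # running sum over score strata of A_k * D_k / N_k
    den = 0.0  # running sum over score strata of B_k * C_k / N_k
    for k in np.unique(matching):
        s = matching == k
        a = np.sum((group[s] == 1) & (item[s] == 1))  # reference, correct
        b = np.sum((group[s] == 1) & (item[s] == 0))  # reference, incorrect
        c = np.sum((group[s] == 0) & (item[s] == 1))  # focal, correct
        d = np.sum((group[s] == 0) & (item[s] == 0))  # focal, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    if num == 0 or den == 0:
        return float("nan")  # degenerate strata: estimate undefined
    return float(np.log(num / den))  # 0 = no DIF; sign shows favored group

In ETS practice the result is rescaled to Delta_MH = -2.35 * beta_MH, and items are classified as showing negligible, moderate, or large DIF at |Delta_MH| thresholds of roughly 1 and 1.5; those conventions come from the cited literature (Holland & Thayer, 1986; Dorans & Holland, 1992), not from this abstract.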
References
Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29(1), 67-91. https://doi.org/10.1111/j.1745-3984.1992.tb00368.x
Alomari, H., Akour, M. M., & Al Ajlouni, J. (2023). The effect of sample size on differential item functioning and differential distractor functioning in multiple-choice items. Psychology Hub, 40(2), 17–24. https://doi.org/10.13133/2724-2943/17992
Arıkan, Ç., Uğurlu, S., & Atar, B. (2016). A DIF and bias study by using MIMIC, SIBTEST, Logistic Regression, and Mantel-Haenszel methods. Hacettepe University Journal of Education, 31(1), 34-52. https://doi.org/10.16986/HUJE.2015014226
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items (Vol. 4). SAGE Publications.
Dorans, N. J., & Holland, P. W. (1992). DIF detection and description: Mantel-Haenszel and standardization. ETS Research Report Series, 1992(1), i-40. https://doi.org/10.1002/j.2333-8504.1992.tb01440.x
Eom, M. (2008). Underlying factors of MELAB listening construct. Spaan Fellow Working Papers in Second or Foreign Language Assessment, 6, 77–94.
Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29(4), 278-295. https://doi.org/10.1177/0146621605275728
Gao, X. (2019). A comparison of six DIF detection methods [Unpublished master's thesis]. University of Connecticut Graduate School. https://digitalcommons.lib.uconn.edu/gs_theses/1411
Gu, K. (2023). Washback effects of IELTS test on teachers' adoption of teaching materials in the classroom in China. International Journal on Social & Education Sciences (IJonSES), 5(2). https://doi.org/10.46328/ijonses.513
Holland, P. W., & Thayer, D. T. (1986). Differential item functioning and the Mantel-Haenszel procedure. ETS Research Report Series, 1986(2), i-24. https://doi.org/10.1002/j.2330-8516.1986.tb00186.x
Ihlenfeldt, S. D., & Rios, J. A. (2023). A meta-analysis on the predictive validity of English language proficiency assessments for college admissions. Language Testing, 40(2), 276-299. https://doi.org/10.1177/02655322221112364
Kabasakal, K. A., Arsan, N., Gök, B., & Kelecioğlu, H. (2014). Comparing performances (Type I error and power) of IRT likelihood ratio, SIBTEST, and Mantel-Haenszel methods in the determination of differential item functioning. Educational Sciences: Theory & Practice, 14(6), 2186-2193. https://doi.org/10.12738/estp.2014.6.2165
Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22(4), 719-748. https://doi.org/10.1093/jnci/22.4.719
Marôco, J. (2021). Portugal: The PISA Effects on Education. In: Crato, N. (eds) Improving a Country’s Education. Springer, Cham. https://doi.org/10.1007/978-3-030-59031-4_8
Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13(2), 127-143.
Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17(4), 297-334. https://doi.org/10.1177/014662169301700401
Münch, R., & Wieczorek, O. (2023). Improving schooling through effective governance? The United States, Canada, South Korea, and Singapore in the struggle for PISA scores. Comparative Education, 59(1), 59-76. https://doi.org/10.1080/03050068.2022.2138176
Narayanan, P., & Swaminathan, H. (1996). Identification of items that show nonuniform DIF. Applied Psychological Measurement, 20(3), 257-274. https://doi.org/10.1177/014662169602000306
Park, G. (2008). Differential Item Functioning on an English Listening Test across Gender. TESOL Quarterly, 42(1), 115-123.
Penfield, R. D. (2010). DDFS: Differential distractor functioning software. Applied Psychological Measurement, 34(8), 646-647. https://doi.org/10.1177/0146621610375690
Penfield, R. D., & Camilli, G. (2006). Differential item functioning and item bias. In Handbook of Statistics (Vol. 26, pp. 125-167). https://doi.org/10.1016/S0169-7161(06)26005-X
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
Taylor, C. S., & Lee, Y. (2012). Gender DIF in reading and mathematics tests with mixed item formats. Applied Measurement in Education, 25(3), 246-280. https://doi.org/10.1080/08957347.2012.687650
Aryadoust, V., Goh, C. C. M., & Kim, L. O. (2011). An investigation of differential item functioning in the MELAB listening test. Language Assessment Quarterly, 8(4), 361–385. https://doi.org/10.1080/15434303.2011.628632
Wagner, E. (2004). A construct validation study of the extended listening sections of the ECPE and MELAB. Spaan Fellow Working Papers in Second or Foreign Language Assessment, 2, 1–23.
Wall, D., & Horák, T. (2006). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe: Phase 1, the baseline study. ETS Research Report Series, 2006(1), i-199. https://doi.org/10.1002/j.2333-8504.2006.tb02024.x
Wall, D., & Horák, T. (2008). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe: Phase 2, coping with change. ETS Research Report Series, 2008(2), i-105. https://doi.org/10.1002/j.2333-8504.2008.tb02123.x
Williams, S. (1997). The unbiased anchor: Bridging the gap between DIF and item bias. Applied Measurement in Education, 10(3), 253-267. https://doi.org/10.1207/s15324818ame1003_4
License
Copyright (c) 2024 Dirasat: Educational Sciences

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Accepted 2024-05-30
Published 2024-09-15
