Predictive Accuracy of Models for Evaluating Student Performance in PISA Mathematics Test in Jordan Using Explanatory Item Response Models and Machine Learning Models: A Comparative Study
DOI: https://doi.org/10.35516/Edu.2025.10870

Keywords: Explanatory Item Response Models, Machine Learning Models, Predictive Accuracy, PISA test

Abstract
Objectives: The study aimed to compare the predictive accuracy of models for assessing student performance on the PISA 2022 mathematics test in Jordan, using explanatory item response models (EIRMs) and five machine learning models: Random Forest, Artificial Neural Networks, Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor.
Methods: A descriptive analytical method was used, based on data from 7,799 Jordanian students randomly selected from the 260 schools that participated in the test. Ten-fold cross-validation was employed to compare the models' predictive accuracy. Predictor variables included item difficulty and student-related factors: gender, supervisory authority, socioeconomic status, bullying, use of digital applications outside school, availability of internet-connected devices in schools, teachers' digital skills, and use of digital resources in mathematics classes.
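A minimal Python sketch of the ten-fold cross-validation comparison described above is shown below. The file name, column names, and hyperparameter choices are hypothetical placeholders, not the study's actual PISA 2022 data pipeline, and categorical predictors are assumed to be numerically encoded already.

```python
# Sketch: ten-fold cross-validation over the five machine learning classifiers.
# File and column names are hypothetical; not the study's actual pipeline.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold

# One row per student-item pair, with a binary 'correct' outcome.
responses = pd.read_csv("pisa2022_jordan_math.csv")          # hypothetical file
predictors = ["item_difficulty", "gender", "authority", "escs", "bullying",
              "apps_outside_school", "internet_devices",
              "teacher_digital_skills", "digital_resources_math"]
X, y = responses[predictors], responses["correct"]

models = {
    "Random Forest": RandomForestClassifier(n_estimators=500, random_state=1),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=1),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(probability=True, random_state=1),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy = {acc.mean():.3f}")
```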
Results: The Naïve Bayes model achieved the highest predictive accuracy (0.718), while the EIRM showed strong discriminatory power with an AUC of 0.693, outperforming the machine learning models in distinguishing between correct and incorrect student responses. Item difficulty emerged as the most influential predictor.
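The two figures reported above, classification accuracy and the area under the ROC curve (AUC), can be computed from held-out predictions as in the short sketch below; it reuses the hypothetical `X`, `y`, and `models` objects from the previous sketch and is illustrative only.

```python
# Sketch: the evaluation metrics reported in the abstract (accuracy and AUC),
# reusing the hypothetical X, y, and models from the previous block.
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)
model = models["Naive Bayes"].fit(X_train, y_train)

y_pred = model.predict(X_test)                # hard 0/1 predictions
y_prob = model.predict_proba(X_test)[:, 1]    # probability of a correct response

print("Accuracy:", accuracy_score(y_test, y_pred))  # proportion classified correctly
print("AUC:", roc_auc_score(y_test, y_prob))        # discrimination between 0s and 1s
```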
Conclusions: The study recommends further research that incorporates new variables, applies the studied predictive models to other assessments and countries to validate and generalize the findings, and explores additional machine learning techniques.
License
Copyright (c) 2023 Dirasat: Educational Sciences

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Accepted 2025-05-28
Published 2025-06-29
