Predictive Accuracy of Models for Evaluating Student Performance in PISA Mathematics Test in Jordan Using Explanatory Item Response Models and Machine Learning Models: A Comparative Study

Authors

H. H. A. Abu Rashed
N. K. M. Al-Shraifin

DOI:

https://doi.org/10.35516/Edu.2025.10870

Keywords:

Explanatory Item Response Models, Machine Learning Models, Predictive Accuracy, PISA test.

Abstract

Objectives: The study aimed to compare the predictive accuracy of explanatory item response models (EIRMs) and five machine learning models (Random Forest, Artificial Neural Networks, Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor) in evaluating student performance on the PISA 2022 mathematics test in Jordan.

Methods: A descriptive analytical method was used, based on data from 7,799 Jordanian students randomly selected from 260 schools that participated in the test. Ten-fold cross-validation was used to compare the models’ predictive accuracy. Predictors included item difficulty and the following student-related factors: gender, supervisory authority, socioeconomic status, bullying, use of digital applications outside school, availability of internet-connected devices in schools, teachers’ digital skills, and use of digital resources in math classes.
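The ten-fold cross-validation procedure mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the helper names (`k_fold_indices`, `cross_val_accuracy`), the majority-class stand-in model, and the synthetic data. It does not reproduce the study's models, software, or PISA data.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices once and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold j takes every k-th shuffled index starting at position j.
    return [indices[j::k] for j in range(k)]

def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Return the held-out accuracy for each of the k folds."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        # Fit on the other k-1 folds, then score on the held-out fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(preds, held_out))
        scores.append(correct / len(held_out))
    return scores

# Hypothetical stand-in model: always predict the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, X_test):
    return [model for _ in X_test]

# Synthetic binary responses (1 = correct, 0 = incorrect); not PISA data.
rng = random.Random(42)
X = [[rng.random()] for _ in range(200)]
y = [1 if x[0] + rng.gauss(0, 0.3) > 0.6 else 0 for x in X]

scores = cross_val_accuracy(fit_majority, predict_majority, X, y, k=10)
mean_accuracy = sum(scores) / len(scores)
print(f"mean 10-fold accuracy: {mean_accuracy:.3f}")
```

Any classifier exposing the same `fit` and `predict` pair could be swapped in, which is what makes the comparison across models fair: each one is scored on the same held-out folds.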

Results: The Naïve Bayes model achieved the highest predictive accuracy (0.718), while the EIRM showed the strongest discriminatory power (AUC = 0.693), outperforming the machine learning models in distinguishing between student responses. Item difficulty emerged as the most influential predictor.
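Accuracy and AUC can rank models differently because accuracy depends on a fixed classification threshold, whereas AUC depends only on how well the scores order positive cases above negative ones. The sketch below shows this with two invented score vectors (`model_a`, `model_b`); the numbers are toy values chosen for illustration and are unrelated to the study's results.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Share of cases classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels: four incorrect (0) and four correct (1) responses.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical model A: mostly right at the 0.5 threshold, but with two
# badly misordered cases that hurt its ranking quality.
model_a = [0.1, 0.2, 0.3, 0.9, 0.6, 0.7, 0.8, 0.05]

# Hypothetical model B: orders every positive above every negative, but
# most of its scores for positives fall below the 0.5 threshold.
model_b = [0.1, 0.2, 0.3, 0.35, 0.4, 0.42, 0.45, 0.6]

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(y_true, scores), auc(y_true, scores))
# prints: A 0.75 0.5625
#         B 0.625 1.0
```

Model A wins on accuracy while model B wins on AUC, the same kind of split the abstract reports between Naïve Bayes and the EIRM.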

Conclusions: The study recommends further research that incorporates new variables, applies the studied predictive models to other assessments or countries to validate and generalize the findings, and explores additional machine learning techniques.

Published

2025-06-29

How to Cite

Abu Rashed, H. H. A., & Al-Shraifin, N. K. M. (2025). Predictive Accuracy of Models for Evaluating Student Performance in PISA Mathematics Test in Jordan Using Explanatory Item Response Models and Machine Learning Models: A Comparative Study. Dirasat: Educational Sciences, 52(3), 10870. https://doi.org/10.35516/Edu.2025.10870

Section

Curriculum and Instruction
Received 2025-02-20
Accepted 2025-05-28