A Cross-Semester Analysis of the Quality of Recurring Items: Item Difficulty, Discrimination, and Distractor Functioning

Authors

DOI:

https://doi.org/10.56380/mjer.si.2026.05-14

Keywords:

General Chemistry II, Recurring items, Item difficulty, Item discrimination, Distractor functioning

Abstract

The purpose of this study was to evaluate the quality of recurring multiple-choice items used in formative assessments across three semesters of General Chemistry II (CHEM202), using item difficulty, item discrimination, and distractor functioning as key indicators, and to identify patterns of change across semesters. The study drew on data from Fall 2024–2025, Spring 2024–2025, and Fall 2025–2026 and analyzed 41 recurring items in total. Exploratory follow-up analyses were conducted for the 13 items that were modified across semesters. The data were analyzed using the Friedman test, the Mann–Whitney U test, independent-samples and paired-samples t tests, the Wilcoxon signed-rank test, Pearson’s χ² test, and binomial generalized linear models. Item difficulty changed significantly across the three semesters, whereas item discrimination remained relatively stable overall. Between Spring 2024–2025 and Fall 2025–2026, however, the discrimination of modified items improved more than that of unchanged items. Item-level analyses further indicated that changes in quality were not uniform across items but item-specific. These findings suggest that the quality of multiple-choice items should not be judged solely from one-time performance indicators; instead, tracking recurring items across administrations should be combined with distractor analysis.
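The indicators named in the abstract can be illustrated with standard classical test theory formulas. The sketch below is a minimal illustration under assumptions the abstract does not state, and it is not the authors' procedure: difficulty is taken as the proportion of examinees answering correctly, discrimination as the upper-minus-lower index using the top and bottom 27% of examinees by total score, a distractor is flagged as non-functioning when fewer than 5% of examinees choose it (a common convention in the item-analysis literature), and the Friedman statistic for comparing an index across semesters is computed without a tie correction. All function names and parameters (`item_statistics`, `friedman_q`, `group_fraction`, `nfd_threshold`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ItemStats:
    difficulty: float      # p: proportion of examinees answering correctly
    discrimination: float  # D: p(upper group) - p(lower group)
    nonfunctioning: list   # distractors chosen by fewer than nfd_threshold of examinees


def item_statistics(choices, key, total_scores, options="ABCD",
                    group_fraction=0.27, nfd_threshold=0.05):
    """Classical test theory indices for a single multiple-choice item.

    choices      -- option selected by each examinee, e.g. "A".."D"
    key          -- the correct option
    total_scores -- each examinee's total test score, in the same order as choices
    Assumed conventions: 27% upper/lower groups, 5% distractor threshold.
    """
    n = len(choices)
    if n == 0 or n != len(total_scores):
        raise ValueError("choices and total_scores must be non-empty and equally long")

    correct = [c == key for c in choices]
    difficulty = sum(correct) / n

    # Rank examinees by total score (ties keep input order) and compare the
    # bottom and top group_fraction of the cohort.
    order = sorted(range(n), key=lambda i: total_scores[i])
    g = max(1, round(group_fraction * n))
    p_lower = sum(correct[i] for i in order[:g]) / g
    p_upper = sum(correct[i] for i in order[-g:]) / g

    # Flag distractors whose selection rate falls below the threshold.
    nonfunctioning = [opt for opt in options
                      if opt != key and choices.count(opt) / n < nfd_threshold]
    return ItemStats(difficulty, p_upper - p_lower, nonfunctioning)


def friedman_q(rows):
    """Friedman chi-square statistic, without tie correction.

    rows -- one row per item, holding that item's index value in each of k semesters.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        order = sorted(range(k), key=lambda s: row[s])
        i = 0
        while i < k:
            # Extend over a run of tied values and give each the average rank.
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # ranks are 1-based
            for t in range(i, j + 1):
                rank_sums[order[t]] += avg_rank
            i = j + 1
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


def friedman_p_three_semesters(q):
    # Large-sample chi-square approximation with k - 1 = 2 degrees of freedom,
    # whose upper-tail probability is exactly exp(-q / 2).
    return math.exp(-q / 2)


if __name__ == "__main__":
    stats = item_statistics(list("AABAAABAAC"), "A",
                            [38, 35, 20, 33, 18, 30, 22, 29, 36, 15])
    print(stats)  # option D is never chosen, so it is flagged as non-functioning
    q = friedman_q([[0.50, 0.60, 0.70], [0.40, 0.50, 0.60], [0.30, 0.35, 0.50]])
    print(q, friedman_p_three_semesters(q))
```

Both cut-offs are exposed as parameters so that other conventions can be substituted. Because no tie correction is applied, `friedman_q` can differ slightly from statistical packages that correct for tied ranks.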

Published

2026-05-21

How to Cite

A Cross-Semester Analysis of the Quality of Recurring Items: Item Difficulty, Discrimination, and Distractor Functioning. (2026). Mongolian Journal of Educational Research. https://doi.org/10.56380/mjer.si.2026.05-14