Assessing Educational Instrument Validity-Reliability and Student Performance: Rasch Model Insights
DOI: https://doi.org/10.37134/ajatel.vol16.1.2.2026
Keywords: assessment, instruments, performance, rasch analysis, reliability, validity
Abstract
This study aims to assess the validity and reliability of an educational assessment instrument and the performance of the students who took it. A cross-sectional research design with a quantitative method was employed. The participants were 34 students, 12 male (35.3%) and 22 female (64.7%), aged 16 to 17 years. Data were collected online through a 15-question Quizizz quiz administered under teacher supervision. The Rasch analysis indicates moderate reliability and validity, with a person reliability of 0.63 and an item reliability of 0.46, which indicates a need for item refinement. The person separation index (1.31) and item separation index (0.93) show moderate differentiation, while a Cronbach's alpha of 0.70 and a raw variance explained by measures of 41.6% support the instrument's validity. The item and person characteristics show that most items fit the model expectations well, although some exhibit variability or inconsistencies that need to be addressed.
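The reported person and item statistics are linked by a standard Rasch identity: a separation index G and its reliability R satisfy R = G²/(1 + G²). The Python sketch below is an illustration only, not the authors' analysis code. It shows that identity, the dichotomous Rasch response probability, and Cronbach's alpha for a persons × items matrix of 0/1 scores. The function names and the small score matrix are hypothetical, and the study's raw responses are not reproduced here. Estimating person abilities and item difficulties from data requires dedicated Rasch software, which this sketch does not attempt.

```python
import math
from statistics import pvariance


def separation_from_reliability(r: float) -> float:
    """Rasch separation index G from a reliability coefficient R: G = sqrt(R / (1 - R))."""
    if not 0.0 <= r < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(r / (1.0 - r))


def reliability_from_separation(g: float) -> float:
    """Inverse relation: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)


def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b)).

    theta is the person ability and b the item difficulty, both in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def cronbach_alpha(scores: list[list[int]]) -> float:
    """Cronbach's alpha for a persons x items score matrix (hypothetical input).

    For 0/1 items this equals KR-20. Population variances are used for both
    the items and the total score, so the n/(n-1) correction cancels out.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)
```

Applying `reliability_from_separation` to the reported separation indices gives about 0.632 for persons (G = 1.31) and about 0.464 for items (G = 0.93). These values agree with the reported person reliability of 0.63 and item reliability of 0.46, so the two sets of figures in the abstract are internally consistent.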
License
Copyright (c) 2026 Mohd Zaidi bin Amiruddin

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


