Assessing Educational Instrument Validity-Reliability and Student Performance: Rasch Model Insights

Authors

  • Mohd Zaidi bin Amiruddin, Universitas Pendidikan Indonesia, Bandung, Indonesia

DOI:

https://doi.org/10.37134/ajatel.vol16.1.2.2026

Keywords:

assessment, instruments, performance, rasch analysis, reliability, validity

Abstract

This study assesses the validity and reliability of an educational assessment instrument, together with student performance, using Rasch analysis. A quantitative, cross-sectional research design was employed. The participants were 34 students aged 16-17 years: 12 males (35.3%) and 22 females (64.7%). Data were collected with a 15-question quiz administered online through Quizizz under teacher supervision. The Rasch analysis indicates moderate reliability and validity, with a person reliability of 0.63 and an item reliability of 0.46, the latter indicating a need for item refinement. The person separation index (1.31) and item separation index (0.93) show moderate differentiation, while a Cronbach's alpha of 0.70 and raw variance explained by measures of 41.6% support the validity of the instrument. Item and person characteristics show that most items align well with model expectations, although some exhibit variability or inconsistencies that need to be addressed.
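The reliability and separation figures reported in the abstract are linked by a standard Rasch identity: separation G = sqrt(R / (1 - R)), or equivalently R = G^2 / (1 + G^2). The sketch below is illustrative only and is not the author's analysis code. The function names are hypothetical, and the strata formula (4G + 1) / 3 is a commonly used convention assumed here rather than something stated in the abstract.

```python
import math


def separation_from_reliability(reliability: float) -> float:
    """Separation index G implied by a reliability R: G = sqrt(R / (1 - R))."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must lie in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


def reliability_from_separation(separation: float) -> float:
    """Reliability R implied by a separation index G: R = G^2 / (1 + G^2)."""
    if separation < 0.0:
        raise ValueError("separation must be non-negative")
    return separation ** 2 / (1.0 + separation ** 2)


def strata(separation: float) -> float:
    """Number of statistically distinct levels, assuming H = (4G + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0


# Values taken from the abstract (rounded to two decimals there).
person_reliability = 0.63
item_reliability = 0.46

person_g = separation_from_reliability(person_reliability)
item_g = separation_from_reliability(item_reliability)

print(f"person separation implied by R=0.63: {person_g:.2f}")
print(f"item separation implied by R=0.46:   {item_g:.2f}")
print(f"person strata for G=1.31:            {strata(1.31):.2f}")
```

Under these assumptions the implied separations come out close to the reported 1.31 and 0.93; small differences are expected because the published reliabilities are rounded. A separation near 1 means the instrument distinguishes barely more than one level, which is consistent with the abstract's call for item refinement.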

Published

2026-06-03

How to Cite

Amiruddin, M. Z. (2026). Assessing Educational Instrument Validity-Reliability and Student Performance: Rasch Model Insights. Asian Journal of Assessment in Teaching and Learning, 16(1), 12-27. https://doi.org/10.37134/ajatel.vol16.1.2.2026