Essentials of Preprocessing Data in Improving Logistic Regression Performance Based on Rough Sets Theory: A Case Study of Stunting in West Sumatra, Indonesia

Authors

  • Izzati Rahmi: Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900 Tanjung Malim, Perak, Malaysia; Department of Mathematics and Data Science, Faculty of Mathematics and Natural Science, Andalas University, 25163 Padang, Indonesia
  • Riswan Efendi: Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900 Tanjung Malim, Perak, Malaysia; Department of Mathematics, Faculty of Science and Technology, UIN Sultan Syarif Kasim Riau, 28293 Pekanbaru, Indonesia
  • Ferra Yanuar: Department of Mathematics and Data Science, Faculty of Mathematics and Natural Science, Andalas University, 25163 Padang, Indonesia
  • Hazmira Yozza: Department of Mathematics and Data Science, Faculty of Mathematics and Natural Science, Andalas University, 25163 Padang, Indonesia
  • Muhammad Wahyudi: Department of Information Technology, Politeknik Caltex Riau, 28265 Pekanbaru, Indonesia
  • S. M. A. Burney: Department of Computer Science, Faculty of Science, University of Karachi, 75190 Karachi, Pakistan
  • Erol Eğrioğlu: Department of Statistics, Faculty of Arts and Sciences, Giresun University, 28610 Giresun, Turkey

DOI:

https://doi.org/10.37134/jsml.vol13.1.12.2025

Keywords:

logistic regression, the risk factor of stunting, rough set theory, data reduction, inconsistent samples, irrelevant attributes

Abstract

Various studies have used logistic regression models to investigate stunting and its determinants. However, some of these models fall short of researchers' expectations; for example, they may contain no significant variables, which makes them very difficult to interpret. In this paper, we address this problem by applying data reduction strategies based on rough set theory in the preprocessing phase and implementing them on stunting data sets from Solok Regency, West Sumatra Province, Indonesia. There are three types of data reduction: removing inconsistent samples, removing irrelevant attributes, or both. Accordingly, three modified models are used in this paper, namely Logistic Regression Reduction Rough Set (LR3S) types I, II, and III. The classical and modified models were compared using model performance criteria: precision, recall, F1-score, accuracy, ROC curves, and AUC values. The stunting data set was found to be unsuitable for the classical logistic model, and the best model was built by removing inconsistent observations (LR3S type I). At a 5% significance level, the best model indicates that the factors significantly influencing stunting incidence are exclusive breastfeeding, birth weight, smoking in the family, immunization, and gender. In Solok Regency, factors such as worms, clean water, comorbidities, hygienic latrines, and health assurance do not significantly differentiate stunting classes. Priority programs and policies for stunting prevention in this regency can be developed by considering the magnitude of these significant factors' influence.
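To make the two reduction steps concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the standard rough set definitions: a sample is inconsistent if another sample has identical condition attribute values but a different decision, and an attribute is dispensable if removing it alone does not enlarge the set of inconsistent samples. The function names, attribute names, and toy rows are all hypothetical and are not taken from the Solok Regency data.

```python
from collections import defaultdict


def inconsistent_indices(rows, condition_attrs, decision_attr):
    """Indices of rows that share all condition values with a row of a different decision."""
    decisions = defaultdict(set)   # condition values -> decisions seen
    members = defaultdict(list)    # condition values -> row indices
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in condition_attrs)
        decisions[key].add(row[decision_attr])
        members[key].append(i)
    bad = []
    for key, seen in decisions.items():
        if len(seen) > 1:          # same conditions, conflicting decisions
            bad.extend(members[key])
    return sorted(bad)


def remove_inconsistent(rows, condition_attrs, decision_attr):
    """Sample reduction: keep only rows that are not inconsistent."""
    bad = set(inconsistent_indices(rows, condition_attrs, decision_attr))
    return [row for i, row in enumerate(rows) if i not in bad]


def dispensable_attributes(rows, condition_attrs, decision_attr):
    """Attributes whose individual removal leaves the count of inconsistent rows unchanged."""
    baseline = len(inconsistent_indices(rows, condition_attrs, decision_attr))
    result = []
    for attr in condition_attrs:
        rest = [a for a in condition_attrs if a != attr]
        if len(inconsistent_indices(rows, rest, decision_attr)) == baseline:
            result.append(attr)
    return result


# Hypothetical toy decision table (illustration only, not study data).
conds = ["breastfed", "low_birth_weight", "clean_water"]
rows = [
    {"breastfed": 1, "low_birth_weight": 0, "clean_water": 1, "stunted": 0},
    {"breastfed": 1, "low_birth_weight": 0, "clean_water": 1, "stunted": 1},  # conflicts with row 0
    {"breastfed": 0, "low_birth_weight": 0, "clean_water": 1, "stunted": 1},
    {"breastfed": 0, "low_birth_weight": 1, "clean_water": 1, "stunted": 0},
    {"breastfed": 1, "low_birth_weight": 1, "clean_water": 0, "stunted": 1},
    {"breastfed": 0, "low_birth_weight": 0, "clean_water": 0, "stunted": 1},
]

conflicting = inconsistent_indices(rows, conds, "stunted")
cleaned = remove_inconsistent(rows, conds, "stunted")
droppable = dispensable_attributes(rows, conds, "stunted")
print(conflicting, len(cleaned), droppable)
```

Under these assumptions, a logistic regression fitted on `cleaned` would play the role of a type I model, one fitted without the `droppable` attributes a type II model, and one using both reductions a type III model. Note that attributes found dispensable one at a time cannot always be dropped together; a full reduct computation would be needed for that, which this sketch does not attempt.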

Published

2025-06-12

How to Cite

Rahmi, I., Efendi, R., Yanuar, F., Yozza, H., Wahyudi, M., Burney, S. M. A., & Eğrioğlu, E. (2025). Essentials of Preprocessing Data in Improving Logistic Regression Performance Based on Rough Sets Theory: A Case Study of Stunting in West Sumatra, Indonesia. Journal of Science and Mathematics Letters, 13(1), 124-139. https://doi.org/10.37134/jsml.vol13.1.12.2025
