SEQUENTIAL DATA PREPROCESSING APPROACH FOR ENHANCED MATERNAL HEALTH RISK CLASSIFICATION PERFORMANCE

Authors

  • Ridha Adjie Eryadi, Digital Technology University
  • Wildan Hanif Hafidudin, Digital Technology University

DOI:

https://doi.org/10.36595/misi.v9i1.1928

Keywords:

Extra Trees Classifier, Interquartile Range, Local Outlier Factor, maternal health risk prediction, sequential outlier detection, SMOTE

Abstract

Maternal mortality remains a major health issue worldwide, which motivates the improvement of risk-assessment systems. This study evaluates sequential outlier detection, applying the Interquartile Range (IQR) method followed by the Local Outlier Factor (LOF), across six machine learning algorithms on the UCI Maternal Health Risk dataset. The preprocessing pipeline comprised duplicate removal, class balancing with SMOTE, Min-Max normalization, and sequential outlier detection. Model performance was evaluated with holdout validation and 10-fold cross-validation, and differences were assessed statistically with Wilcoxon signed-rank tests and Cohen's d effect sizes. The Extra Trees Classifier achieved 98.34% accuracy, higher than reported in previous studies. Distance-based methods were the most sensitive to the preprocessing, with KNN gaining 8.35%, while tree-based ensembles showed consistent accuracy gains. The statistical validation indicated substantial practical significance, with large effect sizes (Cohen's d above 1.0) for the top performers, supporting evidence-based guidelines for applying sequential preprocessing in maternal health risk prediction systems.
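The sequential outlier step described in the abstract (IQR filtering followed by LOF) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fence multiplier k = 1.5, the neighbor count, the LOF cutoff of 1.5, and the toy data are all assumptions chosen for illustration, and the SMOTE and Min-Max steps are not reproduced here. It uses only the Python standard library.

```python
import math
import statistics


def iqr_mask(rows, k=1.5):
    """True for rows whose every feature lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    fences = []
    for col in zip(*rows):
        q1, _, q3 = statistics.quantiles(col, n=4, method="inclusive")
        spread = q3 - q1
        fences.append((q1 - k * spread, q3 + k * spread))
    return [
        all(lo <= value <= hi for value, (lo, hi) in zip(row, fences))
        for row in rows
    ]


def lof_scores(rows, n_neighbors=3):
    """Local Outlier Factor per row: about 1 for inliers, well above 1 for outliers."""
    n = len(rows)
    dist = [[math.dist(a, b) for b in rows] for a in rows]
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        nearest = order[:n_neighbors]
        neighbors.append(nearest)
        k_dist.append(dist[i][nearest[-1]])  # distance to the k-th neighbor
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(1.0 / (sum(reach) / len(reach) + 1e-10))
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


def sequential_filter(rows, k=1.5, n_neighbors=3, lof_threshold=1.5):
    """Stage 1: per-feature IQR fences. Stage 2: LOF on the surviving rows."""
    stage1 = [row for row, keep in zip(rows, iqr_mask(rows, k)) if keep]
    scores = lof_scores(stage1, n_neighbors)
    return [row for row, score in zip(stage1, scores) if score <= lof_threshold]


# Toy data (hypothetical): two tight clusters, one extreme point that the IQR
# fences reject, and one in-range but locally isolated point that LOF rejects.
rows = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
    (0.55, 0.55),
    (5.0, 5.0),
]
cleaned = sequential_filter(rows)
print(len(cleaned), "rows kept out of", len(rows))
```

The two stages are complementary in this sketch: the IQR fences act on each feature independently and catch globally extreme values, while LOF compares local densities and can flag a point that sits inside every per-feature range yet lies far from its nearest neighbors.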

Published

04-02-2026

How to Cite

Adjie Eryadi, R., & Wildan Hanif Hafidudin. (2026). SEQUENTIAL DATA PREPROCESSING APPROACH FOR ENHANCED MATERNAL HEALTH RISK CLASSIFICATION PERFORMANCE. Jurnal Manajemen Informatika Dan Sistem Informasi, 9(1), 140–148. https://doi.org/10.36595/misi.v9i1.1928