Machine Learning for Gestational Diabetes Mellitus: Comparing Random Forest, K-Nearest Neighbor and Naive Bayes for Effective Risk Classification

Authors

  • Ulan Juniarti, Universitas Harapan Bangsa
  • Retno Agus Setiawan, Universitas Harapan Bangsa
  • Rosyid Ridlo Al-Hakim, Universitas Harapan Bangsa

Keywords:

Machine Learning, Gestational Diabetes Mellitus, Classification

Abstract

Gestational diabetes mellitus (GDM) is a complication of pregnancy characterized by impaired glucose tolerance that arises, or is first identified, during pregnancy. Conventional approaches such as screening can delay risk identification and lose accuracy because maternal health conditions vary, which motivates machine learning-based classification methods that can identify risk early. Machine learning offers faster and more accurate GDM classification because it can quickly process large datasets and analyze data involving many variables. This study explores the use of Random Forest, K-Nearest Neighbors (KNN), and Naive Bayes algorithms for GDM risk classification to determine the most effective model. Using a dataset of 1012 samples from the Kurdistan region, the researchers performed data pre-processing (data cleaning, class balancing with SMOTE, and normalization), followed by model training and evaluation based on accuracy, AUC, sensitivity, and specificity. Random Forest achieved the highest accuracy at 86.43%, with an AUC of 93.78%, sensitivity of 89.29%, and specificity of 83.57%. KNN followed with an accuracy of 83.93%, AUC of 83.93%, sensitivity of 88.57%, and specificity of 79.29%. Naive Bayes reached an accuracy of 76.79%. Based on these results, Random Forest is the best-performing algorithm for GDM risk classification. This study highlights the potential of machine learning to improve the speed and accuracy of early GDM risk prediction, ultimately contributing to better health outcomes for mothers and their children.
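The pipeline summarized in the abstract (SMOTE balancing, normalization, classification, then accuracy, AUC, sensitivity and specificity) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code, and the abstract does not give their settings. The following are all assumptions: min-max scaling as the normalization step, k = 5 neighbours for both SMOTE and KNN, a 0.5 decision threshold, a 70/30 per-class split, and applying SMOTE to the training split only. The data are synthetic two-feature stand-ins generated by the script, not the Kurdistan dataset. Every function name (`min_max_fit`, `min_max_apply`, `smote`, `knn_score`, `evaluate`, `run_demo`) is hypothetical. Only KNN is shown. Random Forest and Naive Bayes would normally come from a library such as scikit-learn, with SMOTE from imbalanced-learn.

```python
import math
import random


def min_max_fit(rows):
    """Per-feature (min, max), computed on training rows only.
    Assumption: min-max scaling stands in for the unspecified normalization."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_apply(rows, bounds):
    """Scale each feature to [0, 1]; a constant feature maps to 0.0."""
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, (lo, hi) in zip(row, bounds)] for row in rows]


def smote(minority, n_new, k=5, rng=random):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours (k = 5 is an assumed default)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def knn_score(train_x, train_y, query, k=5):
    """Fraction of the k nearest training points (Euclidean) labelled 1."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k


def evaluate(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity and AUC; label 1 = GDM (positive)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    # AUC as the probability that a random positive outscores a random
    # negative (ties count one half), i.e. the Mann-Whitney statistic.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "auc": wins / (len(pos) * len(neg)),
    }


def run_demo(seed=0):
    """End-to-end run on SYNTHETIC stand-in data (not the Kurdistan dataset)."""
    rng = random.Random(seed)

    def sample(centre, n):
        return [[rng.gauss(centre, 0.7), rng.gauss(centre, 0.7)] for _ in range(n)]

    neg, pos = sample(0.0, 150), sample(1.0, 50)  # 3:1 class imbalance
    train_neg, test_neg = neg[:105], neg[105:]    # assumed 70/30 split per class
    train_pos, test_pos = pos[:35], pos[35:]
    # Oversample only the training minority class so that no synthetic
    # point ends up in the test set.
    train_pos = train_pos + smote(train_pos, len(train_neg) - len(train_pos), rng=rng)
    train_x = train_neg + train_pos
    train_y = [0] * len(train_neg) + [1] * len(train_pos)
    bounds = min_max_fit(train_x)
    train_x = min_max_apply(train_x, bounds)
    test_x = min_max_apply(test_neg + test_pos, bounds)
    test_y = [0] * len(test_neg) + [1] * len(test_pos)
    scores = [knn_score(train_x, train_y, q, k=5) for q in test_x]
    return evaluate(test_y, scores)


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.4f}")
```

Running the script prints the four metrics for the synthetic data. The values only demonstrate the mechanics and say nothing about the study's reported results. Sensitivity and specificity are the true-positive and true-negative rates at the chosen threshold, while AUC does not depend on a threshold, which is why the two kinds of metric can rank models differently.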

Published

2025-01-04

Section

ICHBS Proceedings

How to Cite

Machine Learning for Gestational Diabetes Mellitus: Comparing Random Forest, K-Nearest Neighbor and Naive Bayes for Effective Risk Classification. (2025). Proceeding ICHBS, 1(1), 280-295. https://ichbs.uhb.ac.id/index.php/proceeding/article/view/52