Early stage prediction of COVID-19 Using machine learning model
DOI:
https://doi.org/10.31185/wjcm.107
Keywords:
COVID-19, SMOTE, tuning hyperparameters, Filter-based feature selection, Random Forest, Logistic Regression, Decision Tree
Abstract
The healthcare sector has traditionally been an early adopter of technological progress and has gained significant advantages from it, especially from machine learning applications such as disease prediction. The COVID-19 epidemic is still affecting every facet of life and calls for fast and accurate diagnosis. Early detection of COVID-19 is critical to saving lives, and an effective, rapid, and precise way to reduce consultants' workload in diagnosing suspected cases is needed. This paper proposes an automated model that predicts COVID-19 with high accuracy in the early stages. The dataset used in this study is imbalanced and was converted to a balanced one using the Synthetic Minority Over-sampling Technique (SMOTE). The model uses a filter-based feature selection method and several machine learning algorithms: K-Nearest Neighbor, Support Vector Machine, Decision Tree, Logistic Regression, and Random Forest (RF). Since the best classification result was achieved with the RF algorithm, this algorithm was then optimized by tuning its hyperparameters. The optimized RF improved the accuracy from 98.0 to 99.5.
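The balancing step described in the abstract can be illustrated with a minimal sketch of SMOTE in plain Python. This is not the authors' implementation: the function names `smote` and `balance`, the neighbour count `k`, the random seed, and the toy feature values are all assumptions made for illustration, and a binary label is assumed. The sketch shows only the core idea, namely that each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class samples.

    Each synthetic point is interpolated between a randomly chosen minority
    sample and one of its k nearest minority neighbours (hypothetical helper,
    not taken from the paper).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples

    def sq_dist(a, b):
        # Squared Euclidean distance; enough for ranking neighbours.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the sample itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (assumed values): six majority samples (label 1), two minority (label 0).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.2, 0.4], [0.4, 0.2], [1.0, 1.0], [1.2, 0.9]]
y = [1, 1, 1, 1, 1, 1, 0, 0]

X_bal, y_bal = balance(X, y, minority_label=0, k=1)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In practice a maintained implementation would normally be preferred over a hand-written one, and oversampling would be applied to the training split only, so that synthetic points do not leak into the evaluation data.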
License
Copyright (c) 2023 Mohammad Abood Kadhim, Abdulkareem Merhej Radhi
This work is licensed under a Creative Commons Attribution 4.0 International License.