New Heuristics Method for Malicious URLs Detection Using Machine Learning
DOI:
https://doi.org/10.31185/wjcms.267
Keywords:
Malicious URLs, Machine Learning, Cybersecurity, Logistic Regression, Random Forest, Support Vector Machines
Abstract
Malicious URLs are a prominent and dangerous class of cyber threat because they enable phishing attacks, malware distribution, and other kinds of cyber fraud. Conventional detection techniques rely on blacklisting and heuristic analysis, which are becoming increasingly ineffective against sophisticated, rapidly evolving threats. This paper examines three machine learning models for malicious URL detection: Logistic Regression, Random Forest, and Support Vector Machines (SVM). Our methodology consisted of data collection, feature extraction, model training, and performance evaluation using accuracy, precision, recall, and F1-score. We implemented and optimized these three models because the literature indicates their effectiveness: Vanitha and Vinodhini report promising results for Logistic Regression; Cui et al. and Vanhoenshoven et al. find Random Forest models robust and accurate; and Manjeri et al. report very high accuracy for SVM models. Further work on deep learning models has emphasized their potential. In our study, the optimized Random Forest model performed best, with a training accuracy of 99% and a validation accuracy of 90.5%; Logistic Regression and SVM achieved a training accuracy of 89.31% and a validation accuracy of 90.5%.
The paper discusses the optimization process, the performance of each model, and the strengths and limitations of each approach, emphasizing the need to update the models continuously and to integrate them into real-time cybersecurity infrastructures.
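The four evaluation metrics named in the abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, assuming labels are encoded as 1 for malicious and 0 for benign; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper's implementation or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 (assumed encoding: 1 = malicious, 0 = benign)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malicious, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malicious, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, passed

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no positive predictions or no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels invented for illustration only (not results from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Reporting precision and recall alongside accuracy matters for this task because a missed malicious URL (false negative) and a blocked benign URL (false positive) have different costs, and accuracy alone does not distinguish between them.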
References
D. Sahoo, C. Liu, S.C. Hoi, "Malicious URL Detection Using Machine Learning: A Survey," arXiv preprint, 2017.
A.S. Manjeri, R. Kaushik, M. Ajay, P.C. Nair, "A machine learning approach for detecting malicious websites using URL features," 3rd International Conference on Electronics, Communication and Aerospace Technology (ICECA), IEEE, 2019, pp. 555-561.
B. Cui, S. He, X. Yao, P. Shi, "Malicious URL detection with feature extraction based on machine learning," Int. J. High Perform. Comput. Netw., vol. 12, no. 2, pp. 166-178, 2018.
V.M. Patro, M.R. Patra, "Augmenting weighted average with confusion matrix to enhance classification accuracy," Trans. Mach. Learn. Artif. Intell., vol. 2, no. 4, pp. 77-91, 2014.
N. Vanitha, V. Vinodhini, "Malicious-URL detection using logistic regression technique," Int. J. Eng. Manage. Res. (IJEMR), pp. 108-113, 2019.
R. Kumar, et al., "Malicious URL detection using multi-layer filtering model," 14th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), IEEE, 2017.
Y.-C. Chen, Y.-W. Ma, J.-L. Chen, "Intelligent malicious URL detection with feature analysis," IEEE Symposium on Computers and Communications (ISCC), IEEE, 2020.
R. Patgiri, et al., "Empirical study on malicious URL detection using machine learning," International Conference on Distributed Computing and Internet Technology, Springer, Cham, 2019.
J. Kumar, et al., "Phishing website classification and detection using machine learning," 2020 International Conference on Computer Communication and Informatics (ICCCI), IEEE, 2020.
F. Vanhoenshoven, et al., "Detecting malicious URLs using machine learning techniques," 2016 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, 2016.
H. Kumar, P. Gupta, R.P. Mahapatra, "Protocol based ensemble classifier for malicious URL detection," 3rd International Conference on Contemporary Computing and Informatics (IC3I), IEEE, 2018.
M.S.I. Mamun, et al., "Detecting malicious URLs using lexical analysis," International Conference on Network and System Security, Springer, Cham, 2016.
S. Jino, S.V. Niranjan, R. Madhan Kumar, A. Harinisree, "Machine learning based malicious website detection," J. Comput. Theor. Nanosci., vol. 17, no. 8, pp. 3468-3472, 2020.
D. Kapil, A. Bansal, N.M.A.J. Anupriya, "Machine learning based malicious URL detection," International Journal of Engineering and Management Research, 2019.
C. Do Xuan, H.D. Nguyen, V.N. Tisenko, "Malicious URL detection based on machine learning," Int. J. Adv. Comput. Sci. Appl., vol. 11, no. 1, 2020.
Q.T. Hai, S.O. Hwang, "Detection of malicious URLs based on word vector representation and N-gram," J. Intell. Fuzzy Syst., vol. 35, pp. 5889-5900, 2018.
T. Tiefeng, M. Wang, Y. Xi, Z. Zhao, "Malicious URL detection model based on bidirectional gated recurrent unit and attention mechanism," Appl. Sci., vol. 12, no. 23, p. 12367, 2022.
K. Haynes, H. Shirazi, I. Ray, "Lightweight URL-based phishing detection using natural language processing transformers for mobile devices," Procedia Comput. Sci., vol. 127, pp. 127-134, 2021.
T. Lin, Y. Wang, X. Liu, X. Qiu, "A survey of transformers," AI Open, vol. 3, pp. 111-132, 2022.
J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, "BERT: pre-training of deep bidirectional transformers for language understanding," Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 4171-4186, 2019.
Q. Li, Q. Chen, S. Qi, J. Hu, "Malicious URL detection using ML," Journal of Internet Services and Information Security, vol. 10, no. 1, pp. 60-78, 2020.
D. S. Naik, V. S. Satpute, "A study of machine learning techniques for malicious URL detection," 2020 International Conference on Computer Communication and Informatics (ICCCI), IEEE, 2020.
T. Blanchard, M. S. McKenna, "Combining machine learning and heuristics for URL analysis," Journal of Cybersecurity and Privacy, vol. 2, no. 1, pp. 1-22, 2022.
H. Zheng, X. Zhang, "A comprehensive review on malicious URL detection techniques," Journal of Computer Science and Technology, vol. 35, no. 2, pp. 389-408, 2020.
P. Gupta, A. Kumar, "Malicious URL detection: A survey of ML techniques," International Journal of Information Security and Privacy, vol. 14, no. 1, pp. 52-75, 2020.
B. Zhou, Q. Zhang, "Comparative analysis of ML methods for phishing website detection," Proceedings of the 13th International Conference on Information Security and Cryptology (ISC), 2020.
License
Copyright (c) 2024 Maher Kassem Hasan
This work is licensed under a Creative Commons Attribution 4.0 International License.