Phishing Website Detection Using URL Features
Abstract
Phishing attacks have become one of the most pervasive cybersecurity threats: attackers use fraudulent websites to deceive users and steal sensitive information. Conventional detection methods often rely on blacklists and content-based analysis, which can be computationally expensive and struggle to detect zero-day attacks. This study proposes a lightweight, real-time phishing detection system that uses only URL-based features combined with machine learning classifiers. A dataset of phishing and legitimate URLs was collected and pre-processed, and structural URL features were extracted, including token length, digit count, symbol frequency, domain age, use of an IP address, and lexical properties. Four classifiers were trained and evaluated: Random Forest, Support Vector Machine (SVM), Gradient Boosting, and XGBoost. The XGBoost model achieved the highest performance on the test set, with 98.7% accuracy, 98.4% precision, 98.9% recall, and a 98.6% F1-score, outperforming the baseline methods. Feature importance analysis revealed that domain length, the presence of hyphens, and abnormal token counts are strongly associated with phishing behavior. The proposed model is suitable for integration into browsers and email security systems for early URL filtering.
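The abstract does not define exactly how each URL feature is computed. The Python sketch below shows one plausible, network-free way to derive several of the listed lexical features from a raw URL string. The feature names, the token delimiters, and the special-character set are illustrative assumptions, not the authors' definitions. Domain age is omitted because it requires an external WHOIS lookup rather than the URL string alone. The helper `f1_score` checks that the reported F1-score is consistent with the reported precision and recall, since F1 is their harmonic mean.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Hypothetical delimiters and symbol set; the paper's exact choices are not stated.
TOKEN_SPLIT = re.compile(r"[/?.=&\-_:@~%+]+")
SPECIAL_CHARS = set("-_@~%+=&?")


def is_ip_host(host: str) -> bool:
    """Return True if the hostname is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url: str) -> dict:
    """Compute illustrative lexical features from the URL string alone (no network access)."""
    # urlparse only fills in the hostname when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    # Split on runs of delimiter characters and drop empty pieces.
    tokens = [t for t in TOKEN_SPLIT.split(url) if t]
    return {
        "url_length": len(url),
        "domain_length": len(host),
        "num_digits": sum(c.isdigit() for c in url),
        "num_hyphens": url.count("-"),
        "num_dots": url.count("."),
        "num_special": sum(c in SPECIAL_CHARS for c in url),
        "num_tokens": len(tokens),
        "max_token_length": max((len(t) for t in tokens), default=0),
        "has_ip": int(is_ip_host(host)),
    }


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(extract_url_features("http://paypal-secure.example.com/login"))
    # Reported precision 98.4% and recall 98.9% should give F1 of about 98.6%.
    print(round(f1_score(0.984, 0.989), 3))
```

For example, `extract_url_features("http://192.168.0.1/login")` returns `has_ip` = 1, `domain_length` = 11 and `num_digits` = 8. Dictionaries like these can be stacked into a feature matrix for any of the four classifiers named above.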