Detection of Phishing URLs Using Machine Learning and Deep Learning Models Implementing a URL Feature Extractor
Abishek Mahesh
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Chennai, Tamil Nadu, India
Prithvi Seshadri
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Chennai, Tamil Nadu, India
Shruti Mishra
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Chennai, Tamil Nadu, India
Sandeep Kumar Satapathy
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Chennai, Tamil Nadu, India
Sachi Nandan Mohanty
School of Computer Science & Engineering, VIT AP University, Amaravati, Andhra Pradesh, India
Rajanikanth Aluvalu
Department of IT, Chaitanya Bharathi Institute of Technology, Hyderabad, India
Sarita Mohanty
Department of Computer Science, Odisha University of Agriculture & Technology, Bhubaneswar, India
Summary
Phishing is a deceitful process by which an attacker tries to steal sensitive information from a naïve user. Such attacks are generally carried out through emails, text messages, and similar channels. Phishing URLs pose a significant threat to cybersecurity professionals and practitioners. Considerable research has addressed the problem of phishing, and machine learning practitioners have developed models that can detect phishing URLs. However, applying machine learning and deep learning to this task has its own challenges and obstacles. The proposed approach detects phishing URLs by analyzing URL properties, URL metrics, and information obtained from certain external services. A URL Feature Extractor was written in Python to extract these features from any URL. A dataset of 88,647 phishing and legitimate URLs is used in this study. Several algorithms, namely Support Vector Machines (SVM), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naïve Bayes, Random Forests (RF), AdaBoost, Gradient Boosting, and Artificial Neural Networks (ANN), were used to predict phishing URLs. The models achieved reasonable accuracy overall, and the Gradient Boosting model performed best, with an accuracy, precision, recall, and F1-score of 97% each.
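The summary does not list the features the URL Feature Extractor produces, so the following is only a minimal sketch under stated assumptions: it computes a purely lexical, hypothetical feature set (URL length, dot and hyphen counts, an IP-address host, HTTPS use, path depth, and a keyword count with an illustrative keyword list) from the URL string alone. It does not reproduce the authors' extractor and omits the external-service lookups the chapter mentions. The names `extract_url_features` and `classification_metrics` are hypothetical. The second function shows how accuracy, precision, recall, and F1-score are conventionally computed for a binary phishing/legitimate label; the chapter may have averaged these metrics across classes differently.

```python
import ipaddress
from urllib.parse import urlparse

# Illustrative keyword list (an assumption, not taken from the chapter).
SUSPICIOUS_TOKENS = ("login", "verify", "account", "update", "secure", "bank")


def _is_ip_address(host: str) -> bool:
    """Return True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url: str) -> dict:
    """Compute hypothetical lexical features from the URL string alone.

    No network lookups (WHOIS, DNS, page fetches) are performed.
    """
    # urlparse only separates host from path when a scheme is present.
    normalized = url if "://" in url else "http://" + url
    parsed = urlparse(normalized)
    host = parsed.hostname or ""
    host_is_ip = _is_ip_address(host)
    lowered = url.lower()

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(host_is_ip),
        # Rough heuristic: labels beyond "name.tld"; ignores multi-part TLDs.
        "num_subdomains": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
        "suspicious_tokens": sum(tok in lowered for tok in SUSPICIOUS_TOKENS),
    }


def classification_metrics(y_true, y_pred) -> dict:
    """Accuracy, precision, recall and F1 for binary labels (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    print(extract_url_features("http://192.168.0.1/secure-login/verify.php?id=1"))
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Each feature dictionary can become one row of a numeric feature matrix for any of the classifiers named above (for example, scikit-learn's `GradientBoostingClassifier`). Keeping the extractor free of network calls makes it fast and deterministic. The trade-off is that it cannot capture the domain-age or reputation signals that external services provide.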