Detection of Phishing URLs Using Machine Learning and Deep Learning Models Implementing a URL Feature Extractor

Abishek Mahesh

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Chennai, Tamil Nadu, India

Prithvi Seshadri

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Chennai, Tamil Nadu, India

Shruti Mishra

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Chennai, Tamil Nadu, India

Sandeep Kumar Satapathy

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Chennai, Tamil Nadu, India

Book Editor(s):

Sachi Nandan Mohanty

School of Computer Science & Engineering, VIT AP University, Amaravati, Andhra Pradesh, India

Rajanikanth Aluvalu

Department of IT, Chaitanya Bharathi Institute of Technology, Hyderabad, India

Sarita Mohanty

Department of Computer Science, Odisha University of Agriculture & Technology, Bhubaneswar, India

First published: 29 May 2023

https://doi.org/10.1002/9781119905172.ch6

Summary

Phishing is a deceitful process by which an attacker tries to steal sensitive information from a naïve user. These attacks are generally carried out through emails, text messages, and similar channels. Phishing URLs are a significant threat that cybersecurity professionals and practitioners must address. Much research has been done to tackle the problem of phishing, and several machine learning practitioners have developed ML models that can detect phishing URLs. However, using machine learning and deep learning also has its challenges and obstacles. The proposed approach detects phishing URLs by analyzing URL properties, URL metrics, and information from certain external services. A URL Feature Extractor was implemented in Python to extract features from any URL. A dataset of 88,647 phishing and legitimate URLs is used in this study. Several machine learning algorithms, such as Support Vector Machines (SVM), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naïve Bayes, Random Forests (RF), AdaBoost, Gradient Boosting, and Artificial Neural Networks (ANN), were used to predict phishing URLs. The results indicate a reasonable accuracy rate. The Gradient Boosting model performed best of all the models, with an accuracy, precision, recall, and F1-score of 97% each.
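The feature-extraction and evaluation steps described in the summary can be illustrated with a minimal sketch. It makes several assumptions. It computes only lexical URL features and performs no WHOIS, DNS, or other external-service lookups. The names `extract_url_features` and `classification_metrics`, the specific feature set, and the `SUSPICIOUS_WORDS` keyword list are hypothetical and are not the chapter's actual URL Feature Extractor. The metric function shows how accuracy, precision, recall, and F1-score are computed for a binary phishing/legitimate task, treating phishing (label 1) as the positive class.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical keyword list for illustration only; not taken from the chapter.
SUSPICIOUS_WORDS = ("login", "verify", "secure", "account", "update", "bank")


def _is_ip_address(host: str) -> bool:
    """True if the host is a literal IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url: str) -> dict:
    """Compute simple lexical features for one URL (no network lookups)."""
    # Add a scheme if it is missing so urlparse fills in netloc/hostname.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""  # lowercased, without "user@" prefix or port
    is_ip = _is_ip_address(host)
    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        # Labels before the registered domain, e.g. "www" in www.example.com
        # (naive: ignores multi-part suffixes such as .co.uk).
        "num_subdomains": 0 if is_ip else max(host.count(".") - 1, 0),
        "has_at_symbol": int("@" in url),
        "has_ip_host": int(is_ip),
        "uses_https": int(parsed.scheme == "https"),
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
        "suspicious_word_count": sum(w in lowered for w in SUSPICIOUS_WORDS),
    }


def classification_metrics(y_true, y_pred, positive=1) -> dict:
    """Accuracy, precision, recall, and F1 for a binary task (phishing = positive)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # A URL that hides its real host behind "@" and uses a raw IP address.
    print(extract_url_features("http://paypal.com@192.168.10.5/secure-login?id=1"))
    print(classification_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

Each feature dictionary can become one row of a feature matrix for any of the classifiers listed in the summary. Features based on external services would require network lookups and are deliberately left out of this sketch.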

Evolution and Applications of Quantum Computing