Chapter 6

Detection of Phishing URLs Using Machine Learning and Deep Learning Models Implementing a URL Feature Extractor

Abishek Mahesh

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Chennai, Tamil Nadu, India

Prithvi Seshadri

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Chennai, Tamil Nadu, India

Shruti Mishra

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Chennai, Tamil Nadu, India

Sandeep Kumar Satapathy

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Chennai, Tamil Nadu, India

First published: 29 May 2023

Summary

Phishing is a deceptive practice in which an attacker tries to steal sensitive information from an unsuspecting user. These attacks are typically carried out through emails, text messages, and similar channels. Phishing URLs are a significant threat for cybersecurity professionals and practitioners. Considerable research has addressed the problem, and several machine learning (ML) practitioners have developed models that can detect phishing URLs. However, using machine learning and deep learning also has its challenges and obstacles. The proposed approach detects phishing URLs by analyzing URL properties, URL metrics, and certain external URL services. A URL Feature Extractor was created in Python to extract features from any URL. A dataset of 88,647 phishing and legitimate URLs is used in this study. Several machine learning algorithms, namely Support Vector Machines (SVM), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naïve Bayes, Random Forests (RF), AdaBoost, Gradient Boosting, and Artificial Neural Networks, were used to predict phishing URLs. The results indicate a reasonable accuracy rate. The Gradient Boosting model performed best among the models compared, with accuracy, precision, recall, and F1-score of 97% each.
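To make the idea of extracting features from a URL concrete, the following is a minimal sketch of a lexical feature extractor built on Python's standard library. It is not the chapter's URL Feature Extractor: the function name `extract_url_features` and every feature name are hypothetical, chosen for illustration, and the sketch covers only properties that can be read off the URL string, not the external services the summary mentions.

```python
import re
from urllib.parse import urlparse, parse_qsl

# Assumption: this feature set is illustrative only; the summary does not
# list the features used by the chapter's extractor.
IPV4_PATTERN = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")
SPECIAL_CHARS = {
    "dot": ".",
    "hyphen": "-",
    "at": "@",
    "slash": "/",
    "question": "?",
    "equals": "=",
    "ampersand": "&",
}


def extract_url_features(url: str) -> dict:
    """Return simple lexical features computed from the URL string alone."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    features = {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "uses_https": int(parsed.scheme == "https"),
        # Raw IPv4 hosts are a commonly cited phishing signal.
        "host_is_ipv4": int(bool(IPV4_PATTERN.fullmatch(host))),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_query_params": len(parse_qsl(parsed.query)),
    }
    # Count occurrences of each special character in the full URL.
    for name, char in SPECIAL_CHARS.items():
        features[f"count_{name}"] = url.count(char)
    return features


if __name__ == "__main__":
    for key, value in extract_url_features(
        "http://192.168.0.1/login?user=a&next=b"
    ).items():
        print(f"{key}: {value}")
```

Each URL becomes a fixed-length dictionary of numbers, which can then be arranged into a feature matrix and passed to any of the classifiers listed above.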
