NoPhish: Efficient Chrome Extension for Phishing Detection Using Machine Learning Techniques

Read original: arXiv:2409.10547 - Published 9/18/2024 by Leand Thaqi, Arbnor Halili, Kamer Vishi, Blerim Rexha

Overview

Presents an efficient Chrome extension for detecting phishing websites using machine learning techniques.
Aims to provide a lightweight and accurate solution to protect users from phishing attacks.
Leverages a combination of URL-based features and machine learning models to classify websites as legitimate or phishing.

Plain English Explanation

The paper describes the development of a Chrome extension that can detect phishing websites. Phishing is a type of online scam where attackers try to trick users into revealing sensitive information, such as login credentials or financial information, by disguising a malicious website as a legitimate one.

The researchers created a tool that analyzes the URL of a website and uses machine learning models to classify it as either a legitimate or a phishing site. This allows the extension to quickly and accurately identify potential threats, helping to protect users from falling victim to these kinds of attacks.

The key idea is to extract features from the URL, such as the domain name, the presence of suspicious characters, and the website's age. According to the paper's abstract, 22 features are used. These features are fed into machine learning algorithms trained on a dataset from PhishTank. The extension then uses the trained model to predict whether a website is likely to be a phishing attempt.
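To make the feature-extraction step concrete, here is a minimal Python sketch that turns a URL into a small numeric feature dictionary. The six features and the function names `host_is_ip` and `extract_features` are illustrative assumptions; they are not the paper's 22 features, and the real extension would run in the browser rather than in Python.

```python
import ipaddress
from urllib.parse import urlparse


def host_is_ip(host: str) -> bool:
    """Return True when the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url: str) -> dict:
    """Map a URL to a few numeric features (hypothetical, illustrative set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                       # long URLs can hide the real host
        "host_is_ip": int(host_is_ip(host)),          # raw IP instead of a domain name
        "has_at_symbol": int("@" in url),             # '@' can disguise the true host
        "hyphens_in_host": host.count("-"),           # e.g. look-alike brand names
        "dots_in_host": host.count("."),              # rough proxy for subdomain depth
        "uses_https": int(parsed.scheme == "https"),
    }


if __name__ == "__main__":
    print(extract_features("http://192.168.0.1/login"))
    print(extract_features("https://www.example.com/"))
```

Each URL becomes a fixed-length vector, which is the form a classifier expects. Features such as website age cannot be read from the URL string alone and would need an external lookup, so they are left out of this sketch.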

By providing this functionality as a lightweight Chrome extension, the researchers aim to make it easy for users to benefit from this phishing detection capability without needing to install a separate, standalone application.

Technical Explanation

The paper first reviews related work in the area of phishing detection, highlighting the need for efficient, client-side solutions that can be easily deployed.

The proposed system extracts a set of URL-based features that are indicative of phishing attempts, such as the domain name, the presence of IP addresses or suspicious characters, and the website's age. According to the paper's abstract, 22 features are extracted from a PhishTank training dataset, and three classifiers are trained on them: Random Forest, Support Vector Machine, and k-Nearest Neighbor.
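As a rough illustration of how a classifier turns feature vectors into a verdict, the sketch below implements k-Nearest Neighbor, one of the algorithms named in the paper's abstract, in plain Python. The training vectors, the three-feature layout, and the name `knn_predict` are invented for this example; they are not the paper's data or code.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    # Order the training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy, hand-made vectors: (url_length, host_is_ip, has_at_symbol).
TRAIN = [
    ((24, 0, 0), "legitimate"),
    ((31, 0, 0), "legitimate"),
    ((28, 0, 0), "legitimate"),
    ((95, 1, 0), "phishing"),
    ((110, 0, 1), "phishing"),
    ((88, 1, 1), "phishing"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, (100, 1, 0)))
    print(knn_predict(TRAIN, (26, 0, 0)))
```

In this toy data the unscaled URL length dominates the distance. A realistic pipeline would normally scale the features first, and would compare several models on held-out data rather than rely on a single one.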

The researchers evaluate the trained classifiers on phishing and legitimate websites. According to the abstract, Random Forest delivers the best precision of the three algorithms. The system is also designed to be lightweight enough to run as a Chrome extension.
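The paper's abstract names precision as the metric on which the best model is chosen. Precision is the share of sites flagged as phishing that really are phishing, TP / (TP + FP). The following sketch computes it from true and predicted labels; the function name and the sample labels are made up for illustration and are not results from the paper.

```python
def precision(y_true, y_pred, positive="phishing"):
    """Return TP / (TP + FP) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: flagged as phishing and actually phishing.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: flagged as phishing but actually legitimate.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # With no positive predictions, precision is undefined; return 0.0 here.
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    truth = ["phishing", "phishing", "legitimate", "legitimate"]
    predicted = ["phishing", "legitimate", "phishing", "legitimate"]
    print(precision(truth, predicted))
```

High precision matters for a browser extension because every false positive is a legitimate site wrongly shown as dangerous to the user.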

Critical Analysis

The paper provides a thorough evaluation of the proposed system, including comparisons to other state-of-the-art approaches. However, the authors acknowledge that their method relies solely on URL-based features and may not be able to detect more sophisticated phishing attacks that use advanced techniques to disguise the true nature of a website.

Additionally, the authors note that their dataset may not be fully representative of the constantly evolving landscape of phishing attacks. Ongoing maintenance and updates to the machine learning models would be necessary to ensure the extension remains effective over time.

Finally, while the Chrome extension format makes the solution accessible to users, it is limited to the Chrome browser. Expanding the approach to other popular web browsers could further increase the reach and impact of the tool.

Conclusion

The presented Chrome extension offers a promising approach to the problem of phishing detection, leveraging machine learning techniques to provide an efficient and accurate solution. By focusing on URL-based features, the researchers have developed a lightweight system that can be easily deployed to help protect users from these types of online scams.

While the method has some limitations, the core ideas and insights from this work could be further developed and refined to create even more robust and comprehensive phishing detection systems in the future.

This summary was produced with help from an AI and may contain inaccuracies. Consult the original paper (arXiv:2409.10547) to verify details.

Related Papers

NoPhish: Efficient Chrome Extension for Phishing Detection Using Machine Learning Techniques

Leand Thaqi, Arbnor Halili, Kamer Vishi, Blerim Rexha

The growth of digitalization services via web browsers has simplified our daily routine of doing business. But at the same time, it has made the web browser very attractive for several cyber-attacks. Web phishing is a well-known cyberattack that is used by attackers camouflaging as trustworthy web servers to obtain sensitive user information such as credit card numbers, bank information, personal ID, social security number, and username and passwords. In recent years many techniques have been developed to identify the authentic web pages that users visit and warn them when the webpage is phishing. In this paper, we have developed an extension for Chrome the most favorite web browser, that will serve as a middleware between the user and phishing websites. The Chrome extension named NoPhish shall identify a phishing webpage based on several Machine Learning techniques. We have used the training dataset from PhishTank and extracted the 22 most popular features as rated by the Alexa database. The training algorithms used are Random Forest, Support Vector Machine, and k-Nearest Neighbor. The performance results show that Random Forest delivers the best precision.

9/18/2024

Utilizing Large Language Models to Optimize the Detection and Explainability of Phishing Websites

Sayak Saha Roy, Shirin Nilizadeh

In this paper, we introduce PhishLang, an open-source, lightweight language model specifically designed for phishing website detection through contextual analysis of the website. Unlike traditional heuristic or machine learning models that rely on static features and struggle to adapt to new threats, and deep learning models that are computationally intensive, our model leverages MobileBERT, a fast and memory-efficient variant of the BERT architecture, to learn granular features characteristic of phishing attacks. PhishLang operates with minimal data preprocessing and offers performance comparable to leading deep learning anti-phishing tools, while being significantly faster and less resource-intensive. Over a 3.5-month testing period, PhishLang successfully identified 25,796 phishing URLs, many of which were undetected by popular antiphishing blocklists, thus demonstrating its potential to enhance current detection measures. Capitalizing on PhishLang's resource efficiency, we release the first open-source fully client-side Chromium browser extension that provides inference locally without requiring to consult an online blocklist and can be run on low-end systems with no impact on inference times. Our implementation not only outperforms prevalent (server-side) phishing tools, but is significantly more effective than the limited commercial client-side measures available. Furthermore, we study how PhishLang can be integrated with GPT-3.5 Turbo to create explainable blocklisting -- which, upon detection of a website, provides users with detailed contextual information about the features that led to a website being marked as phishing.

9/11/2024

PhishNet: A Phishing Website Detection Tool using XGBoost

Prashant Kumar, Kevin Antony, Deepakmoney Banga, Arshpreet Sohal

PhisNet is a cutting-edge web application designed to detect phishing websites using advanced machine learning. It aims to help individuals and organizations identify and prevent phishing attacks through a robust AI framework. PhisNet utilizes Python to apply various machine learning algorithms and feature extraction techniques for high accuracy and efficiency. The project starts by collecting and preprocessing a comprehensive dataset of URLs, comprising both phishing and legitimate sites. Key features such as URL length, special characters, and domain age are extracted to effectively train the model. Multiple machine learning algorithms, including logistic regression, decision trees, and neural networks, are evaluated to determine the best performance in phishing detection. The model is finely tuned to optimize metrics like accuracy, precision, recall, and the F1 score, ensuring reliable detection of both common and sophisticated phishing tactics. PhisNet's web application is developed using React.js, which allows for client-side rendering and smooth integration with backend services, creating a responsive and user-friendly interface. Users can input URLs and receive immediate predictions with confidence scores, thanks to a robust backend infrastructure that processes data and provides real-time results. The model is deployed using Google Colab and AWS EC2 for their computational power and scalability, ensuring the application remains accessible and functional under varying loads. In summary, PhisNet represents a significant advancement in cybersecurity, showcasing the effective use of machine learning and web development technologies to enhance user security. It empowers users to prevent phishing attacks and highlights AI's potential in transforming cybersecurity.

7/9/2024

PhishGuard: A Convolutional Neural Network Based Model for Detecting Phishing URLs with Explainability Analysis

Md Robiul Islam, Md Mahamodul Islam, Mst. Suraiya Afrin, Anika Antara, Nujhat Tabassum, Al Amin

Cybersecurity is one of the global issues because of the extensive dependence on cyber systems of individuals, industries, and organizations. Among the cyber attacks, phishing is increasing tremendously and affecting the global economy. Therefore, this phenomenon highlights the vital need for enhancing user awareness and robust support at both individual and organizational levels. Phishing URL identification is the best way to address the problem. Various machine learning and deep learning methods have been proposed to automate the detection of phishing URLs. However, these approaches often need more convincing accuracy and rely on datasets consisting of limited samples. Furthermore, these black box intelligent models decision to detect suspicious URLs needs proper explanation to understand the features affecting the output. To address the issues, we propose a 1D Convolutional Neural Network (CNN) and trained the model with extensive features and a substantial amount of data. The proposed model outperforms existing works by attaining an accuracy of 99.85%. Additionally, our explainability analysis highlights certain features that significantly contribute to identifying the phishing URL.

4/30/2024