Different Victims, Same Layout: Email Visual Similarity Detection for Enhanced Email Protection

Read original: arXiv:2408.16945 - Published 9/5/2024 by Sachin Shukla, Omid Mirzaei

Overview

Proposes a novel approach to detect visually similar emails for enhanced email protection
Focuses on detecting phishing emails with similar layouts or visual designs to known phishing templates
Leverages image embedding techniques and clustering to identify visually similar emails

Plain English Explanation

Phishing emails are a major security threat, as they can trick people into revealing sensitive information or installing malware. This paper introduces a new method to help detect phishing emails by looking at the visual design of the email, rather than just the text content.

The key idea is that many phishing emails use similar visual layouts or templates, even if the specific content changes. By analyzing the visual similarity of emails, the researchers can identify emails that are likely part of the same phishing campaign, even if the text is different.

This is done by converting the email images into numerical "embeddings" that capture the visual features. These embeddings are then grouped using clustering algorithms to find emails with similar visual layouts. This allows the system to quickly identify new phishing emails that match known templates, providing enhanced protection.
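To make the idea of comparing embeddings concrete, here is a minimal sketch. It assumes cosine similarity as the comparison measure and uses made-up four-dimensional vectors; this summary does not state the paper's actual embedding model, vector size, or distance measure, so every name and value below is hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings with illustrative values only.
invoice_lure_a = [0.9, 0.1, 0.0, 0.4]
invoice_lure_b = [0.8, 0.2, 0.1, 0.5]  # same template, different wording
newsletter = [0.0, 0.9, 0.7, 0.1]      # unrelated layout

same_kit = cosine_similarity(invoice_lure_a, invoice_lure_b)
different = cosine_similarity(invoice_lure_a, newsletter)
print(same_kit > different)
```

Two emails built from the same template should score close to 1, while unrelated layouts score much lower, which is what makes grouping by visual similarity possible.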

Technical Explanation

The proposed approach first extracts images from emails and converts them into visual embeddings using a deep learning model. It then uses DBSCAN clustering to group together emails with similar visual layouts.

The key steps are:

Image Extraction: The system extracts images from the email HTML and converts them to a standardized format.
Visual Embedding: A pre-trained image classification model is used to encode the email images into high-dimensional numerical vectors that capture the visual features.
Clustering: The visual embeddings are clustered using the DBSCAN algorithm, which groups together emails with similar visual layouts without requiring the number of clusters to be specified in advance.
Similarity Detection: When a new email arrives, its visual embedding is compared to the existing clusters. If it is found to be visually similar to a known phishing cluster, it can be flagged as potentially malicious.
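The clustering and similarity-detection steps above can be sketched as follows. This is an illustrative, standard-library implementation of DBSCAN on toy 2-D points standing in for embeddings, not the authors' code. The function names, the `eps` and `min_samples` values, the Euclidean distance, and the rule for matching a new email to a cluster are all assumptions made for the example.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)     # j is also a core point: keep expanding
    return labels

def match_cluster(embedding, points, labels, eps):
    """Cluster id of the nearest clustered point within eps, else None."""
    best = None
    for p, label in zip(points, labels):
        if label == NOISE:
            continue
        d = math.dist(embedding, p)
        if d <= eps and (best is None or d < best[0]):
            best = (d, label)
    return None if best is None else best[1]

# Toy "embeddings": two tight groups (two reused kits) and one outlier.
known = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
         (9.0, 0.0)]
labels = dbscan(known, eps=0.3, min_samples=3)

# Hypothetical analyst decision: cluster 0 is a known phishing kit.
phishing_clusters = {0}
new_email = (0.05, 0.05)
hit = match_cluster(new_email, known, labels, eps=0.3)
print(labels, hit, hit in phishing_clusters)
```

Because DBSCAN marks isolated points as noise rather than forcing them into a cluster, a one-off email with a unique layout does not dilute the clusters that represent reused kits.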

The paper demonstrates the effectiveness of this approach on a dataset of real-world phishing and legitimate emails. The visual similarity detection was able to identify phishing emails with high accuracy, outperforming text-based approaches in many cases.

Critical Analysis

The proposed method provides a promising new direction for enhancing email security, but there are a few potential limitations:

The approach relies on the ability to extract high-quality images from emails, which may not always be possible, especially for emails with complex or obfuscated layouts.
The visual embedding model used was pre-trained on general image data, and may not capture all the nuances of email design. A model trained specifically on email visual features could potentially perform better.
The DBSCAN clustering algorithm requires two hyperparameters: a neighborhood radius (often called eps) and the minimum number of points needed to form a dense region. Both affect the quality of the clusters and need careful tuning.
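On the tuning point, one common heuristic for choosing DBSCAN's radius is to sort every point's distance to its k-th nearest neighbor and look for a sharp jump in the resulting curve. The sketch below illustrates that heuristic on toy 2-D points. It is an assumption for illustration only: this summary does not say how the authors chose their hyperparameters, and `k_distances` is a hypothetical helper name.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Toy "embeddings": two tight groups and one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (9.0, 0.0)]

curve = k_distances(points, k=2)
for value in curve:
    print(round(value, 3))
```

In this toy data the first six values are small and the last one is far larger, so any radius between those two levels separates the dense groups from the outlier. Real email embeddings rarely show such a clean gap, which is why the choice needs care.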

Overall, the research presents an innovative and effective approach to phishing detection, and the ideas could be further developed and refined to improve email security for a wide range of users.

Conclusion

This paper introduces a novel method for detecting visually similar emails, which can be a powerful tool for enhancing email protection against phishing attacks. By leveraging image embedding and clustering techniques, the system can identify emails that are likely part of the same phishing campaign, even if the textual content is different.

The proposed approach demonstrates promising results and could be further developed to provide more robust and comprehensive email security solutions, helping to protect users from the growing threat of phishing.

Related Papers

Different Victims, Same Layout: Email Visual Similarity Detection for Enhanced Email Protection

Sachin Shukla, Omid Mirzaei

In the pursuit of an effective spam detection system, the focus has often been on identifying known spam patterns either through rule-based detection systems or machine learning (ML) solutions that rely on keywords. However, both systems are susceptible to evasion techniques and zero-day attacks that can be achieved at low cost. Therefore, an email that bypassed the defense system once can do it again in the following days, even though rules are updated or the ML models are retrained. The recurrence of failures to detect emails that exhibit layout similarities to previously undetected spam is concerning for customers and can erode their trust in a company. Our observations show that threat actors reuse email kits extensively and can bypass detection with little effort, for example, by making changes to the content of emails. In this work, we propose an email visual similarity detection approach, named Pisco, to improve the detection capabilities of an email threat defense system. We apply our proof of concept to some real-world samples received from different sources. Our results show that email kits are being reused extensively and visually similar emails are sent to our customers at various time intervals. Therefore, this method could be very helpful in situations where detection engines that rely on textual features and keywords are bypassed, an occurrence our observations show happens frequently.

9/5/2024

Eyes on the Phish(er): Towards Understanding Users' Email Processing Pattern and Mental Models in Phishing Detection

Sijie Zhuo, Robert Biddle, Jared Daniel Recomendable, Giovanni Russello, Danielle Lottridge

Phishing emails typically masquerade themselves as reputable identities to trick people into providing sensitive information and credentials. Despite advancements in cybersecurity, attackers continuously adapt, posing ongoing threats to individuals and organisations. While email users are the last line of defence, they are not always well-prepared to detect phishing emails. This study examines how workload affects susceptibility to phishing, using eye-tracking technology to observe participants' reading patterns and interactions with tailored phishing emails. Incorporating both quantitative and qualitative analysis, we investigate users' attention to two phishing indicators, email sender and hyperlink URLs, and their reasons for assessing the trustworthiness of emails and falling for phishing emails. Our results provide concrete evidence that attention to the email sender can reduce phishing susceptibility. While we found no evidence that attention to the actual URL in the browser influences phishing detection, attention to the text masking links can increase phishing susceptibility. We also highlight how email relevance, familiarity, and visual presentation impact first impressions of email trustworthiness and phishing susceptibility.

9/14/2024

Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance

Het Patel, Umair Rehman, Farkhund Iqbal

Phishing, a prevalent cybercrime tactic for decades, remains a significant threat in today's digital world. By leveraging clever social engineering elements and modern technology, cybercrime targets many individuals, businesses, and organizations to exploit trust and security. These cyber-attackers are often disguised in many trustworthy forms to appear as legitimate sources. By cleverly using psychological elements like urgency, fear, social proof, and other manipulative strategies, phishers can lure individuals into revealing sensitive and personalized information. Building on this pervasive issue within modern technology, this paper aims to analyze the effectiveness of 15 Large Language Models (LLMs) in detecting phishing attempts, specifically focusing on a randomized set of 419 Scam emails. The objective is to determine which LLMs can accurately detect phishing emails by analyzing a text file containing email metadata based on predefined criteria. The experiment concluded that the following models, ChatGPT 3.5, GPT-3.5-Turbo-Instruct, and ChatGPT, were the most effective in detecting phishing emails.

6/10/2024

Analysis and prevention of AI-based phishing email attacks

Chibuike Samuel Eze, Lior Shamir

Phishing email attacks are among the most common and most harmful cybersecurity attacks. With the emergence of generative AI, phishing attacks can be based on emails generated automatically, making it more difficult to detect them. That is, instead of a single email format sent to a large number of recipients, generative AI can be used to send each potential victim a different email, making it more difficult for cybersecurity systems to identify the scam email before it reaches the recipient. Here we describe a corpus of AI-generated phishing emails. We also use different machine learning tools to test the ability of automatic text analysis to identify AI-generated phishing emails. The results are encouraging, and show that machine learning tools can identify an AI-generated phishing email with high accuracy compared to regular emails or human-generated scam email. By applying descriptive analytic, the specific differences between AI-generated emails and manually crafted scam emails are profiled, and show that AI-generated emails are different in their style from human-generated phishing email scams. Therefore, automatic identification tools can be used as a warning for the user. The paper also describes the corpus of AI-generated phishing emails that is made open to the public, and can be used for consequent studies. While the ability of machine learning to detect AI-generated phishing email is encouraging, AI-generated phishing emails are different from regular phishing emails, and therefore it is important to train machine learning systems also with AI-generated emails in order to repel future phishing attacks that are powered by generative AI.

5/10/2024