Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash

Read original: arXiv:2111.06628 - Published 7/17/2024 by Lukas Struppek, Dominik Hintersdorf, Daniel Neider, Kristian Kersting


Overview

Apple revealed a system called NeuralHash to detect child sexual abuse material (CSAM) on user devices before files are uploaded to iCloud.
Public criticism arose about the system's impact on user privacy and about its reliability.
This paper presents a comprehensive empirical analysis showing that current deep perceptual hashing, as used in NeuralHash, may not be robust: an adversary can manipulate hash values, and the hash values can leak information about user data.

Plain English Explanation

In August 2021, Apple announced a system called NeuralHash that scans images on user devices for child sexual abuse material (CSAM) before those files are uploaded to the company's iCloud storage service. The goal is to detect this abusive content. However, the system has faced significant public backlash over concerns about user privacy and the reliability of the detection method.

This research paper takes a close look at the security and privacy issues with deep perceptual hashing techniques like NeuralHash. The key finding is that these hashing algorithms are vulnerable to adversarial attacks, where small changes to images can manipulate the hash values in ways that either hide abusive content or frame innocent users. Additionally, the hash values themselves can reveal information about the data on a user's device, potentially compromising privacy.

Overall, the paper argues that current deep perceptual hashing is not ready for robust client-side scanning and should not be used due to these significant privacy and security risks. The researchers suggest that more work is needed to develop reliable and private hashing approaches before deploying them for sensitive applications like detecting CSAM.

Technical Explanation

The paper presents a comprehensive empirical analysis of deep perceptual hashing based on Apple's NeuralHash system. The researchers show that current deep perceptual hashing approaches may not be as robust as claimed.

Through various experiments, they demonstrate that an adversary can manipulate the hash value of an image by applying slight changes to it, either through gradient-based optimization or simply through standard image transformations. This lets the adversary force or prevent hash collisions, so that a malicious actor could either hide abusive material from the detection system or frame innocent users.
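To make "hash collision" concrete, here is a minimal sketch of how a perceptual hash can be computed and compared. It is a toy under stated assumptions, not NeuralHash itself and not either attack from the paper: it assumes the image has already been reduced to a feature vector, and it turns that vector into bits by taking the sign of random projections. The names make_hyperplanes, toy_hash, hamming_distance and perturb, and the sizes used, are hypothetical.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Return n_bits random projection vectors (assumed hashing scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def toy_hash(features, hyperplanes):
    """One bit per hyperplane: 1 if the projection is non-negative, else 0."""
    bits = []
    for plane in hyperplanes:
        projection = sum(w * x for w, x in zip(plane, features))
        bits.append(1 if projection >= 0 else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def perturb(features, scale, rng):
    """Add Gaussian noise of the given scale to every feature."""
    return [x + rng.gauss(0.0, scale) for x in features]


rng = random.Random(1)
dim, n_bits = 16, 32  # illustrative sizes, not taken from the paper
planes = make_hyperplanes(n_bits, dim)

# Stand-in for the feature vector of an image.
original = [rng.gauss(0.0, 1.0) for _ in range(dim)]
h0 = toy_hash(original, planes)

# Distance 0 means the two inputs collide on this toy hash.
for scale in (0.01, 0.5, 2.0):
    changed = perturb(original, scale, rng)
    distance = hamming_distance(h0, toy_hash(changed, planes))
    print(f"noise scale {scale}: {distance} of {n_bits} bits differ")
```

Because each bit comes from a hard threshold, any bit whose projection lies close to zero can flip under a small change to the input. That is the property the manipulations described above rely on, although the paper's actual attacks operate on images and on the real model rather than on a toy feature vector.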

The paper also finds that the hash values themselves can leak information about the data stored on a user's device, posing privacy risks even if no actual CSAM is present. This is because a hash value retains enough information about its image that inferences can still be made about the original content.
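The leakage argument can be illustrated with a small sketch, again a toy under stated assumptions rather than the paper's experiment. It assumes that similar inputs receive similar hashes, and shows that an observer who holds a few labelled hashes can then guess the category of an unseen input from its hash alone. The two categories, the feature vectors, and the names toy_hash, sample and infer_category are all hypothetical.

```python
import random


def toy_hash(features, hyperplanes):
    """Sign-of-projection hash (assumed scheme, not NeuralHash)."""
    return [
        1 if sum(w * x for w, x in zip(plane, features)) >= 0 else 0
        for plane in hyperplanes
    ]


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def sample(center, noise, rng):
    """Draw a feature vector near the given category center."""
    return [c + rng.gauss(0.0, noise) for c in center]


def infer_category(query_hash, labelled_hashes):
    """Guess a label by nearest neighbour in Hamming distance."""
    label, _ = min(
        ((lab, hamming_distance(query_hash, h)) for lab, h in labelled_hashes),
        key=lambda pair: pair[1],
    )
    return label


rng = random.Random(0)
dim, n_bits = 16, 64  # illustrative sizes
planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

# Two made-up content categories with different typical feature vectors.
centers = {
    "category_a": [2.0] * 8 + [0.0] * 8,
    "category_b": [0.0] * 8 + [2.0] * 8,
}

# Hashes whose category the observer already knows.
labelled = [
    (label, toy_hash(sample(center, 0.3, rng), planes))
    for label, center in centers.items()
    for _ in range(20)
]

# Hashes of unseen inputs; the observer sees only the hash.
trials = [
    (label, toy_hash(sample(center, 0.3, rng), planes))
    for label, center in centers.items()
    for _ in range(50)
]

correct = sum(infer_category(h, labelled) == label for label, h in trials)
accuracy = correct / len(trials)
print(f"category inferred from the hash alone: {accuracy:.0%} correct")
```

The same similarity-preserving property that makes a perceptual hash useful for matching is what allows this kind of inference, which is why the hash values cannot be treated as revealing nothing about the underlying images.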

Overall, the researchers conclude that deep perceptual hashing in its current form is generally not suitable for robust client-side scanning applications like detecting CSAM. They suggest that further research and development is needed to address the security and privacy limitations identified in the paper.

Critical Analysis

The paper provides a thorough and well-designed empirical analysis of the security and privacy issues with deep perceptual hashing algorithms like NeuralHash. The researchers have carefully constructed adversarial attacks to demonstrate the vulnerabilities of these hashing techniques, which is a significant contribution to the field.

However, the paper does not delve into potential mitigations or countermeasures that could be employed to address the identified issues. While the researchers acknowledge the need for further research and development, they could have provided more insight into possible solutions or directions for improving the robustness and privacy-preserving properties of deep perceptual hashing.

Additionally, the paper focuses solely on the technical aspects of the hashing algorithms and does not consider the broader societal implications of deploying such systems, such as the potential for abuse, the impact on marginalized communities, or the trade-offs between privacy and public safety. A more holistic discussion of these issues could have provided a richer and more nuanced perspective on the topic.

Nevertheless, the paper's findings are important and timely, given the ongoing debate around Apple's NeuralHash system and the broader implications of client-side scanning technologies. The research highlights the need for caution and careful consideration when implementing sensitive applications that involve the analysis of user data, even if the intent is to address important societal issues.

Conclusion

This paper presents a comprehensive analysis of the security and privacy issues with deep perceptual hashing algorithms, such as the one used in Apple's NeuralHash system for detecting child sexual abuse material (CSAM) on user devices. The key finding is that these hashing techniques are vulnerable to adversarial attacks, where small changes to images can manipulate the hash values to either hide abusive content or frame innocent users. Additionally, the hash values themselves can reveal sensitive information about the data stored on a user's device, posing significant privacy risks.

Based on these findings, the researchers conclude that current deep perceptual hashing is not ready for robust client-side scanning applications and should not be used due to these security and privacy concerns. The paper emphasizes the need for further research and development to address these limitations and create more reliable and privacy-preserving hashing approaches before deploying them for sensitive use cases.

The implications of this research extend beyond the specific context of CSAM detection, as it highlights the broader challenges in balancing user privacy with the deployment of advanced data analysis techniques, especially in the context of client-side scanning. As technology continues to evolve, it will be essential for researchers, policymakers, and the public to engage in thoughtful discussions and collaborations to address these complex issues and ensure the protection of individual rights and liberties.

