Private Attribute Inference from Images with Vision-Language Models

2404.10618

Published 4/17/2024 by Batuhan Tomekçe, Mark Vero, Robin Staab, Martin Vechev

Abstract

As large language models (LLMs) become ubiquitous in our daily tasks and digital interactions, associated privacy risks are increasingly in focus. While LLM privacy research has primarily focused on the leakage of model training data, it has recently been shown that the increase in models' capabilities has enabled LLMs to make accurate privacy-infringing inferences from previously unseen texts. With the rise of multimodal vision-language models (VLMs), capable of understanding both images and text, a pertinent question is whether such results transfer to the previously unexplored domain of benign images posted online. To investigate the risks associated with the image reasoning capabilities of newly emerging VLMs, we compile an image dataset with human-annotated labels of the image owner's personal attributes. In order to understand the additional privacy risk posed by VLMs beyond traditional human attribute recognition, our dataset consists of images where the inferable private attributes do not stem from direct depictions of humans. On this dataset, we evaluate the inferential capabilities of 7 state-of-the-art VLMs, finding that they can infer various personal attributes at up to 77.6% accuracy. Concerningly, we observe that accuracy scales with the general capabilities of the models, implying that future models can be misused as stronger adversaries, establishing an imperative for the development of adequate defenses.

Overview

As large language models (LLMs) and multimodal vision-language models (VLMs) become increasingly prevalent, concerns about their potential to infringe on privacy are growing.
While previous research has focused on the leakage of model training data, this paper explores whether VLMs can make accurate inferences about personal attributes from benign images posted online.
The researchers compiled a dataset of images with human-annotated labels of the image owner's personal attributes, restricted to images in which the inferable attributes do not stem from direct depictions of people.
They evaluated the inferential capabilities of 7 state-of-the-art VLMs on this dataset, finding that they can infer various personal attributes with up to 77.6% accuracy.
The study suggests that the privacy risks posed by VLMs are significant and likely to increase as the models become more capable, highlighting the need for adequate defenses.

Plain English Explanation

As large language models (LLMs) and multimodal vision-language models (VLMs) become more common in our daily digital interactions, concerns about their potential to infringe on our privacy are growing. While previous research has mainly focused on the risk of these models leaking the data used to train them, this new study looks at a different kind of privacy risk.

The researchers wanted to see if VLMs, which can understand both images and text, could use benign images (like those we might post online) to accurately infer personal information about the image owner. To test this, they built a dataset of images labeled with the owner's personal attributes, choosing images in which those attributes cannot simply be read off a depiction of a person. For example, an image might show someone's home or belongings rather than the person, and the label might record an attribute such as the owner's income level.

The researchers then evaluated 7 state-of-the-art VLMs to see how well they could use these images to guess the owner's personal attributes. Concerningly, they found that the models could make these inferences with up to 77.6% accuracy. Even more troubling, they discovered that the models' accuracy increased as the models' overall capabilities improved, suggesting that future, more advanced VLMs could pose an even greater threat to our privacy.

This study highlights the need for developing effective ways to protect our privacy as these powerful AI models become more common and capable. It's a wake-up call that the privacy risks of AI go beyond just the data used to train the models - they can also extend to the inferences these models can make from the information we share online, even if it seems harmless.

Technical Explanation

The researchers compiled a dataset of images with human-annotated labels of the image owner's personal attributes, focusing on attributes that do not stem from direct depictions of the individuals. This dataset was designed to investigate the privacy-infringing inferential capabilities of newly emerging multimodal vision-language models (VLMs), which are capable of understanding both images and text.

On this dataset, the researchers evaluated 7 state-of-the-art VLMs and found that the models could infer various personal attributes of the image owner with up to 77.6% accuracy.

Critically, the researchers observed that the models' accuracy scaled with their general capabilities, suggesting that future, more advanced VLMs could become even stronger adversaries in terms of privacy-infringing inferences. This finding establishes an imperative for the development of adequate defenses against such threats, as the privacy risks posed by VLMs are likely to increase as the models continue to improve.
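One simple way to check a scaling claim of this kind is a rank correlation between a general capability score and attribute-inference accuracy across models. The sketch below implements Spearman's rank correlation in plain Python. The choice of statistic is an assumption and not necessarily the analysis used in the paper, and the numbers are made-up placeholders rather than reported results.

```python
def ranks(values):
    """Assign ranks 1..n to values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Placeholder values for four hypothetical models (NOT results from the paper).
capability_scores = [40.0, 55.0, 62.0, 70.0]   # e.g. a general benchmark score
inference_accuracy = [0.30, 0.45, 0.43, 0.60]  # e.g. attribute-inference accuracy

rho = spearman(capability_scores, inference_accuracy)
```

A value of rho close to 1 would indicate that more capable models also tend to be more accurate at inferring attributes. With only a handful of models, such a coefficient is a coarse indicator at best.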

Critical Analysis

The researchers acknowledge several limitations and areas for further research in their paper. For example, they note that their dataset is relatively small and may not fully capture the diversity of online images and their associated personal attributes. Additionally, the study focuses on a limited set of VLMs, and it's possible that other models may exhibit different inferential capabilities.

Moreover, the paper does not delve into the specific mechanisms by which the VLMs are able to make these privacy-infringing inferences. Understanding the underlying processes could provide valuable insights for developing effective countermeasures. The researchers also do not explore potential mitigations or defenses against such privacy risks, leaving an important gap for future work.

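Related work on uncovering bias in large vision-language models with counterfactuals (Howard et al.) has shown that social attributes depicted in input images, such as race and gender, can significantly influence what these models generate. Harmful biases of this kind could exacerbate the privacy concerns raised in this paper, so addressing them may be a crucial step in mitigating the privacy risks posed by VLMs.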

Overall, while this paper provides a valuable contribution in highlighting the privacy risks associated with emerging VLMs, further research is needed to fully understand the scope and nature of the problem, as well as to develop effective defenses to protect individuals' privacy in the face of these advanced AI systems.

Conclusion

This study reveals a concerning new privacy risk posed by the increasing capabilities of multimodal vision-language models (VLMs). The researchers demonstrate that these models can accurately infer various personal attributes of image owners from seemingly benign online images, with accuracy rates up to 77.6%. Crucially, they find that the models' inferential capabilities scale with their overall performance, suggesting that future, more advanced VLMs may pose an even greater threat to individual privacy.

These findings underscore the need for heightened awareness and the development of robust defenses against the privacy-infringing potential of VLMs. As these powerful AI systems become more prevalent in our daily digital lives, it is essential that researchers, policymakers, and the public work together to address the privacy implications and ensure that technological progress does not come at the unacceptable cost of eroding our fundamental rights to privacy and personal autonomy.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Beyond Memorization: Violating Privacy Via Inference with Large Language Models

Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev

Current privacy research on large language models (LLMs) primarily focuses on the issue of extracting memorized training data. At the same time, models' inference capabilities have increased drastically. This raises the key question of whether current LLMs could violate individuals' privacy by inferring personal attributes from text given at inference time. In this work, we present the first comprehensive study on the capabilities of pretrained LLMs to infer personal attributes from text. We construct a dataset consisting of real Reddit profiles, and show that current LLMs can infer a wide range of personal attributes (e.g., location, income, sex), achieving up to 85% top-1 and 95% top-3 accuracy at a fraction of the cost (100×) and time (240×) required by humans. As people increasingly interact with LLM-powered chatbots across all aspects of life, we also explore the emerging threat of privacy-invasive chatbots trying to extract personal information through seemingly benign questions. Finally, we show that common mitigations, i.e., text anonymization and model alignment, are currently ineffective at protecting user privacy against LLM inference. Our findings highlight that current LLMs can infer personal data at a previously unattainable scale. In the absence of working defenses, we advocate for a broader discussion around LLM privacy implications beyond memorization, striving for a wider privacy protection.

5/7/2024

cs.AI cs.LG

Privacy-Aware Visual Language Models

Laurens Samson, Nimrod Barazani, Sennay Ghebreab, Yuki M. Asano

This paper aims to advance our understanding of how Visual Language Models (VLMs) handle privacy-sensitive information, a crucial concern as these technologies become integral to everyday life. To this end, we introduce a new benchmark, PrivBench, which contains images from 8 sensitive categories such as passports or fingerprints. We evaluate 10 state-of-the-art VLMs on this benchmark and observe a generally limited understanding of privacy, highlighting a significant area for model improvement. Based on this, we introduce PrivTune, a new instruction-tuning dataset aimed at equipping VLMs with knowledge about visual privacy. By tuning two pretrained VLMs, TinyLLaVa and MiniGPT-v2, on this small dataset, we achieve strong gains in their ability to recognize sensitive content, outperforming even GPT4-V. At the same time, we show that privacy-tuning only minimally affects the VLMs' performance on standard benchmarks such as VQA. Overall, this paper lays out a crucial challenge for making VLMs effective in handling real-world data safely and provides a simple recipe that takes the first step towards building privacy-aware VLMs.

5/28/2024

cs.CV cs.CL

Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals

Phillip Howard, Kathleen C. Fraser, Anahita Bhiwandiwalla, Svetlana Kiritchenko

With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multimodal chat. While prior studies have examined the social biases contained in text generated by LLMs, this topic has been relatively unexplored in LVLMs. Examining social biases in LVLMs is particularly challenging due to the confounding contributions of bias induced by information contained across the text and visual modalities. To address this challenging problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images. Specifically, we present LVLMs with identical open-ended text prompts while conditioning on images from different counterfactual sets, where each set contains images which are largely identical in their depiction of a common subject (e.g., a doctor), but vary only in terms of intersectional social attributes (e.g., race and gender). We comprehensively evaluate the text produced by different models under this counterfactual generation setting at scale, producing over 57 million responses from popular LVLMs. Our multi-dimensional analysis reveals that social attributes such as race, gender, and physical characteristics depicted in input images can significantly influence the generation of toxic content, competency-associated words, harmful stereotypes, and numerical ratings of depicted individuals. We additionally explore the relationship between social bias in LVLMs and their corresponding LLMs, as well as inference-time strategies to mitigate bias.

5/31/2024

cs.CV

Uncovering Bias in Large Vision-Language Models with Counterfactuals

Phillip Howard, Anahita Bhiwandiwalla, Kathleen C. Fraser, Svetlana Kiritchenko

With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multimodal chat. While prior studies have examined the social biases contained in text generated by LLMs, this topic has been relatively unexplored in LVLMs. Examining social biases in LVLMs is particularly challenging due to the confounding contributions of bias induced by information contained across the text and visual modalities. To address this challenging problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images. Specifically, we present LVLMs with identical open-ended text prompts while conditioning on images from different counterfactual sets, where each set contains images which are largely identical in their depiction of a common subject (e.g., a doctor), but vary only in terms of intersectional social attributes (e.g., race and gender). We comprehensively evaluate the text produced by different LVLMs under this counterfactual generation setting and find that social attributes such as race, gender, and physical characteristics depicted in input images can significantly influence toxicity and the generation of competency-associated words.

6/11/2024

cs.CV cs.AI