Trustless Audits without Revealing Data or Models

2404.04500

Published 4/9/2024 by Suppakit Waiwitlikhit, Ion Stoica, Yi Sun, Tatsunori Hashimoto, Daniel Kang


Abstract

There is an increasing conflict between business incentives to hide models and data as trade secrets, and the societal need for algorithmic transparency. For example, a rightsholder wishing to know whether their copyrighted works have been used during training must convince the model provider to allow a third party to audit the model and data. Finding a mutually agreeable third party is difficult, and the associated costs often make this approach impractical. In this work, we show that it is possible to simultaneously allow model providers to keep their model weights (but not architecture) and data secret while allowing other parties to trustlessly audit model and data properties. We do this by designing a protocol called ZkAudit in which model providers publish cryptographic commitments of datasets and model weights, alongside a zero-knowledge proof (ZKP) certifying that published commitments are derived from training the model. Model providers can then respond to audit requests by privately computing any function F of the dataset (or model) and releasing the output of F alongside another ZKP certifying the correct execution of F. To enable ZkAudit, we develop new methods of computing ZKPs for SGD on modern neural nets for simple recommender systems and image classification models capable of high accuracies on ImageNet. Empirically, we show it is possible to provide trustless audits of DNNs, including copyright, censorship, and counterfactual audits with little to no loss in accuracy.
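The abstract relies on model providers publishing cryptographic commitments to their datasets and weights. As a minimal sketch of what a commitment does (hide the data while binding the provider to it), the code below uses a salted SHA-256 hash. This is an assumption for illustration: a ZK-SNARK system such as ZkAudit would typically use a circuit-friendly hash and check the commitment inside the proof instead of opening it publicly, and the function names here are hypothetical.

```python
import hashlib
import hmac
import secrets
import struct


def serialize_weights(weights: list[float]) -> bytes:
    """Pack weights as little-endian float64 values (a hypothetical canonical encoding)."""
    return b"".join(struct.pack("<d", w) for w in weights)


def commit(data: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, salt). The random salt hides the data; SHA-256 binds to it."""
    salt = secrets.token_bytes(32)  # fixed-length salt, so salt + data is unambiguous
    return hashlib.sha256(salt + data).digest(), salt


def check_opening(commitment: bytes, data: bytes, salt: bytes) -> bool:
    """Recompute the salted hash and compare it to the commitment in constant time."""
    return hmac.compare_digest(hashlib.sha256(salt + data).digest(), commitment)
```

Usage: the provider publishes only `c` from `c, salt = commit(serialize_weights(weights))`. Because any change to the weights changes the hash, the provider cannot later swap in a different model and claim it matches `c`.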


Overview

This paper presents ZkAudit, a protocol that enables trustless audits of machine learning models without revealing the training data or the model weights (the model architecture is still disclosed).
ZkAudit relies on zero-knowledge proofs (specifically ZK-SNARKs): auditors can check that audit results were computed correctly from the provider's committed model and data without gaining access to either.
The proposed system aims to address the growing need for transparency and accountability in AI systems while preserving the privacy of the data and model owners.

Plain English Explanation

The paper introduces a new technique called ZkAudit that allows machine learning models to be audited without revealing the training data or the model weights. This is an important problem because as AI systems become more widely deployed, there is an increasing need for transparency and accountability to ensure they behave ethically and without discrimination. However, the companies and organizations developing these AI models often want to protect their intellectual property and the privacy of their customers' data.

ZkAudit addresses this by using a cryptographic technique called zero-knowledge proofs (ZK-SNARKs). The model owner first publishes cryptographic "fingerprints" (commitments) of its training data and model weights, together with a proof that the model really was trained on that data. When an auditor asks a question, such as whether a particular copyrighted work was in the training set, the owner computes the answer privately and attaches a proof that the answer was computed correctly from the committed data and model. The auditor can check this proof without ever seeing the data or the weights.

The key innovation of ZkAudit is that it enables these trustless audits, where the auditor can be confident in the results without having to fully trust the model owner. This helps build confidence in the safety and fairness of AI systems while respecting the privacy concerns of the companies developing them.

Technical Explanation

The paper first provides background on zero-knowledge proofs, in particular ZK-SNARKs (zero-knowledge succinct non-interactive arguments of knowledge), and on how they can be used to enable private computation. A zero-knowledge proof allows one party (the prover) to convince another party (the verifier) that a given statement is true without revealing any information beyond the validity of the statement.
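To make the prover/verifier idea concrete, here is a minimal sketch of a much simpler zero-knowledge proof than a ZK-SNARK: Schnorr's proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic. It is not what ZkAudit uses. The group parameters are tiny toy values chosen for readability and are insecure; a real deployment would use a standardized large group or elliptic curve and would also validate that received values lie in the subgroup.

```python
import hashlib
import secrets

# Toy parameters (INSECURE, illustration only): P = 2*Q + 1 is a safe prime,
# and G = 4 (a quadratic residue) generates the subgroup of prime order Q.
P = 2039
Q = 1019
G = 4


def challenge(y: int, t: int) -> int:
    """Fiat-Shamir challenge: hash the public values into an integer mod Q."""
    data = f"{G}|{y}|{t}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen() -> tuple[int, int]:
    x = secrets.randbelow(Q - 1) + 1  # secret witness in [1, Q-1]
    y = pow(G, x, P)                  # public statement: "I know x with G^x = y mod P"
    return x, y


def prove(x: int, y: int) -> tuple[int, int]:
    r = secrets.randbelow(Q)  # fresh random nonce; reusing it would leak x
    t = pow(G, r, P)          # prover's commitment
    c = challenge(y, t)
    s = (r + c * x) % Q       # response; x is masked by the uniform nonce r
    return t, s


def verify(y: int, proof: tuple[int, int]) -> bool:
    t, s = proof
    c = challenge(y, t)
    # G^s = G^r * (G^x)^c holds exactly when s was computed from the true x.
    return pow(G, s, P) == (t * pow(y, c, P)) % P
```

The verifier learns that the prover knows `x`, but the transcript `(t, s)` reveals nothing else about it. ZK-SNARKs generalize this idea from one fixed relation to arbitrary arithmetic circuits, such as training steps or audit functions.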

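The authors then describe the ZkAudit protocol in detail. The model provider publishes cryptographic commitments to its dataset and model weights, along with a ZK-SNARK proof that the committed weights were obtained by training on the committed data. To answer an audit request, the provider privately computes the requested function F of the data or model and releases the output of F together with a second proof that F was executed correctly. The auditor verifies both proofs and learns only the output of F (and the architecture, which is public), not the weights or the training data.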
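The abstract describes audits as an arbitrary function F that the provider computes privately over the committed dataset or model. As a purely illustrative example (the function names and the exact-match rule are assumptions, not the paper's copyright audit), a copyright-style F might count how many of a rightsholder's works appear verbatim in the training set. In ZkAudit this computation would run inside a ZK-SNARK circuit tied to the dataset commitment; the plain Python below shows only what F computes, not the proof.

```python
import hashlib


def fingerprint(item: bytes) -> str:
    """Content fingerprint used for exact-match comparison (hypothetical choice)."""
    return hashlib.sha256(item).hexdigest()


def copyright_audit(training_set: list[bytes], works: list[bytes]) -> int:
    """Hypothetical audit function F: number of queried works found verbatim in the data."""
    train_fps = {fingerprint(record) for record in training_set}
    return sum(1 for work in works if fingerprint(work) in train_fps)
```

Only the returned count would be released to the rightsholder; the training records themselves stay private.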

The key technical components of ZkAudit include:

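Model Encoding: The model owner expresses the relevant computations (SGD training steps and audit functions) as arithmetic circuits that can be proven with ZK-SNARKs. The paper develops new methods for doing this for SGD on modern neural networks.
Proof Generation: The model owner publishes cryptographic commitments to the dataset and model weights, together with a ZK-SNARK proof that the committed weights were produced by training on the committed data. For each audit, it generates a further proof that the audit function was executed correctly.
Proof Verification: The auditor verifies these proofs to confirm that the audit output was computed correctly from the committed model and data, learning nothing about the weights or training data beyond that output.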
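ZK-SNARK circuits operate over integers modulo a large prime, while neural networks use real-valued weights, so zero-knowledge ML systems commonly represent values in fixed point. The sketch below evaluates one linear layer using only field additions and multiplications. The modulus 2^61 - 1, the 16-bit scale, and the function names are assumptions for illustration, not ZkAudit's actual encoding.

```python
# Stand-in prime modulus (assumption): real SNARK systems use their own scalar field.
P = 2**61 - 1
SCALE = 1 << 16  # 16 fractional bits of fixed-point precision


def encode(x: float) -> int:
    """Map a real number to a field element; negatives wrap around to P - |v|."""
    return round(x * SCALE) % P


def to_signed(v: int) -> int:
    """Interpret a field element as a signed integer (upper half = negative)."""
    return v - P if v > P // 2 else v


def linear_in_field(weights: list[float], inputs: list[float], bias: float) -> float:
    """Evaluate w . x + b using only modular additions and multiplications."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc = (acc + encode(w) * encode(x)) % P  # each product carries scale SCALE**2
    acc = (acc + encode(bias) * SCALE) % P      # lift the bias to the same scale
    # Rescaling happens outside the field here. Inside a circuit, dividing by
    # SCALE is not field division and needs extra (e.g. range-check) constraints.
    return to_signed(acc) / (SCALE * SCALE)
```

For example, `linear_in_field([0.5, -1.25], [2.0, 0.75], 0.1)` is within about 1e-5 of the exact value 0.1625; the small error comes from rounding 0.1 to 16 fractional bits.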

The authors evaluate ZkAudit on simple recommender systems and on image classification models capable of high accuracy on ImageNet, showing that copyright, censorship, and counterfactual audits can be performed trustlessly with little to no loss in accuracy.

Critical Analysis

The ZkAudit approach presents an innovative solution to the important problem of auditing AI systems while preserving privacy. By leveraging the power of zero-knowledge proofs, the system enables a high degree of transparency and accountability without compromising the confidentiality of sensitive data or model details.

One potential limitation of the approach is the computational overhead required to generate and verify the ZK-SNARK proofs. The authors acknowledge that this overhead may be a barrier to adoption, particularly for larger and more complex models. However, they argue that as ZK-SNARK technology continues to improve, the practicality of this approach will increase.

Another area for further research is the potential for adversarial attacks against the ZkAudit system. While the authors discuss the security properties of their approach, it would be valuable to explore potential vulnerabilities and ways to mitigate them.

Overall, the ZkAudit system represents an important step forward in the ongoing effort to balance the need for transparency in AI systems with the need to protect sensitive information. As the field of machine learning continues to evolve, approaches like ZkAudit will be crucial for building public trust and ensuring the responsible development of AI technology.

Conclusion

The paper presents ZkAudit, a protocol that enables trustless audits of machine learning models without revealing the underlying training data or model weights. By leveraging zero-knowledge proofs, ZkAudit lets auditors verify that audit results were computed correctly from the committed model and data without gaining access to either, addressing a critical need for transparency and accountability in the AI industry.

The technical details and experimental evaluation demonstrate the feasibility of this approach, paving the way for more trustworthy and responsible development of AI systems. While there are still some challenges to overcome, such as the computational overhead of the ZK-SNARK proofs, the ZkAudit system represents an important step forward in the ongoing effort to balance privacy and transparency in the age of AI.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Verifiable evaluations of machine learning models using zkSNARKs

Tobin South, Alexander Camuto, Shrey Jain, Shayla Nguyen, Robert Mahari, Christian Paquin, Jason Morton, Alex 'Sandy' Pentland

In a world of increasing closed-source commercial machine learning models, model evaluations from developers must be taken at face value. These benchmark results (whether over task accuracy, bias evaluations, or safety checks) are traditionally impossible to verify by a model end-user without the costly or impossible process of re-performing the benchmark on black-box model outputs. This work presents a method of verifiable model evaluation using model inference through zkSNARKs. The resulting zero-knowledge computational proofs of model outputs over datasets can be packaged into verifiable evaluation attestations showing that models with fixed private weights achieve stated performance or fairness metrics over public inputs. We present a flexible proving system that enables verifiable attestations to be performed on any standard neural network model with varying compute requirements. For the first time, we demonstrate this across a sample of real-world models and highlight key challenges and design solutions. This presents a new transparency paradigm in the verifiable evaluation of private models.

5/24/2024

cs.LG cs.AI cs.CR


Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model

Tudor Cebere, Aurélien Bellet, Nicolas Papernot

Machine learning models can be trained with formal privacy guarantees via differentially private optimizers such as DP-SGD. In this work, we study such privacy guarantees when the adversary only accesses the final model, i.e., intermediate model updates are not released. In the existing literature, this hidden state threat model exhibits a significant gap between the lower bound provided by empirical privacy auditing and the theoretical upper bound provided by privacy accounting. To challenge this gap, we propose to audit this threat model with adversaries that craft a gradient sequence to maximize the privacy loss of the final model without accessing intermediate models. We demonstrate experimentally how this approach consistently outperforms prior attempts at auditing the hidden state model. When the crafted gradient is inserted at every optimization step, our results imply that releasing only the final model does not amplify privacy, providing a novel negative result. On the other hand, when the crafted gradient is not inserted at every step, we show strong evidence that a privacy amplification phenomenon emerges in the general non-convex setting (albeit weaker than in convex regimes), suggesting that existing privacy upper bounds can be improved.
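For readers unfamiliar with DP-SGD, the sketch below shows one standard update in the style of Abadi et al. (2016): clip each per-example gradient to norm at most C, sum, add Gaussian noise with standard deviation (noise multiplier) x C, average, and step. It is a minimal pure-Python illustration with hypothetical function names; it omits privacy accounting and is not the auditing adversary from the paper. In the hidden state threat model, only the parameters after the final such step would be released.

```python
import math
import random


def clip(grad: list[float], max_norm: float) -> list[float]:
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(params: list[float], per_example_grads: list[list[float]],
                lr: float, max_norm: float, noise_multiplier: float,
                rng: random.Random) -> list[float]:
    """One DP-SGD update: clip, sum, add Gaussian noise, average, descend."""
    batch_size = len(per_example_grads)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(len(params))]
    sigma = noise_multiplier * max_norm  # noise scale is tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * (v / batch_size) for p, v in zip(params, noisy)]
```

Clipping bounds any single example's influence on the update, which is what lets the added noise translate into a formal privacy guarantee.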

5/24/2024

cs.LG cs.CR


Auditing and Generating Synthetic Data with Controllable Trust Trade-offs

Brian Belgodere, Pierre Dognin, Adam Ivankay, Igor Melnyk, Youssef Mroueh, Aleksandra Mojsilovic, Jiri Navratil, Apoorva Nitsure, Inkit Padhi, Mattia Rigotti, Jerret Ross, Yair Schiff, Radhika Vedpathak, Richard A. Young

Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models. It focuses on preventing bias and discrimination, ensures fidelity to the source data, assesses utility, robustness, and privacy preservation. We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases like education, healthcare, banking, and human resources, spanning different data modalities such as tabular, time-series, vision, and natural language. This holistic assessment is essential for compliance with regulatory safeguards. We introduce a trustworthiness index to rank synthetic datasets based on their safeguards trade-offs. Furthermore, we present a trustworthiness-driven model selection and cross-validation process during training, exemplified with TrustFormers across various data types. This approach allows for controllable trustworthiness trade-offs in synthetic data creation. Our auditing framework fosters collaboration among stakeholders, including data scientists, governance experts, internal reviewers, external certifiers, and regulators. This transparent reporting should become a standard practice to prevent bias, discrimination, and privacy violations, ensuring compliance with policies and providing accountability, safety, and performance guarantees.

6/11/2024

cs.LG cs.AI stat.ML


Nearly Tight Black-Box Auditing of Differentially Private Machine Learning

Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro

This paper presents a nearly tight audit of the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm in the black-box model. Our auditing procedure empirically estimates the privacy leakage from DP-SGD using membership inference attacks; unlike prior work, the estimates are appreciably close to the theoretical DP bounds. The main intuition is to craft worst-case initial model parameters, as DP-SGD's privacy analysis is agnostic to the choice of the initial model parameters. For models trained with theoretical $\varepsilon=10.0$ on MNIST and CIFAR-10, our auditing procedure yields empirical estimates of $7.21$ and $6.95$, respectively, on 1,000-record samples and $6.48$ and $4.96$ on the full datasets. By contrast, previous work achieved tight audits only in stronger (i.e., less realistic) white-box models that allow the adversary to access the model's inner parameters and insert arbitrary gradients. Our auditing procedure can be used to detect bugs and DP violations more easily and offers valuable insight into how the privacy analysis of DP-SGD can be further improved.
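Empirical privacy estimates from membership inference rest on the hypothesis-testing view of differential privacy: an $(\varepsilon, \delta)$-DP mechanism forces any attack to satisfy TPR <= e^eps * FPR + delta and (1 - FPR) <= e^eps * (1 - TPR) + delta. The sketch below inverts these constraints to get a point estimate of the smallest epsilon consistent with an observed attack. It is an assumption-labeled illustration, not the paper's estimator: published audits typically replace raw TPR/FPR with statistical confidence bounds (for example Clopper-Pearson intervals), which this omits.

```python
import math


def empirical_epsilon(tpr: float, fpr: float, delta: float = 0.0) -> float:
    """Point-estimate lower bound on epsilon implied by an attack's TPR and FPR.

    Inverts TPR <= e^eps * FPR + delta and (1 - FPR) <= e^eps * (1 - TPR) + delta.
    """
    fnr = 1.0 - tpr
    bounds = [0.0]  # epsilon is never negative
    for num, den in ((tpr - delta, fpr), (1.0 - fpr - delta, fnr)):
        if num <= 0:
            continue  # this constraint carries no information
        if den == 0:
            return math.inf  # perfect separation: no finite epsilon is consistent
        bounds.append(math.log(num / den))
    return max(bounds)
```

For example, an attack with TPR 0.5 at FPR 0.05 and delta 0 implies epsilon of at least ln(10), roughly 2.30.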

5/24/2024

cs.CR cs.LG