Expected Shapley-Like Scores of Boolean Functions: Complexity and Applications to Probabilistic Databases

Read original: arXiv:2401.06493 - Published 4/17/2024 by Pratik Karmakar, Mikael Monet, Pierre Senellart, St'ephane Bressan

📉

Overview

Discusses the complexity of computing the expected Shapley-like scores of Boolean functions, with applications to probabilistic databases.
Explores the relationship between Shapley-like scores and expected values, and the computational challenges involved.
Proposes efficient algorithms for computing these scores, with implications for areas like provenance in probabilistic databases.

Plain English Explanation

The paper explores a mathematical concept called the Shapley value, which is used to measure the importance or contribution of individual elements in a complex system. The authors focus on applying this concept to Boolean functions, which are mathematical models that take in one or more inputs and produce a single output.

The key challenge is that computing the expected Shapley value of a Boolean function can be computationally very difficult, especially as the number of input variables increases. The paper shows that this problem is inherently complex and hard to solve efficiently in many cases.

However, the authors also discover an interesting connection between Shapley values and expected values - in certain situations, the Shapley value of a Boolean function is actually equal to its expected value. This insight allows them to develop more efficient algorithms for computing Shapley values in these cases.

The practical applications of this research are in the area of probabilistic databases, where Shapley values can be used to understand the "provenance" or lineage of data - i.e., how different pieces of information contributed to a final result. By making it easier to compute these Shapley values, the authors' work can help improve our understanding of data provenance in probabilistic databases.

Technical Explanation

The paper explores the complexity of computing the expected Shapley-like scores of Boolean functions, with a focus on applications in the context of probabilistic databases.

The authors show that computing the expected Shapley-like scores of a Boolean function is #P-hard in general, meaning it is an inherently difficult computational problem that is unlikely to have efficient solutions. However, they also identify an interesting connection between Shapley-like scores and expected values - in certain cases, the expected Shapley-like score of a Boolean function is equal to its expected value.

Leveraging this connection, the authors develop efficient algorithms for computing the expected Shapley-like scores of Boolean functions in these special cases. They demonstrate the practical relevance of their work by showing how these scores can be used to understand data provenance in probabilistic databases.

Critical Analysis

The paper provides a thorough theoretical analysis of the computational complexity of computing expected Shapley-like scores of Boolean functions. The authors carefully characterize the settings in which these scores can be computed efficiently, as well as the inherent hardness of the problem in the general case.

One potential limitation of the research is that the practical applications are primarily focused on probabilistic databases, which may limit the broader relevance of the results. It would be interesting to see if the insights and algorithms developed in this paper could be applied to other domains where Shapley-like values are important, such as in machine learning model interpretability.

Additionally, the paper does not explore the sensitivity or robustness of the Shapley-like scores to modelling assumptions or data perturbations. In many real-world applications, data and models are noisy or uncertain, so understanding the stability and reliability of these scores would be an important area for further research.

Conclusion

This paper makes significant theoretical and algorithmic contributions to the problem of computing expected Shapley-like scores of Boolean functions. By establishing the computational complexity of this problem and identifying special cases where efficient solutions exist, the authors lay the groundwork for more practical applications of Shapley-like values, particularly in the context of probabilistic databases.

The insights and techniques developed in this paper could have broader implications for fields that rely on Shapley-like values, such as machine learning model interpretation and game-theoretic analysis. Further research exploring the robustness and generalizability of these methods would be a valuable next step.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

📉

Expected Shapley-Like Scores of Boolean Functions: Complexity and Applications to Probabilistic Databases

Pratik Karmakar, Mikael Monet, Pierre Senellart, St'ephane Bressan

Shapley values, originating in game theory and increasingly prominent in explainable AI, have been proposed to assess the contribution of facts in query answering over databases, along with other similar power indices such as Banzhaf values. In this work we adapt these Shapley-like scores to probabilistic settings, the objective being to compute their expected value. We show that the computations of expected Shapley values and of the expected values of Boolean functions are interreducible in polynomial time, thus obtaining the same tractability landscape. We investigate the specific tractable case where Boolean functions are represented as deterministic decomposable circuits, designing a polynomial-time algorithm for this setting. We present applications to probabilistic databases through database provenance, and an effective implementation of this algorithm within the ProvSQL system, which experimentally validates its feasibility over a standard benchmark.

4/17/2024

🚀

Shapley Value Computation in Ontology-Mediated Query Answering

Meghyn Bienvenu, Diego Figueira, Pierre Lafourcade

The Shapley value, originally introduced in cooperative game theory for wealth distribution, has found use in KR and databases for the purpose of assigning scores to formulas and database tuples based upon their contribution to obtaining a query result or inconsistency. In the present paper, we explore the use of Shapley values in ontology-mediated query answering (OMQA) and present a detailed complexity analysis of Shapley value computation (SVC) in the OMQA setting. In particular, we establish a PF/#P-hard dichotomy for SVC for ontology-mediated queries (T,q) composed of an ontology T formulated in the description logic ELHI_bot and a connected constant-free homomorphism-closed query q. We further show that the #P-hardness side of the dichotomy can be strengthened to cover possibly disconnected queries with constants. Our results exploit recently discovered connections between SVC and probabilistic query evaluation and allow us to generalize existing results on probabilistic OMQA.

7/30/2024

Explaining a probabilistic prediction on the simplex with Shapley compositions

Paul-Gauthier No'e, Miquel Perell'o-Nieto, Jean-Franc{c}ois Bonastre, Peter Flach

Originating in game theory, Shapley values are widely used for explaining a machine learning model's prediction by quantifying the contribution of each feature's value to the prediction. This requires a scalar prediction as in binary classification, whereas a multiclass probabilistic prediction is a discrete probability distribution, living on a multidimensional simplex. In such a multiclass setting the Shapley values are typically computed separately on each class in a one-vs-rest manner, ignoring the compositional nature of the output distribution. In this paper, we introduce Shapley compositions as a well-founded way to properly explain a multiclass probabilistic prediction, using the Aitchison geometry from compositional data analysis. We prove that the Shapley composition is the unique quantity satisfying linearity, symmetry and efficiency on the Aitchison simplex, extending the corresponding axiomatic properties of the standard Shapley value. We demonstrate this proper multiclass treatment in a range of scenarios.

8/6/2024

🏷️

DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation

Felipe Garrido-Lucero, Benjamin Heymann, Maxime Vono, Patrick Loiseau, Vianney Perchet

We consider the dataset valuation problem, that is, the problem of quantifying the incremental gain, to some relevant pre-defined utility of a machine learning task, of aggregating an individual dataset to others. The Shapley value is a natural tool to perform dataset valuation due to its formal axiomatic justification, which can be combined with Monte Carlo integration to overcome the computational tractability challenges. Such generic approximation methods, however, remain expensive in some cases. In this paper, we exploit the knowledge about the structure of the dataset valuation problem to devise more efficient Shapley value estimators. We propose a novel approximation, referred to as discrete uniform Shapley, which is expressed as an expectation under a discrete uniform distribution with support of reasonable size. We justify the relevancy of the proposed framework via asymptotic and non-asymptotic theoretical guarantees and illustrate its benefits via an extensive set of numerical experiments.

6/19/2024