Projectivity revisited

Read original: arXiv:2207.00625 - Published 8/21/2024 by Felix Weitkamper

Overview

The behavior of statistical relational representations across differently sized domains has become an important area of research.
Projectivity, where marginal probabilities are independent of the domain size, is a key property that has emerged.
This paper extends the notion of projectivity from families of distributions to functors taking structured data, making it applicable to a wider range of applications.

Plain English Explanation

The paper explores how the behavior of statistical models of relational data changes as the size of the domain (the set of objects being modeled) changes. A key concept called projectivity has emerged: in a projective model, the marginal probabilities assigned to a fixed set of objects don't depend on how many other objects the domain contains.
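To make the size-independence idea concrete, here is a minimal Python sketch that brute-forces it on tiny domains. Everything in it is an illustrative assumption rather than code from the paper: worlds are truth assignments to a single unary relation R, and the two families (`independent_family`, `pair_weighted_family`) are hypothetical examples. The check only compares restrictions to the first m elements, which suffices here because both families treat all elements symmetrically.

```python
from itertools import product
from math import exp, isclose


def independent_family(n, p=0.3):
    """Hypothetical family: each atom R(i) holds independently with probability p."""
    dist = {}
    for world in product((0, 1), repeat=n):
        k = sum(world)
        dist[world] = p ** k * (1 - p) ** (n - k)
    return dist


def pair_weighted_family(n, w=0.5):
    """Hypothetical family: weight exp(w * #pairs {i, j} with R(i) and R(j)), normalised."""
    weights = {}
    for world in product((0, 1), repeat=n):
        k = sum(world)
        weights[world] = exp(w * k * (k - 1) / 2)
    z = sum(weights.values())
    return {world: wt / z for world, wt in weights.items()}


def restrict(dist, m):
    """Marginalise a distribution over n-element worlds onto the first m elements."""
    out = {}
    for world, pr in dist.items():
        out[world[:m]] = out.get(world[:m], 0.0) + pr
    return out


def is_projective_up_to(family, max_n):
    """Check that P_m equals the restriction of P_n for all 1 <= m < n <= max_n."""
    for n in range(2, max_n + 1):
        p_n = family(n)
        for m in range(1, n):
            p_m = family(m)
            marg = restrict(p_n, m)
            if not all(isclose(p_m[w], marg[w], abs_tol=1e-12) for w in p_m):
                return False
    return True


print(is_projective_up_to(independent_family, 5))
print(is_projective_up_to(pair_weighted_family, 5))
```

The independent family passes the check, while the pair-weighted family fails it: the probability of R holding for a single element shifts as soon as a second element is added to the domain.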

The researchers wanted to expand this idea of projectivity beyond just simple distributions to more complex, structured datasets. By doing this, they could apply the benefits of projectivity to a wider range of real-world applications that use structured data, like databases.

The paper shows how to adapt the key mathematical results about projective distributions to this new, more general setting. It also reveals a correspondence between projectivity and distributions on countably infinite domains. Finally, the researchers introduce an even stronger version of projectivity, called σ-projectivity, which allows the same representation to be used in different modes while still maintaining the size-independence property.

Technical Explanation

The paper extends the concept of projectivity from families of probability distributions indexed by domain size to more general functors taking structured data from a database. This makes projectivity applicable to a wider range of applications that use rich, relational data as input.
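For reference, the classical notion for families indexed by domain size can be written as follows. This is a sketch in assumed notation, following the way projectivity is commonly stated in the literature rather than quoting the paper: $\Omega_n$ denotes the set of relational structures on the domain $\{1,\dots,n\}$ and $P_n$ a distribution on $\Omega_n$ that is invariant under renaming of domain elements. The paper's functor-based generalization is not reproduced here.

```latex
\[
  P_m(\omega) \;=\; P_n\bigl(\{\omega' \in \Omega_n : \omega'|_{\{1,\dots,m\}} = \omega\}\bigr)
  \qquad \text{for all } m \le n \text{ and all } \omega \in \Omega_m .
\]
```

In words: marginalizing the distribution for the larger domain onto a smaller subdomain gives back exactly the distribution the family assigns to the smaller domain.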

The researchers transfer key known results on projective families of distributions to this new setting. This includes a characterization of projective fragments in different statistical relational formalisms, as well as a general representation theorem for projective families of distributions.

Furthermore, the paper proves a correspondence between projectivity and distributions on countably infinite domains. This connection is used to unify and generalize earlier work on statistical relational representations in infinite domains.

Finally, the paper introduces a further strengthening of projectivity called σ-projectivity. This allows the use of the same representation in different modes while still retaining the size-independence property of projectivity.

Critical Analysis

The paper makes a theoretical contribution by extending the notion of projectivity from families of distributions indexed only by domain size to functors that take structured data as input. This expands the applicability of projectivity to the many applications whose input is structured, such as data drawn from a database.

However, the focus is primarily on the formal mathematical characterization of projectivity, with less emphasis on empirical validation or practical applications. Future work could explore how the theoretical properties of projectivity manifest in actual modeling tasks and datasets.

Additionally, the connection to infinite domains is an interesting theoretical result, but its practical implications are not fully explored. More work is needed to understand how these insights could inform the design of statistical relational models for large-scale, open-ended domains.

Conclusion

This paper advances the theoretical understanding of statistical relational representations by generalizing the concept of projectivity from families of distributions indexed by domain size to functors taking structured data. This expands the potential applications of projectivity to a wider range of structured data-driven models and opens up new research directions at the intersection of statistical learning and database theory.

Related Papers

Projectivity revisited

Felix Weitkamper

The behaviour of statistical relational representations across differently sized domains has become a focal area of research from both a modelling and a complexity viewpoint. Recently, projectivity of a family of distributions emerged as a key property, ensuring that marginal probabilities are independent of the domain size. However, the formalisation used currently assumes that the domain is characterised only by its size. This contribution extends the notion of projectivity from families of distributions indexed by domain size to functors taking extensional data from a database. This makes projectivity available for the large range of applications taking structured input. We transfer key known results on projective families of distributions to the new setting. This includes a characterisation of projective fragments in different statistical relational formalisms as well as a general representation theorem for projective families of distributions. Furthermore, we prove a correspondence between projectivity and distributions on countably infinite domains, which we use to unify and generalise earlier work on statistical relational representations in infinite domains. Finally, we use the extended notion of projectivity to define a further strengthening, which we call $\sigma$-projectivity, and which allows the use of the same representation in different modes while retaining projectivity.

8/21/2024

The generalised distribution semantics and projective families of distributions

Felix Weitkamper

We generalise the distribution semantics underpinning probabilistic logic programming by distilling its essential concept, the separation of a free random component and a deterministic part. This abstracts the core ideas beyond logic programming as such to encompass frameworks from probabilistic databases, probabilistic finite model theory and discrete lifted Bayesian networks. To demonstrate the usefulness of such a general approach, we completely characterise the projective families of distributions representable in the generalised distribution semantics and we demonstrate both that large classes of interesting projective families cannot be represented in a generalised distribution semantics and that already a very limited fragment of logic programming (acyclic determinate logic programs) in the deterministic part suffices to represent all those projective families that are representable in the generalised distribution semantics at all.

5/17/2024

Probabilities of the third type: Statistical Relational Learning and Reasoning with Relative Frequencies

Felix Weitkamper

Dependencies on the relative frequency of a state in the domain are common when modelling probabilistic dependencies on relational data. For instance, the likelihood of a school closure during an epidemic might depend on the proportion of infected pupils exceeding a threshold. Often, rather than depending on discrete thresholds, dependencies are continuous: for instance, the likelihood of any one mosquito bite transmitting an illness depends on the proportion of carrier mosquitoes. Current approaches usually only consider probabilities over possible worlds rather than over domain elements themselves. An exception are the recently introduced lifted Bayesian networks for conditional probability logic, which express discrete dependencies on probabilistic data. We introduce functional lifted Bayesian networks, a formalism that explicitly incorporates continuous dependencies on relative frequencies into statistical relational artificial intelligence, and compare and contrast them with lifted Bayesian networks for conditional probability logic. Incorporating relative frequencies is not only beneficial to modelling; it also provides a more rigorous approach to learning problems where training and test or application domains have different sizes. To this end, we provide a representation of the asymptotic probability distributions induced by functional lifted Bayesian networks on domains of increasing sizes. Since that representation has well-understood scaling behaviour across domain sizes, it can be used to estimate parameters for a large domain consistently from randomly sampled subpopulations. Furthermore, we show that in parametric families of FLBN, convergence is uniform in the parameters, which ensures a meaningful dependence of the asymptotic probabilities on the parameters of the model.

8/21/2024

Which exceptional low-dimensional projections of a Gaussian point cloud can be found in polynomial time?

Andrea Montanari, Kangjie Zhou

Given $d$-dimensional standard Gaussian vectors $\boldsymbol{x}_1,\dots, \boldsymbol{x}_n$, we consider the set of all empirical distributions of its $m$-dimensional projections, for $m$ a fixed constant. Diaconis and Freedman (1984) proved that, if $n/d\to \infty$, all such distributions converge to the standard Gaussian distribution. In contrast, we study the proportional asymptotics, whereby $n,d\to \infty$ with $n/d\to \alpha \in (0, \infty)$. In this case, the projection of the data points along a typical random subspace is again Gaussian, but the set $\mathscr{F}_{m,\alpha}$ of all probability distributions that are asymptotically feasible as $m$-dimensional projections contains non-Gaussian distributions corresponding to exceptional subspaces. Non-rigorous methods from statistical physics yield an indirect characterization of $\mathscr{F}_{m,\alpha}$ in terms of a generalized Parisi formula. Motivated by the goal of putting this formula on a rigorous basis, and to understand whether these projections can be found efficiently, we study the subset $\mathscr{F}^{\rm alg}_{m,\alpha}\subseteq \mathscr{F}_{m,\alpha}$ of distributions that can be realized by a class of iterative algorithms. We prove that this set is characterized by a certain stochastic optimal control problem, and obtain a dual characterization of this problem in terms of a variational principle that extends Parisi's formula. As a byproduct, we obtain computationally achievable values for a class of random optimization problems including `generalized spherical perceptron' models.

6/6/2024