Representing Knowledge and Querying Data using Double-Functorial Semantics

Read original: arXiv:2403.19884 - Published 4/1/2024 by Michael Lambert, Evan Patterson

📊

Introduction

This section discusses the application of category theory to knowledge representation and databases. The ontology or database schema is represented as a small category, and instance data is a structure-preserving functor from the schema into a category like Set.

Knowledge representation emphasizes rich schemas and expressive languages (categorical structure), while databases focus on data instances and transformations between them. The paper introduces the concept of a "double olog," which is a small double category of relations with specific properties. An instance of a double olog is a structure-preserving double functor into the double category of sets, functions, relations, and implications (Rel).

Compared to previous approaches like functional and relational ologs, double ologs recognize functions and relations as fundamental concepts and grant them equal status. They possess the four elements of a logic fibered over a type theory: objects as types, arrows as terms, proarrows as predicates, and cells as judgments or implications.

Double ologs allow queries on instance data to be formulated internally within the schema, rather than through external mappings. Basic operations of relational algebra (permutations, projections, joins) appear directly as operations in the double category of relations.

The paper aims to explain the features of double ologs through an accessible, example-driven style, while also providing technical definitions for readers familiar with double category theory.

Representing Knowledge

The section discusses how double categories can enhance the expressive power of ologs (ontology logs) beyond functional or relational ologs. It shows how facts can be expressed using commutative diagrams, and how individual constants like people can be included. It demonstrates expressing one-to-many relationships like a Jaffa's past or current service as First Prime to System Lords using relations rather than functions.

The section then explains how tabulators allow creating new types from relations, such as the type of all people who have hosted a symbiote. Local products allow forming conjunctions of propositions, illustrated with an example about whether people's expertise matches their team's mission. The local product identifies people whose skills fit their assigned remit.

Overall, the section highlights how double categories with relations, tabulators, and local products increase the expressiveness of ologs compared to just functional or relational structures alone.

Querying Data

The provided section discusses how various database operations like select, filter, and join can be expressed in the framework of double categories of relations.

Select is performed by taking an extension along projection morphisms to retrieve desired columns from a relation table.

Filter is achieved by taking restrictions, isolating rows that satisfy a given criteria like a specific attribute value.

Inner join of two tables is computed as a pullback, taking the cartesian product of the two tables and matching on the shared column(s) to combine information across relations.

The operations are illustrated with concrete examples and data tables. The key idea is that querying corresponds to composing the basic operations of extension, restriction, and product available in double categories of relations. This provides an internal way to express queries within the schema instead of handling them externally.

Appendix A Background on Double Category Theory

This appendix reviews background on double category theory and proves a new characterization of partial maps in a "double category of relations."

A double category of relations is a double category that is cartesian, an equipment, and satisfies a discreteness condition. It has finite products, and the source-target projection functor is a bifibration, allowing cells to be completed to cartesian or opcartesian cells (restrictions and extensions).

Local products of proarrows can be constructed by restricting products of relations along diagonals. Inner joins match two relations along a common domain.

A partial map is characterized by the condition R^†⊗R ≤ id_B, where R^† is the reverse relation of R. This forces the domain projection d of R to be monic, making the span (d, c) a genuine partial map.

In a unit-pure double category of relations with tabulators, a proarrow r is a partial map if and only if the domain d of the tabulator of r is monic. This generalizes the previous characterization using the reverse relation condition.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

📊

Representing Knowledge and Querying Data using Double-Functorial Semantics

Michael Lambert, Evan Patterson

Category theory offers a mathematical foundation for knowledge representation and database systems. Popular existing approaches model a database instance as a functor into the category of sets and functions, or as a 2-functor into the 2-category of sets, relations, and implications. The functional and relational models are unified by double functors into the double category of sets, functions, relations, and implications. In an accessible, example-driven style, we show that the abstract structure of a 'double category of relations' is a flexible and expressive language in which to represent knowledge, and we show how queries on data in the spirit of Codd's relational algebra are captured by double-functorial semantics.

4/1/2024

🤿

Position: Categorical Deep Learning is an Algebraic Theory of All Architectures

Bruno Gavranovi'c, Paul Lessard, Andrew Dudzik, Tamara von Glehn, Jo~ao G. M. Ara'ujo, Petar Veliv{c}kovi'c

We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent bridge between specifying constraints which models must satisfy and specifying their implementations. Focusing on building a such a bridge, we propose to apply category theory -- precisely, the universal algebra of monads valued in a 2-category of parametric maps -- as a single theory elegantly subsuming both of these flavours of neural network design. To defend our position, we show how this theory recovers constraints induced by geometric deep learning, as well as implementations of many architectures drawn from the diverse landscape of neural networks, such as RNNs. We also illustrate how the theory naturally encodes many standard constructs in computer science and automata theory.

6/7/2024

🛠️

Linear Arboreal Categories

Samson Abramsky, Yo`av Montacute, Nihil Shah

Arboreal categories, introduced by Abramsky and Reggio, axiomatise categories with tree-shaped objects. These categories provide a categorical language for formalising behavioural notions such as simulation, bisimulation, and resource-indexing. In this paper, we strengthen the axioms of an arboreal category to exclude `branching' behaviour, obtaining a notion of `linear arboreal category'. We then demonstrate that every arboreal category satisfying a linearisability condition has an associated linear arboreal subcategory related via an adjunction. This identifies the relationship between the pebble-relation comonad, of Montacute and Shah, and the pebbling comonad, of Abramsky, Dawar, and Wang, and generalises it further. As another outcome of this new framework, we obtain a linear variant of the arboreal category for modal logic. By doing so we recover different linear-time equivalences between transition systems as instances of their categorical definitions. We conclude with new preservation and characterisation theorems relating trace inclusion and trace equivalence with different linear fragments of modal logic.

4/1/2024

Categorical semiotics: Foundations for Knowledge Integration

Carlos Leandro

The integration of knowledge extracted from diverse models, whether described by domain experts or generated by machine learning algorithms, has historically been challenged by the absence of a suitable framework for specifying and integrating structures, learning processes, data transformations, and data models or rules. In this work, we extend algebraic specification methods to address these challenges within such a framework. In our work, we tackle the challenging task of developing a comprehensive framework for defining and analyzing deep learning architectures. We believe that previous efforts have fallen short by failing to establish a clear connection between the constraints a model must adhere to and its actual implementation. Our methodology employs graphical structures that resemble Ehresmann's sketches, interpreted within a universe of fuzzy sets. This approach offers a unified theory that elegantly encompasses both deterministic and non-deterministic neural network designs. Furthermore, we highlight how this theory naturally incorporates fundamental concepts from computer science and automata theory. Our extended algebraic specification framework, grounded in graphical structures akin to Ehresmann's sketches, offers a promising solution for integrating knowledge across disparate models and domains. By bridging the gap between domain-specific expertise and machine-generated insights, we pave the way for more comprehensive, collaborative, and effective approaches to knowledge integration and modeling.

4/3/2024