Ontology Embedding: A Survey of Methods, Applications and Resources

Read original: arXiv:2406.10964 - Published 6/18/2024 by Jiaoyan Chen, Olga Mashkova, Fernando Zhapa-Camacho, Robert Hoehndorf, Yuan He, Ian Horrocks

Overview

• Ontology embedding is a technique that represents the concepts and relationships in an ontology as dense, numerical vectors.
• This allows for the use of powerful machine learning methods on ontological data, enabling applications like knowledge graph reasoning, question answering, and semantic search.
• The paper provides a comprehensive survey of ontology embedding methods, their applications, and available resources.

Plain English Explanation

Ontologies are structured collections of knowledge that define the key concepts in a particular domain and how they are related. For example, an ontology for the field of biology might define entities like "plant," "animal," "cell," and "photosynthesis," and specify how those concepts are connected.

Ontology embedding takes this ontological knowledge and converts it into a format that computers can easily work with: numerical vectors. Each concept in the ontology is represented as a multi-dimensional vector, and the distances between vectors encode the semantic relationships between the concepts, so that related concepts end up close together.
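To make the idea of "distance encodes meaning" concrete, here is a minimal sketch in Python. The three-dimensional vectors are made up for illustration (real embeddings are learned from the ontology and have many more dimensions), and `most_similar` is a hypothetical helper name, not part of any library discussed in the paper.

```python
import math

# Hypothetical hand-picked vectors; real embeddings would be learned.
embeddings = {
    "plant": [0.9, 0.1, 0.2],
    "animal": [0.1, 0.9, 0.2],
    "photosynthesis": [0.8, 0.0, 0.4],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; values near 1 mean 'similar'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(concept):
    """Return the other concept whose vector is closest to `concept`."""
    others = [c for c in embeddings if c != concept]
    return max(
        others,
        key=lambda c: cosine_similarity(embeddings[concept], embeddings[c]),
    )

if __name__ == "__main__":
    print(most_similar("photosynthesis"))
```

With these toy vectors, "photosynthesis" sits much closer to "plant" than to "animal", which is the kind of relationship a learned embedding is meant to capture.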

This vector representation allows powerful machine learning techniques to be applied to ontological data. For instance, the related paper Towards Ontology-Enhanced Representation Learning for Large Language Models shows how knowledge from an ontology can be infused into an embedding LLM to improve how well it judges the similarity of in-domain sentences.

The survey paper provides an overview of the different methods for generating ontology embeddings, the applications they enable, and the resources (datasets, tools, etc.) available to researchers and practitioners. This gives readers a comprehensive understanding of the state of the art in this rapidly evolving field.

Technical Explanation

The paper first provides background on ontologies and the motivation for embedding them into vector spaces. It then categorizes the various ontology embedding methods into several broad approaches:

• Factorization-based Methods: These methods directly factorize the ontology structure (e.g., concept-relation-concept triples) into low-dimensional vector representations.
• Translation-based Methods: These methods model ontological relations as translations between concept vectors, similar to the popular TransE knowledge graph embedding approach.
• Graph Neural Network-based Methods: These methods use graph neural networks to encode the ontology structure into vector representations.
• Attention-based Methods: These methods leverage attention mechanisms to better capture the complex relationships between ontological concepts.
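
As an illustration of the translation-based idea, the sketch below scores a (head, relation, tail) triple the way TransE does: the relation vector is added to the head vector, and the triple is considered plausible when the result lands near the tail vector. The two-dimensional vectors and the concept names are invented for this example; in practice they would be learned by minimizing this distance for true triples and increasing it for corrupted ones.

```python
import math

def transe_score(head, relation, tail):
    """L2 distance ||head + relation - tail||; lower means more plausible."""
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )

# Hypothetical vectors chosen by hand for illustration only.
cell = [0.2, 0.5]
part_of = [0.3, -0.1]
organism = [0.5, 0.4]
rock = [-0.6, 0.9]

if __name__ == "__main__":
    # A triple that fits the translation scores near zero...
    print(transe_score(cell, part_of, organism))
    # ...while a mismatched tail scores noticeably higher.
    print(transe_score(cell, part_of, rock))
```

This is only the scoring function; a full training loop with negative sampling and a margin-based loss is omitted here.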

The paper then surveys a wide range of applications enabled by ontology embeddings, such as knowledge graph reasoning, question answering, and semantic search. It also reviews various random walk-based methods for ontology embedding and discusses the tradeoffs between different approaches, such as the ability to preserve lattice structures.
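
Random walk-based methods can be sketched roughly as follows: the ontology is treated as a graph, short walks are sampled from each concept, and the resulting sequences are handed to a sequence model (word2vec is a common choice) as if they were sentences. The toy graph and the function names below are assumptions made for illustration, and the final embedding-training step is left out.

```python
import random

# Hypothetical toy graph: each concept maps to its neighbours
# (for example, via subclass or other ontology relations).
graph = {
    "organism": ["plant", "animal"],
    "plant": ["organism", "photosynthesis"],
    "animal": ["organism"],
    "photosynthesis": ["plant"],
}

def random_walk(graph, start, length, rng):
    """Sample one walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph[walk[-1]]
        if not neighbours:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbours))
    return walk

def generate_walks(graph, walks_per_node, length, seed=0):
    """Sample `walks_per_node` walks from every node in the graph."""
    rng = random.Random(seed)
    return [
        random_walk(graph, node, length, rng)
        for node in graph
        for _ in range(walks_per_node)
    ]

if __name__ == "__main__":
    for walk in generate_walks(graph, walks_per_node=2, length=4):
        print(" ".join(walk))
```

Concepts that frequently appear near each other in these walks would then receive similar vectors once a sequence model is trained on them.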

Critical Analysis

The paper provides a comprehensive and well-structured overview of ontology embedding, covering a wide range of methods and applications. However, it offers little in-depth critique of the limitations of current methods.

One potential concern is the reliance on the assumption that ontological concepts and relations can be accurately captured in a vector space. In reality, ontologies often contain complex hierarchical and logical structures that may not be easily represented in a Euclidean space. The paper could have discussed the potential challenges and trade-offs of this approach in more detail.

Additionally, the survey focuses primarily on technical methods and applications, but does not address important ethical and societal considerations around the use of ontology embeddings. For example, the potential for bias and misuse of these technologies in high-stakes decision-making processes could be an area for further exploration.

Conclusion

This survey paper provides a comprehensive overview of the field of ontology embedding, covering the key methods, applications, and resources available to researchers and practitioners. By converting ontological knowledge into numerical vector representations, ontology embedding enables the use of powerful machine learning techniques on structured data, opening up new possibilities for knowledge-driven applications.

The paper serves as a valuable reference for anyone interested in understanding the current state of the art in this rapidly evolving field. However, it could be strengthened by a more critical examination of the limitations and potential societal implications of ontology embedding technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Ontology Embedding: A Survey of Methods, Applications and Resources

Jiaoyan Chen, Olga Mashkova, Fernando Zhapa-Camacho, Robert Hoehndorf, Yuan He, Ian Horrocks

Ontologies are widely used for representing domain knowledge and meta data, playing an increasingly important role in Information Systems, the Semantic Web, Bioinformatics and many other domains. However, logical reasoning that ontologies can directly support are quite limited in learning, approximation and prediction. One straightforward solution is to integrate statistical analysis and machine learning. To this end, automatically learning vector representation for knowledge of an ontology i.e., ontology embedding has been widely investigated in recent years. Numerous papers have been published on ontology embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field. To bridge this gap, we write this survey paper, which first introduces different kinds of semantics of ontologies, and formally defines ontology embedding from the perspectives of both mathematics and machine learning, as well as its property of faithfulness. Based on this, it systematically categorises and analyses a relatively complete set of over 80 papers, according to the ontologies and semantics that they aim at, and their technical solutions including geometric modeling, sequence modeling and graph propagation. This survey also introduces the applications of ontology embedding in ontology engineering, machine learning augmentation and life sciences, presents a new library mOWL, and discusses the challenges and future directions.

6/18/2024

Ontological Relations from Word Embeddings

Mathieu d'Aquin, Emmanuel Nauer

It has been reliably shown that the similarity of word embeddings obtained from popular neural models such as BERT approximates effectively a form of semantic similarity of the meaning of those words. It is therefore natural to wonder if those embeddings contain enough information to be able to connect those meanings through ontological relationships such as the one of subsumption. If so, large knowledge models could be built that are capable of semantically relating terms based on the information encapsulated in word embeddings produced by pre-trained models, with implications not only for ontologies (ontology matching, ontology evolution, etc.) but also on the ability to integrate ontological knowledge in neural models. In this paper, we test how embeddings produced by several pre-trained models can be used to predict relations existing between classes and properties of popular upper-level and general ontologies. We show that even a simple feed-forward architecture on top of those embeddings can achieve promising accuracies, with varying generalisation abilities depending on the input data. To achieve that, we produce a dataset that can be used to further enhance those models, opening new possibilities for applications integrating knowledge from web ontologies.

8/2/2024

Towards Ontology-Enhanced Representation Learning for Large Language Models

Francesco Ronzano, Jay Nanavati

Taking advantage of the widespread use of ontologies to organise and harmonize knowledge across several distinct domains, this paper proposes a novel approach to improve an embedding-Large Language Model (embedding-LLM) of interest by infusing the knowledge formalized by a reference ontology: ontological knowledge infusion aims at boosting the ability of the considered LLM to effectively model the knowledge domain described by the infused ontology. The linguistic information (i.e. concept synonyms and descriptions) and structural information (i.e. is-a relations) formalized by the ontology are utilized to compile a comprehensive set of concept definitions, with the assistance of a powerful generative LLM (i.e. GPT-3.5-turbo). These concept definitions are then employed to fine-tune the target embedding-LLM using a contrastive learning framework. To demonstrate and evaluate the proposed approach, we utilize the biomedical disease ontology MONDO. The results show that embedding-LLMs enhanced by ontological disease knowledge exhibit an improved capability to effectively evaluate the similarity of in-domain sentences from biomedical documents mentioning diseases, without compromising their out-of-domain performance.

6/3/2024

Knowledge Base Embeddings: Semantics and Theoretical Properties

Camille Bourgaux, Ricardo Guimarães, Raoul Koudijs, Victor Lacerda, Ana Ozaki

Research on knowledge graph embeddings has recently evolved into knowledge base embeddings, where the goal is not only to map facts into vector spaces but also constrain the models so that they take into account the relevant conceptual knowledge available. This paper examines recent methods that have been proposed to embed knowledge bases in description logic into vector spaces through the lens of their geometric-based semantics. We identify several relevant theoretical properties, which we draw from the literature and sometimes generalize or unify. We then investigate how concrete embedding methods fit in this theoretical framework.

8/12/2024