Toward FAIR Semantic Publishing of Research Dataset Metadata in the Open Research Knowledge Graph

Read original: arXiv:2404.08443 - Published 4/15/2024 by Raia Abu Ahmad, Jennifer D'Souza, Matthaus Zloch, Wolfgang Otto, Georg Rehm, Allard Oelen, Stefan Dietze, Soren Auer

Overview

  • Presents a framework for publishing research dataset metadata in a standardized, machine-readable format to enable better discovery and integration of research data.
  • Builds on the Open Research Knowledge Graph (ORKG), an existing platform for semantically publishing and linking research concepts and data.
  • Outlines the design principles and key features of the proposed ORKG-Dataset content type, which lets researchers publish their dataset metadata in a FAIR (Findable, Accessible, Interoperable, Reusable) manner.

Plain English Explanation

The paper discusses a system called the Open Research Knowledge Graph (ORKG) that aims to make it easier for researchers to share and discover research datasets. Traditionally, research data has been difficult to find and use because it is often scattered across different locations and stored in incompatible formats. The ORKG framework provides a standardized way for researchers to publish metadata about their datasets in a machine-readable format.

This makes the datasets easier to discover, access, and integrate with other research. The proposed ORKG-Dataset content type lets researchers describe their datasets using a common set of salient properties drawn from each dataset's accompanying scholarly publication, such as its purpose, methods, and findings. Because this metadata is published in a structured, semantic format, the datasets become more Findable, Accessible, Interoperable, and Reusable (FAIR) for other researchers.

The paper outlines the key design principles and features of ORKG-Dataset, showing how it can make research datasets easier to discover and to integrate with related work.

Technical Explanation

The paper presents the ORKG-Dataset content type, a specialized branch of the broader Open Research Knowledge Graph (ORKG) platform for semantically publishing research concepts and data. ORKG-Dataset is designed to let researchers publish their dataset metadata in a standardized, machine-readable format that adheres to the FAIR principles of Findability, Accessibility, Interoperability, and Reusability, and to integrate each dataset with its accompanying scholarly publication.
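The paper's abstract notes that web search engines already index datasets through structured descriptions such as schema.org's `Dataset` type. As a point of reference for what machine-readable dataset metadata looks like, here is a minimal Python sketch that emits schema.org `Dataset` markup as JSON-LD. It is generic schema.org markup, not ORKG-Dataset's semantic model; the dataset name, URL, and DOI are hypothetical placeholders, and `citation` is simply the schema.org property chosen here to point at the accompanying paper.

```python
import json


def dataset_jsonld(name: str, description: str, url: str, paper_doi: str) -> str:
    """Return schema.org Dataset markup serialized as JSON-LD.

    All argument values are placeholders supplied by the caller; this is
    generic schema.org markup, not the ORKG-Dataset semantic model.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # schema.org `citation`: points at the accompanying scholarly article.
        "citation": f"https://doi.org/{paper_doi}",
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    print(dataset_jsonld(
        name="Example information-extraction corpus",
        description="Annotated abstracts for a hypothetical extraction task.",
        url="https://example.org/datasets/ie-corpus",
        paper_doi="10.0000/example",
    ))
```

Markup like this is typically embedded in a dataset's landing page so that search engines can pick it up; the paper argues that such generic descriptions often fail to capture a research dataset's content and context, which ORKG-Dataset addresses by modelling salient properties of the supporting publication.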

The key design principles of ORKG-Dataset include:

  1. Modular and extensible data model: ORKG-Dataset uses a flexible, modular data model that can accommodate a wide range of dataset types and metadata attributes.
  2. Semantic representation: Dataset metadata is represented using semantic web technologies, such as RDF and ontologies, to enable machine-readability and interoperability.
  3. Community-driven: The metadata schema and ontologies are developed in collaboration with the research community to ensure they meet the needs of users.
  4. Incentivization: ORKG-Dataset provides incentives and recognition mechanisms to encourage researchers to publish their dataset metadata.
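Principle 2 can be made concrete with a small RDF sketch. The Python below uses only the standard library to describe a research dataset as N-Triples and to link it to its accompanying publication, the integration the paper's abstract emphasises. `rdf:type`, `schema:Dataset`, and `schema:name` are real vocabulary terms; the `https://example.org/vocab/` namespace, its `describedIn` and `researchTask` properties, and all IRIs are hypothetical stand-ins, not ORKG's actual predicates.

```python
from dataclasses import dataclass
from typing import List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "https://schema.org/"
# Hypothetical vocabulary for illustration; not ORKG's actual predicates.
EX = "https://example.org/vocab/"


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False

    def to_ntriples(self) -> str:
        """Serialize as one N-Triples line (IRIs in <>, literals quoted)."""
        if self.obj_is_literal:
            # Escape backslashes first, then quotes and line breaks.
            escaped = (
                self.obj.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            obj = f'"{escaped}"'
        else:
            obj = f"<{self.obj}>"
        return f"<{self.subject}> <{self.predicate}> {obj} ."


def describe_dataset(dataset_iri: str, name: str, paper_iri: str, task: str) -> List[Triple]:
    """Describe a dataset and link it to the publication that introduced it."""
    return [
        Triple(dataset_iri, RDF_TYPE, SCHEMA + "Dataset"),
        Triple(dataset_iri, SCHEMA + "name", name, obj_is_literal=True),
        Triple(dataset_iri, EX + "describedIn", paper_iri),
        Triple(dataset_iri, EX + "researchTask", task, obj_is_literal=True),
    ]


if __name__ == "__main__":
    triples = describe_dataset(
        "https://example.org/dataset/1",
        "Example information-extraction corpus",
        "https://doi.org/10.0000/example",
        "named entity recognition",
    )
    for t in triples:
        print(t.to_ntriples())
```

A real deployment would normally rely on an RDF library and the platform's own predicates; writing N-Triples by hand here just keeps the sketch dependency-free.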

The paper describes the architecture and key features of ORKG-Dataset, including the user interface for metadata entry, the underlying knowledge graph data model, and the integration with other ORKG components for dataset discovery and exploration. The authors also design a specific application of the ORKG-Dataset semantic model based on 40 diverse research datasets on scientific information extraction, demonstrating how dataset metadata can be published and discovered in a semantically enriched manner.
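One practical payoff of a shared minimum property set is that dataset descriptions become comparable side by side. The Python sketch below assumes a hypothetical four-property minimum description (the paper defines its own property set), flags properties a record leaves out, and builds a property-by-dataset table; the two records are invented for illustration.

```python
from typing import Dict, List, Sequence

Record = Dict[str, str]

# Hypothetical minimum property set; the paper defines its own.
MIN_PROPERTIES = ("research problem", "annotation unit", "language", "size")


def missing_properties(record: Record, required: Sequence[str] = MIN_PROPERTIES) -> List[str]:
    """Return the required properties that are absent or empty in a record."""
    return [p for p in required if not record.get(p)]


def comparison_table(
    records: Dict[str, Record], properties: Sequence[str] = MIN_PROPERTIES
) -> List[List[str]]:
    """Build a property-by-dataset table; '-' marks a missing value."""
    names = sorted(records)
    header = ["property"] + names
    rows = [[p] + [records[n].get(p) or "-" for n in names] for p in properties]
    return [header] + rows


if __name__ == "__main__":
    # Invented records, for illustration only.
    datasets = {
        "dataset-A": {"research problem": "entity extraction", "language": "en", "size": "500 abstracts"},
        "dataset-B": {"research problem": "relation extraction", "annotation unit": "sentence", "language": "en"},
    }
    for row in comparison_table(datasets):
        print(" | ".join(row))
    for name, record in sorted(datasets.items()):
        print(name, "missing:", missing_properties(record))
```

Keeping the property list fixed is what makes the table meaningful: every dataset is described along the same axes, so gaps show up as explicit "-" cells rather than silently absent fields.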

Critical Analysis

The paper presents a well-designed framework for publishing research dataset metadata in a standardized, machine-readable format. The authors have clearly considered the key challenges and design principles required to make dataset metadata more Findable, Accessible, Interoperable, and Reusable (FAIR).

One potential limitation of the work is its reliance on researcher participation and community buy-in for ORKG-Dataset to succeed. The authors acknowledge that incentivizing researchers to publish their metadata is a critical challenge that will require sustained effort and engagement.

Additionally, the paper does not provide a detailed evaluation of the usability and effectiveness of ORKG-Dataset in real-world scenarios. User studies and feedback from the research community would help assess the system's practical impact and user experience.

Overall, the paper presents a promising approach to improving the discoverability and integration of research datasets through semantic publishing. By building on ORKG, ORKG-Dataset could make research datasets in many disciplines easier to find, compare, and reuse.

Conclusion

The paper introduces the ORKG-Dataset content type, a specialized branch of the broader Open Research Knowledge Graph (ORKG) platform. ORKG-Dataset provides a standardized, machine-readable way for researchers to publish their dataset metadata, making it more Findable, Accessible, Interoperable, and Reusable (FAIR).

By representing dataset metadata with semantic web technologies and linking each dataset to its accompanying publication, ORKG-Dataset enables better discovery and integration of research data. The paper outlines the key design principles and features of the system and demonstrates its potential to improve the discoverability and reuse of research datasets through semantic publishing.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Toward FAIR Semantic Publishing of Research Dataset Metadata in the Open Research Knowledge Graph

Raia Abu Ahmad, Jennifer D'Souza, Matthaus Zloch, Wolfgang Otto, Georg Rehm, Allard Oelen, Stefan Dietze, Soren Auer

Search engines these days can serve datasets as search results. Datasets get picked up by search technologies based on structured descriptions on their official web pages, informed by metadata ontologies such as the Dataset content type of schema.org. Despite this promotion of the content type dataset as a first-class citizen of search results, a vast proportion of datasets, particularly research datasets, still need to be made discoverable and, therefore, largely remain unused. This is due to the sheer volume of datasets released every day and the inability of metadata to reflect a dataset's content and context accurately. This work seeks to improve this situation for a specific class of datasets, namely research datasets, which are the result of research endeavors and are accompanied by a scholarly publication. We propose the ORKG-Dataset content type, a specialized branch of the Open Research Knowledge Graph (ORKG) platform, which provides descriptive information and a semantic model for research datasets, integrating them with their accompanying scholarly publications. This work aims to establish a standardized framework for recording and reporting research datasets within the ORKG-Dataset content type. This, in turn, increases research dataset transparency on the web for their improved discoverability and applied use. In this paper, we present a proposal -- the minimum FAIR, comparable, semantic description of research datasets in terms of salient properties of their supporting publication. We design a specific application of the ORKG-Dataset semantic model based on 40 diverse research datasets on scientific information extraction.

4/15/2024

The Ontoverse: Democratising Access to Knowledge Graph-based Data Through a Cartographic Interface

Johannes Zimmermann, Dariusz Wiktorek, Thomas Meusburger, Miquel Monge-Dalmau, Antonio Fabregat, Alexander Jarasch, Gunter Schmidt, Jorge S. Reis-Filho, T. Ian Simpson

As the number of scientific publications and preprints is growing exponentially, several attempts have been made to navigate this complex and increasingly detailed landscape. These have almost exclusively taken unsupervised approaches that fail to incorporate domain knowledge and lack the structural organisation required for intuitive interactive human exploration and discovery. Especially in highly interdisciplinary fields, a deep understanding of the connectedness of research works across topics is essential for generating insights. We have developed a unique approach to data navigation that leans on geographical visualisation and uses hierarchically structured domain knowledge to enable end-users to explore knowledge spaces grounded in their desired domains of interest. This can take advantage of existing ontologies, proprietary intelligence schemata, or be directly derived from the underlying data through hierarchical topic modelling. Our approach uses natural language processing techniques to extract named entities from the underlying data and normalise them against relevant domain references and navigational structures. The knowledge is integrated by first calculating similarities between entities based on their shared extracted feature space and then by alignment to the navigational structures. The result is a knowledge graph that allows for full text and semantic graph query and structured topic driven navigation. This allows end-users to identify entities relevant to their needs and access extensive graph analytics. The user interface facilitates graphical interaction with the underlying knowledge graph and mimics a cartographic map to maximise ease of use and widen adoption. We demonstrate an exemplar project using our generalisable and scalable infrastructure for an academic biomedical literature corpus that is grounded against hundreds of different named domain entities.

8/9/2024

Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web

Kate Lin, Tarfah Alrashed, Natasha Noy

The Web today has millions of datasets, and the number of datasets continues to grow at a rapid pace. These datasets are not standalone entities; rather, they are intricately connected through complex relationships. Semantic relationships between datasets provide critical insights for research and decision-making processes. In this paper, we study dataset relationships from the perspective of users who discover, use, and share datasets on the Web: what relationships are important for different tasks? What contextual information might users want to know? We first present a comprehensive taxonomy of relationships between datasets on the Web and map these relationships to user tasks performed during dataset discovery. We develop a series of methods to identify these relationships and compare their performance on a large corpus of datasets generated from Web pages with schema.org markup. We demonstrate that machine-learning based methods that use dataset metadata achieve multi-class classification accuracy of 90%. Finally, we highlight gaps in available semantic markup for datasets and discuss how incorporating comprehensive semantics can facilitate the identification of dataset relationships. By providing a comprehensive overview of dataset relationships at scale, this paper sets a benchmark for future research.

8/28/2024

FAIR evaluation of ten widely used chemical datasets: Lessons learned and recommendations

Marcos Da Silveira, Oona Freudenthal, Louis Deladiennee

This document focuses on databases disseminating data on (hazardous) substances found on the North American and the European (EU) market. The goal is to analyse the FAIRness (Findability, Accessibility, Interoperability and Reusability) of published open data on these substances and to qualitatively evaluate to what extent the selected databases already fulfil the criteria set out in the commission draft regulation on a common data chemicals platform. We implemented two complementary approaches: Manual, and Automatic. The manual approach is based on online questionnaires. These questionnaires provide a structured approach to evaluating FAIRness by guiding users through a series of questions related to the FAIR principles. They are particularly useful for initiating discussions on FAIR implementation within research teams and for identifying areas that require further attention. Automated tools for FAIRness assessment, such as F-UJI and FAIR Checker, are gaining prominence and are continuously under development. Unlike manual tools, automated tools perform a series of tests automatically starting from a dereferenceable URL to the data resource to be evaluated. We analysed ten widely adopted datasets managed in Europe and North America. The highest score from automatic analysis was 54/100. The manual analysis shows that several FAIR metrics were satisfied, but not detectable by automatic tools because there is no metadata, or the format of the information was not a standard one. Thus, it was not interpretable by the tool. We present the details of the analysis and tables summarizing the outcomes, the issues, and the suggestions to address these issues.

7/23/2024