Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and Validation

Read original: arXiv:2310.04094 - Published 6/26/2024 by Francesco Invernici, Anna Bernasconi, Stefano Ceri

Overview

Explores using small graphs of concepts as search queries to help researchers find relevant COVID-19 clinical research papers
Builds a co-occurrence network of biomedical concepts from the abstracts in the CORD-19 dataset (the COVID-19 Open Research Dataset)
Presents a graph query engine and the GRAPH-SEARCH interface, which return a ranked bibliography and show which parts of the query each publication explains

Plain English Explanation

The paper describes a system that lets researchers search the COVID-19 literature by composing a small graph of concepts instead of typing keywords. The goal is to help researchers find relevant studies by matching the relationships in their query against relationships found in the literature.

The researchers used the CORD-19 dataset, a large collection of COVID-19 research papers, as the source for their system. They annotated the papers' abstracts with terms from curated biomedical vocabularies and linked concepts that tend to appear together, producing a large network of related concepts.

When a user submits a graph query, the system finds the best matches in the network and returns a ranked list of publications. Each publication is tied to the specific parts of the query it supports, so the user can see why it was retrieved. The system also accepts partial matches and suggests ways to complete a query.

Technical Explanation

The researchers started from the COVID-19 Open Research Dataset (CORD-19) corpus and summarized its content by annotating the publications' abstracts with terms selected from the Unified Medical Language System (UMLS) and the Ontology of Coronavirus Infectious Disease.

The system builds a co-occurrence network from the concepts annotated in the abstracts. The network includes all relevant concepts mentioned in the corpus and connects two concepts when their mutual information is relevant. The resulting network consists of 128,249 entities and 47,198,965 relationships.
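The paper's abstract describes a co-occurrence network in which concepts are connected when their mutual information is relevant. The following is a minimal sketch of that idea, not the authors' implementation. It assumes pointwise mutual information (PMI) computed over abstracts as the measure, and the function name `build_cooccurrence_network`, the `min_pmi` threshold, and the toy concept labels are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(annotated_abstracts, min_pmi=0.0):
    """Return {(concept_a, concept_b): pmi} for pairs whose PMI exceeds min_pmi.

    annotated_abstracts: list of lists, one list of concept labels per abstract.
    PMI and the threshold are assumptions; the source only says that concepts
    are connected "when their mutual information is relevant".
    """
    n_docs = len(annotated_abstracts)
    concept_counts = Counter()
    pair_counts = Counter()

    for concepts in annotated_abstracts:
        # Count each concept once per abstract; sort so pairs have a stable order.
        unique = sorted(set(concepts))
        concept_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    edges = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_docs
        p_a = concept_counts[a] / n_docs
        p_b = concept_counts[b] / n_docs
        pmi = math.log2(p_ab / (p_a * p_b))
        if pmi > min_pmi:
            edges[(a, b)] = pmi
    return edges


# Toy input standing in for ontology-annotated abstracts (hypothetical labels).
toy_abstracts = [
    ["covid-19", "fever", "cough"],
    ["covid-19", "fever"],
    ["covid-19", "vaccine"],
    ["influenza", "vaccine"],
]
network = build_cooccurrence_network(toy_abstracts)
```

In this toy corpus, "covid-19" and "vaccine" co-occur less often than their individual frequencies would predict, so their PMI is negative and the pair gets no edge. A real corpus would likely need a stricter threshold and a minimum co-occurrence count to avoid spurious edges between rare terms.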

A graph query engine identifies the best matches of a graph query on the co-occurrence network. It supports partial matches and suggests potential query completions using shortest paths. Through the GRAPH-SEARCH interface, users formulate or adapt graph queries and receive a globally ranked bibliography in which each publication is associated with the specific parts of the query that it explains.
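According to the paper's abstract, the query engine supports partial matches and suggests potential query completions using shortest paths. One way this could work is sketched below, under the assumption that the network is unweighted and undirected and that a completion is proposed for each query edge that has no direct counterpart in the network. The helper names `build_adjacency`, `shortest_path`, and `suggest_completions` are hypothetical and not taken from the paper.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an iterable of (a, b) concept pairs into an undirected adjacency map."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    if source not in adjacency or target not in adjacency:
        return None
    if source == target:
        return [source]
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        # Sorted iteration keeps the result deterministic when paths tie.
        for neighbour in sorted(adjacency[node]):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == target:
                # Walk the predecessor links back to the source.
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def suggest_completions(adjacency, query_edges):
    """For each query edge missing from the network, propose a connecting path."""
    suggestions = {}
    for a, b in query_edges:
        if b in adjacency.get(a, set()):
            continue  # the edge matches the network directly
        suggestions[(a, b)] = shortest_path(adjacency, a, b)
    return suggestions


# Toy network with hypothetical concept labels.
toy_adjacency = build_adjacency([
    ("covid-19", "fever"),
    ("fever", "cough"),
    ("influenza", "vaccine"),
])
toy_suggestions = suggest_completions(
    toy_adjacency, [("covid-19", "fever"), ("covid-19", "cough")]
)
```

Here the query edge between "covid-19" and "cough" is absent from the toy network, so the sketch proposes the path through "fever" as a completion. A query edge whose endpoints lie in disconnected parts of the network would map to `None`, which a user interface could present as an unmatched part of the query.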

Critical Analysis

Several limitations follow from the design described in the abstract. First, a co-occurrence network records that two concepts appear together in abstracts, not what kind of relationship holds between them, so a matched edge indicates an association rather than a specific clinical claim. The annotations are also limited to the publications' abstracts and to the terms available in the chosen vocabularies. In addition, the abstract reports the size of the network and the capabilities of the interface rather than retrieval-quality figures, so the system's real-world performance has to be judged from the full paper.

Another potential issue is the scalability of the approach. The network already contains 47,198,965 relationships, and as the COVID-19 literature continues to grow, rebuilding the network and matching graph queries against it could become increasingly demanding.

Despite these limitations, the core idea of using graph queries to search the COVID-19 literature is promising. Linking each retrieved publication to the parts of the query it explains makes the results easier to interpret, which could be particularly valuable in fast-moving fields like pandemic research.

Conclusion

This paper presents a system for searching COVID-19 research papers with graph queries over a co-occurrence network of biomedical concepts, supporting both query formulation and evidence search on a large text corpus. According to the authors, the approach can be reapplied to any scientific domain where document corpora and curated ontologies are available.

As the volume of scientific literature continues to grow, innovative approaches like this one may become increasingly important for helping researchers efficiently navigate and synthesize the available knowledge. Further research and development in this area could lead to significant advancements in how we access and understand scientific information.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and Validation

Francesco Invernici, Anna Bernasconi, Stefano Ceri

Objective: This study aims to consider small graphs of concepts and exploit them for expressing graph searches over existing COVID-19-related literature, leveraging the increasing use of graphs to represent and query scientific knowledge and providing a user-friendly search and exploration experience. Methods: We considered the COVID-19 Open Research Dataset corpus and summarized its content by annotating the publications' abstracts using terms selected from the UMLS and the Ontology of Coronavirus Infectious Disease. Then, we built a co-occurrence network that includes all relevant concepts mentioned in the corpus, establishing connections when their mutual information is relevant. A sophisticated graph query engine was built to allow the identification of the best matches of graph queries on the network. It also supports partial matches and suggests potential query completions using shortest paths. Results: We built a large co-occurrence network, consisting of 128,249 entities and 47,198,965 relationships; the GRAPH-SEARCH interface allows users to explore the network by formulating or adapting graph queries; it produces a bibliography of publications, which are globally ranked; and each publication is further associated with the specific parts of the query that it explains, thereby allowing the user to understand each aspect of the matching. Conclusions: Our approach supports the process of query formulation and evidence search upon a large text corpus; it can be reapplied to any scientific domain where documents corpora and curated ontologies are made available.

6/26/2024

Enhancing Biomedical Knowledge Discovery for Diseases: An End-To-End Open-Source Framework

Christos Theodoropoulos, Andrei Catalin Coman, James Henderson, Marie-Francine Moens

The ever-growing volume of biomedical publications creates a critical need for efficient knowledge discovery. In this context, we introduce an open-source end-to-end framework designed to construct knowledge around specific diseases directly from raw text. To facilitate research in disease-related knowledge discovery, we create two annotated datasets focused on Rett syndrome and Alzheimer's disease, enabling the identification of semantic relations between biomedical entities. Extensive benchmarking explores various ways to represent relations and entity representations, offering insights into optimal modeling strategies for semantic relation detection and highlighting language models' competence in knowledge discovery. We also conduct probing experiments using different layer representations and attention scores to explore transformers' ability to capture semantic relations.

9/9/2024

Applying BioBERT to Extract Germline Gene-Disease Associations for Building a Knowledge Graph from the Biomedical Literature

Armando D. Diaz Gonzalez, Kevin S. Hughes, Songhui Yue, Sean T. Hayes

Published biomedical information has rapidly increased and continues to do so. The recent advancements in Natural Language Processing (NLP) have generated considerable interest in automating the extraction, normalization, and representation of biomedical knowledge about entities such as genes and diseases. Our study analyzes germline abstracts in the construction of knowledge graphs of the immense work that has been done in this area for genes and diseases. This paper presents SimpleGermKG, an automatic knowledge graph construction approach that connects germline genes and diseases. For the extraction of genes and diseases, we employ BioBERT, a pre-trained BERT model on biomedical corpora. We propose an ontology-based and rule-based algorithm to standardize and disambiguate medical terms. For semantic relationships between articles, genes, and diseases, we implemented a part-whole relation approach to connect each entity with its data source and visualize them in a graph-based knowledge representation. Lastly, we discuss the knowledge graph applications, limitations, and challenges to inspire the future research of germline corpora. Our knowledge graph contains 297 genes, 130 diseases, and 46,747 triples. Graph-based visualizations are used to show the results.

4/24/2024

Constructing the CORD-19 Vaccine Dataset

Manisha Singh, Divy Sharma, Alonso Ma, Bridget Tyree, Margaret Mitchell

We introduce a new dataset, 'CORD-19-Vaccination', to cater to scientists specifically looking into COVID-19 vaccine-related research. This dataset is extracted from the CORD-19 dataset [Wang et al., 2020] and augmented with new columns for language detail, author demography, keywords, and topic per paper. Facebook's fastText model is used to identify languages [Joulin et al., 2016]. To establish author demography (author affiliation, lab/institution location, and lab/institution country columns), we processed the JSON file for each paper and then further enhanced it using Google's search API to determine country values. 'Yake' was used to extract keywords from the title, abstract, and body of each paper, and the LDA (Latent Dirichlet Allocation) algorithm was used to add topic information [Campos et al., 2020, 2018a,b]. To evaluate the dataset, we demonstrate a question-answering task like the one used in the CORD-19 Kaggle challenge [Goldbloom et al., 2022]. For further evaluation, sequential sentence classification was performed on each paper's abstract using the model from Dernoncourt et al. [2016]. We partially hand-annotated the training dataset and used a pre-trained BERT-PubMed layer. 'CORD-19-Vaccination' contains 30k research papers and can be immensely valuable for NLP research such as text mining, information extraction, and question answering, specific to the domain of COVID-19 vaccine research.

7/29/2024