Predicting Drug-Gene Relations via Analogy Tasks with Word Embeddings

Read original: arXiv:2406.00984 - Published 9/6/2024 by Hiroaki Yamagiwa, Ryoma Hashimoto, Kiwamu Arakane, Ken Murakami, Shou Soeda, Momose Oyama, Mariko Okada, Hidetoshi Shimodaira

Overview

This paper presents a method for predicting drug-gene relations using word embeddings and analogy tasks.
The authors explore two settings: a global setting that considers all drug-gene pairs, and a pathway-wise setting that focuses on specific biological pathways.
The proposed approach leverages the semantic relationships captured by word embeddings to infer potential drug-gene interactions.

Plain English Explanation

The paper describes a way to predict which genes a drug might act on using machine learning techniques. The researchers work with word embeddings, which are numerical representations of words learned from a large collection of biomedical text. By examining how the embeddings for drugs and genes are positioned relative to one another, the system can make educated guesses about which drugs might affect which genes.

The researchers tested their approach in two different ways. In the global setting, they looked at all drug-gene pairs together. In the pathway-wise setting, they grouped drugs and genes by biological pathway, that is, a set of genes and molecules that act together to carry out a particular function in the cell.

The key idea is that word embeddings place related words in consistent positions relative to one another. If known drug-gene pairs tend to be separated by a similar direction in the embedding space, that direction can be applied to another drug to point toward the genes it may target, including interactions that have not been observed before.

Technical Explanation

The paper proposes a method for predicting drug-gene relations using word embeddings and analogy tasks. In the global setting, the authors consider all possible drug-gene pairs, while in the pathway-wise setting, they focus on specific biological pathways.
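The difference between the two settings can be sketched as a data-grouping step. The following is a minimal illustration, not the authors' code: the triples, the drug, gene, and pathway names, and the helper functions `global_pairs` and `pathway_pairs` are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical toy data: (drug, gene, pathway) triples.
# All names are placeholders, not entries from the paper's datasets.
relations = [
    ("drug_a", "gene_1", "pathway_x"),
    ("drug_b", "gene_2", "pathway_x"),
    ("drug_c", "gene_3", "pathway_y"),
]


def global_pairs(triples):
    """Global setting: pool every known drug-gene pair, ignoring pathways."""
    return [(drug, gene) for drug, gene, _ in triples]


def pathway_pairs(triples):
    """Pathway-wise setting: keep a separate list of pairs per pathway."""
    grouped = defaultdict(list)
    for drug, gene, pathway in triples:
        grouped[pathway].append((drug, gene))
    return dict(grouped)


print(global_pairs(relations))
print(pathway_pairs(relations))
```

In the global setting a single pool of pairs is used for everything, whereas the pathway-wise setting lets each pathway be handled with its own, more specific set of pairs.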

The key idea is to use the semantic relationships captured by word embeddings to infer potential drug-gene interactions. According to the paper's abstract, the authors use BioConceptVec embeddings, which were trained on approximately 30 million PubMed abstracts with models such as skip-gram, together with their own embeddings trained on PubMed abstracts. Target genes are then predicted from a given drug through analogy computations. This mirrors the classic word-analogy result in which king - man + woman lands near queen: a vector representing the drug-to-gene relation is derived from known drug-gene pairs and added to a drug's embedding, and the genes whose embeddings lie closest to the result are the predicted targets.
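An analogy computation of this kind can be sketched with plain vector arithmetic. The example below is an illustration under stated assumptions, not the authors' implementation: the three-dimensional embeddings and all names are made up, the relation vector is assumed to be the mean gene-minus-drug offset over known pairs, and cosine similarity is assumed as the ranking measure.

```python
import numpy as np

# Toy 3-dimensional embeddings; all names and values are hypothetical.
emb = {
    "drug_a": np.array([1.0, 0.0, 0.0]),
    "drug_b": np.array([0.0, 1.0, 0.0]),
    "drug_c": np.array([0.5, 0.5, 0.0]),
    "gene_1": np.array([1.0, 0.0, 1.0]),
    "gene_2": np.array([0.0, 1.0, 1.0]),
    "gene_3": np.array([0.5, 0.5, 1.0]),
}
genes = ["gene_1", "gene_2", "gene_3"]
known_pairs = [("drug_a", "gene_1"), ("drug_b", "gene_2")]


def relation_vector(pairs):
    """Assumed estimator: mean of (gene - drug) offsets over known pairs."""
    return np.mean([emb[gene] - emb[drug] for drug, gene in pairs], axis=0)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def predict_genes(drug, v, top_k=1):
    """Rank candidate genes by similarity to (drug embedding + relation vector)."""
    query = emb[drug] + v
    ranked = sorted(genes, key=lambda g: cosine(query, emb[g]), reverse=True)
    return ranked[:top_k]


v = relation_vector(known_pairs)
print(predict_genes("drug_c", v))  # -> ['gene_3']
```

Here `drug_c` is not among the known pairs, yet adding the relation vector learned from the other two drugs points to `gene_3`. With real embeddings the candidate set would be far larger and the offsets noisier, so the ranking would be less clear-cut than in this toy case.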

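The authors report that categorizing drugs and genes by biological pathway improves performance, and that relation vectors derived from relations known in the past can predict relations that appear later, using datasets divided by year. Despite relying only on vector addition, the approach performs comparably to large language models such as GPT-4 in predicting drug-gene relations. These results suggest that the method could be a useful tool for drug discovery and repurposing.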
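One common way to score ranked predictions like these is top-k accuracy. The summary does not state which metric the authors use, so the metric choice, the function `top_k_accuracy`, and the toy rankings below are all assumptions made for illustration.

```python
def top_k_accuracy(ranked_predictions, true_targets, k):
    """Fraction of drugs with at least one true target gene in the top k."""
    if not ranked_predictions:
        return 0.0
    hits = 0
    for drug, ranked in ranked_predictions.items():
        # A drug counts as a hit if any of its top-k genes is a known target.
        if set(ranked[:k]) & true_targets.get(drug, set()):
            hits += 1
    return hits / len(ranked_predictions)


# Hypothetical ranked gene lists per drug and held-out true targets.
ranked_predictions = {
    "drug_a": ["gene_1", "gene_2", "gene_3"],
    "drug_b": ["gene_3", "gene_2", "gene_1"],
}
true_targets = {"drug_a": {"gene_1"}, "drug_b": {"gene_2"}}

print(top_k_accuracy(ranked_predictions, true_targets, k=1))  # -> 0.5
print(top_k_accuracy(ranked_predictions, true_targets, k=2))  # -> 1.0
```

For a year-split evaluation, `true_targets` would hold only relations first reported after the cutoff year, while the relation vector would be built from relations known before it.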

Critical Analysis

The paper presents a simple and promising approach for predicting drug-gene relations using word embeddings and analogy tasks. Its evaluation covers both the global and pathway-wise settings, and the abstract reports performance comparable to large language models such as GPT-4 despite using only vector addition.

One potential limitation of the study is its reliance on the quality and coverage of the text used to train the word embeddings. If the corpus says little about a given drug or gene, the system may not be able to learn the relevant relationships. Embeddings also inherit biases in the training text, such as heavily studied drugs and genes being over-represented in the literature, which could lead to inaccurate predictions.

Further research could explore ways to improve the robustness and generalizability of the approach, such as by incorporating additional data sources or developing more sophisticated embedding techniques. It would also be interesting to investigate the interpretability of the learned relationships and how they align with known biological mechanisms.

Conclusion

In this paper, the authors present a method for predicting drug-gene relations using word embeddings and analogy tasks. The proposed approach leverages the semantic relationships captured by word embeddings to infer potential drug-gene interactions, with promising results on benchmark datasets.

The study demonstrates the potential of natural language processing techniques for drug discovery and repurposing, which could inform both the development of new therapies and the optimization of existing ones. Because the method needs only embeddings and vector arithmetic, it may also serve as a lightweight baseline for further research in computational biology.

Related Papers

Predicting Drug-Gene Relations via Analogy Tasks with Word Embeddings

Hiroaki Yamagiwa, Ryoma Hashimoto, Kiwamu Arakane, Ken Murakami, Shou Soeda, Momose Oyama, Mariko Okada, Hidetoshi Shimodaira

Natural language processing (NLP) is utilized in a wide range of fields, where words in text are typically transformed into feature vectors called embeddings. BioConceptVec is a specific example of embeddings tailored for biology, trained on approximately 30 million PubMed abstracts using models such as skip-gram. Generally, word embeddings are known to solve analogy tasks through simple vector arithmetic. For instance, $\mathrm{\textit{king}} - \mathrm{\textit{man}} + \mathrm{\textit{woman}}$ predicts $\mathrm{\textit{queen}}$. In this study, we demonstrate that BioConceptVec embeddings, along with our own embeddings trained on PubMed abstracts, contain information about drug-gene relations and can predict target genes from a given drug through analogy computations. We also show that categorizing drugs and genes using biological pathways improves performance. Furthermore, we illustrate that vectors derived from known relations in the past can predict unknown future relations in datasets divided by year. Despite the simplicity of implementing analogy tasks as vector additions, our approach demonstrated performance comparable to that of large language models such as GPT-4 in predicting drug-gene relations.

9/6/2024

Ontological Relations from Word Embeddings

Mathieu d'Aquin, Emmanuel Nauer

It has been reliably shown that the similarity of word embeddings obtained from popular neural models such as BERT approximates effectively a form of semantic similarity of the meaning of those words. It is therefore natural to wonder if those embeddings contain enough information to be able to connect those meanings through ontological relationships such as the one of subsumption. If so, large knowledge models could be built that are capable of semantically relating terms based on the information encapsulated in word embeddings produced by pre-trained models, with implications not only for ontologies (ontology matching, ontology evolution, etc.) but also on the ability to integrate ontological knowledge in neural models. In this paper, we test how embeddings produced by several pre-trained models can be used to predict relations existing between classes and properties of popular upper-level and general ontologies. We show that even a simple feed-forward architecture on top of those embeddings can achieve promising accuracies, with varying generalisation abilities depending on the input data. To achieve that, we produce a dataset that can be used to further enhance those models, opening new possibilities for applications integrating knowledge from web ontologies.

8/2/2024

Research on Adverse Drug Reaction Prediction Model Combining Knowledge Graph Embedding and Deep Learning

Yufeng Li, Wenchao Zhao, Bo Dang, Xu Yan, Weimin Wang, Min Gao, Mingxuan Xiao

In clinical treatment, identifying potential adverse reactions of drugs can help assist doctors in making medication decisions. In response to the problems in previous studies that features are high-dimensional and sparse, independent prediction models need to be constructed for each adverse reaction of drugs, and the prediction accuracy is low, this paper develops an adverse drug reaction prediction model based on knowledge graph embedding and deep learning, which can predict experimental results. Unified prediction of adverse drug reactions covered. Knowledge graph embedding technology can fuse the associated information between drugs and alleviate the shortcomings of high-dimensional sparsity in feature matrices, and the efficient training capabilities of deep learning can improve the prediction accuracy of the model. This article builds an adverse drug reaction knowledge graph based on drug feature data; by analyzing the embedding effect of the knowledge graph under different embedding strategies, the best embedding strategy is selected to obtain sample vectors; and then a convolutional neural network model is constructed to predict adverse reactions. The results show that under the DistMult embedding model and 400-dimensional embedding strategy, the convolutional neural network model has the best prediction effect; the average accuracy, F_1 score, recall rate and area under the curve of repeated experiments are better than the methods reported in the literature. The obtained prediction model has good prediction accuracy and stability, and can provide an effective reference for later safe medication guidance.

7/30/2024

Description-Based Text Similarity

Shauli Ravfogel, Valentina Pyatkin, Amir DN Cohen, Avshalom Manevich, Yoav Goldberg

Identifying texts with a given semantics is central for many information seeking scenarios. Similarity search over vector embeddings appears to be central to this ability, yet the similarity reflected in current text embeddings is corpus-driven, and is inconsistent and sub-optimal for many use cases. What, then, is a good notion of similarity for effective retrieval of text? We identify the need to search for texts based on abstract descriptions of their content, and the corresponding notion of description based similarity. We demonstrate the inadequacy of current text embeddings and propose an alternative model that significantly improves when used in standard nearest neighbor search. The model is trained using positive and negative pairs sourced through prompting a LLM, demonstrating how data from LLMs can be used for creating new capabilities not immediately possible using the original model.

7/25/2024