Distance-Preserving Generative Modeling of Spatial Transcriptomics

Read original: arXiv:2408.00911 - Published 8/6/2024 by Wenbin Zhou, Jin-Hong Du

Overview

This paper proposes a new generative modeling approach for spatial transcriptomics data.
The method aims to preserve the underlying spatial geometry and structure of the data.
It builds on variational-inference-based generative models, with variational autoencoders (VAEs) and scVI as backbones, to learn a low-dimensional representation of the spatial gene expression patterns.
The key innovation is the inclusion of a "spatial loss" term that encourages the latent space to reflect the original spatial relationships.

Plain English Explanation

The paper describes a way to model spatial gene expression data using a machine learning technique called a variational autoencoder (VAE). Spatial gene expression refers to measurements of which genes are active in different regions of a biological sample, like a tissue slice. This information is important for understanding how genes are regulated and coordinated across the spatial structure of an organism.

The main idea is to learn a compact, low-dimensional representation of the spatial gene expression patterns that preserves the underlying geometry and relationships between different regions of the sample. This is achieved by including a special "spatial loss" term in the training of the VAE model, which encourages the latent space (the low-dimensional representation) to reflect the original spatial organization of the data.

By preserving the spatial structure, the model can then be used to generate new synthetic spatial gene expression patterns that maintain the realistic spatial relationships between different regions. This could be useful for tasks like simulating biological processes, filling in missing data, or exploring hypothetical scenarios.

Technical Explanation

The paper introduces a new generative modeling approach for spatial transcriptomics data called Distance-Preserving Variational Autoencoder (DPVAE). The core of the model is a variational autoencoder (VAE) architecture, which learns a low-dimensional latent representation of the input spatial gene expression patterns.
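For reference, the standard VAE objective is the evidence lower bound (ELBO), written below in common textbook notation. The symbols are assumptions chosen for illustration and may not match the paper's: x is the gene expression vector of one spot, z its latent encoding, q_phi the encoder, p_theta the decoder, and p(z) the prior.

```latex
\mathcal{L}_{\mathrm{ELBO}}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - \mathrm{KL}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```

The first term rewards accurate reconstruction of the expression profile, and the second keeps the approximate posterior close to the prior.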

The key innovation is the addition of a "spatial loss" term to the standard VAE objective, which turns the training objective into a regularized evidence lower bound (ELBO). This spatial loss encourages the latent space to reflect the underlying spatial relationships between different regions of the biological sample. Specifically, it regularizes the learned representation so that the pairwise distances between the encodings of two spots resemble the pairwise distances between their physical locations in the tissue.
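A distance-preserving penalty of this kind can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's loss: the mean-normalized squared difference, the function names, and the weight `lam` are all assumptions made for the example.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix between the rows of x, shape (n, n)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_penalty(z, coords):
    """Mean squared gap between latent and spatial pairwise distances.

    Assumed form for illustration only; the paper's loss may differ.
    z:      (n, d) latent encodings, one row per spot
    coords: (n, 2) spatial coordinates of the same spots
    """
    d_latent = pairwise_distances(z)
    d_space = pairwise_distances(coords)
    # Divide each matrix by its mean so the comparison ignores overall scale.
    d_latent = d_latent / (d_latent.mean() + 1e-12)
    d_space = d_space / (d_space.mean() + 1e-12)
    return float(((d_latent - d_space) ** 2).mean())


def regularized_loss(neg_elbo, z, coords, lam=1.0):
    """Negative ELBO plus the weighted penalty (lam is a hypothetical weight)."""
    return neg_elbo + lam * distance_penalty(z, coords)
```

With this form, the penalty is close to zero whenever the latent codes are a rotated, shifted, or uniformly scaled copy of the spatial coordinates, and it grows as the two distance structures diverge. In a real training loop the same computation would be written in an autodiff framework so that gradients flow back to the encoder.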

The authors validate the approach on the mouse brain tissues Visium dataset and report improved performance when variational autoencoders and scVI are used as backbone models; the regularizer is designed to be compatible with any variational-inference-based generative model for gene expression. They also show that the learned latent representations can be used for downstream tasks like spatial pattern imputation and clustering.
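One simple way to check whether a latent space reflects the spatial layout is to correlate latent pairwise distances with spatial pairwise distances. The sketch below is an illustrative diagnostic and not the paper's evaluation protocol; the function names are hypothetical.

```python
import numpy as np


def upper_triangle_distances(x):
    """Euclidean distances for every unordered pair of rows in x."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(x), k=1)  # each pair once, no diagonal
    return dist[i, j]


def distance_agreement(z, coords):
    """Pearson correlation between latent and spatial pairwise distances."""
    d_latent = upper_triangle_distances(z)
    d_space = upper_triangle_distances(coords)
    return float(np.corrcoef(d_latent, d_space)[0, 1])
```

A score near 1 means that spots that are far apart in the tissue are also far apart in the latent space; a score near 0 means the latent geometry carries little information about the spatial arrangement.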

Critical Analysis

The DPVAE approach provides a promising way to model the complex spatial structure inherent in spatial transcriptomics data. By explicitly incorporating spatial relationships into the generative model, the method is able to produce more realistic and meaningful synthetic data compared to standard VAE models.

However, the paper does not explore some potential limitations and caveats of the approach. For example, the spatial loss function used is based on simple pairwise distances, which may not fully capture higher-order spatial structures and dependencies. Additionally, the performance of DPVAE likely depends on the quality and coverage of the original spatial transcriptomics data, which can be challenging to obtain.

Further research could investigate more sophisticated spatial loss functions, as well as ways to combine DPVAE with other spatial gene expression prediction methods to leverage multimodal data sources. Evaluating the model's ability to generalize to new biological systems or experimental conditions would also be an important area for future work.

Conclusion

This paper presents a novel generative modeling approach called DPVAE that is specifically designed to preserve the underlying spatial structure of spatial transcriptomics data. By incorporating a spatial loss term into the VAE training objective, the method is able to learn low-dimensional latent representations that maintain realistic spatial relationships.

The ability to generate synthetic spatial gene expression data that reflects the true spatial organization of biological systems could have important applications in areas like computational biology, tissue engineering, and drug discovery. While the current approach has some limitations, the general principles of DPVAE represent an exciting step forward in the field of spatial transcriptomics modeling and analysis.

Related Papers

Distance-Preserving Generative Modeling of Spatial Transcriptomics

Wenbin Zhou, Jin-Hong Du

Spatial transcriptomics data is invaluable for understanding the spatial organization of gene expression in tissues. There have been consistent efforts in studying how to effectively utilize the associated spatial information for refining gene expression modeling. We introduce a class of distance-preserving generative models for spatial transcriptomics, which utilizes the provided spatial information to regularize the learned representation space of gene expressions to have a similar pair-wise distance structure. This helps the latent space to capture meaningful encodings of genes in spatial proximity. We carry out theoretical analysis over a tractable loss function for this purpose and formalize the overall learning objective as a regularized evidence lower bound. Our framework grants compatibility with any variational-inference-based generative models for gene expression modeling. Empirically, we validate our proposed method on the mouse brain tissues Visium dataset and observe improved performance with variational autoencoders and scVI used as backbone models.

8/6/2024

stEnTrans: Transformer-based deep learning for spatial transcriptomics enhancement

Shuailin Xue, Fangfang Zhu, Changmiao Wang, Wenwen Min

The spatial location of cells within tissues and organs is crucial for the manifestation of their specific functions. Spatial transcriptomics technology enables comprehensive measurement of the gene expression patterns in tissues while retaining spatial information. However, current popular spatial transcriptomics techniques either have shallow sequencing depth or low resolution. We present stEnTrans, a deep learning method based on Transformer architecture that provides comprehensive predictions for gene expression in unmeasured areas or unexpectedly lost areas and enhances gene expression in original and imputed spots. Utilizing a self-supervised learning approach, stEnTrans establishes proxy tasks on gene expression profile without requiring additional data, mining intrinsic features of the tissues as supervisory information. We evaluate stEnTrans on six datasets and the results indicate superior performance in enhancing spots resolution and predicting gene expression in unmeasured areas compared to other deep learning and traditional interpolation methods. Additionally, our method can also help the discovery of spatial patterns in spatial transcriptomics and enrich to more biologically significant pathways. Our source code is available at https://github.com/shuailinxue/stEnTrans.

7/12/2024

Enhancing Gene Expression Prediction from Histology Images with Spatial Transcriptomics Completion

Gabriel Mejia, Daniela Ruiz, Paula Cárdenas, Leonardo Manrique, Daniela Vega, Pablo Arbeláez

Spatial Transcriptomics is a novel technology that aligns histology images with spatially resolved gene expression profiles. Although groundbreaking, it struggles with gene capture yielding high corruption in acquired data. Given potential applications, recent efforts have focused on predicting transcriptomic profiles solely from histology images. However, differences in databases, preprocessing techniques, and training hyperparameters hinder a fair comparison between methods. To address these challenges, we present a systematically curated and processed database collected from 26 public sources, representing an 8.6-fold increase compared to previous works. Additionally, we propose a state-of-the-art transformer based completion technique for inferring missing gene expression, which significantly boosts the performance of transcriptomic profile predictions across all datasets. Altogether, our contributions constitute the most comprehensive benchmark of gene expression prediction from histology images to date and a stepping stone for future research on spatial transcriptomics.

7/19/2024

Multimodal contrastive learning for spatial gene expression prediction using histology images

Wenwen Min, Zhiceng Shi, Jun Zhang, Jun Wan, Changmiao Wang

In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for delving into the complexities of gene expression patterns within intricate biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effective strategy involves employing artificial intelligence to predict gene expression levels using readily accessible whole-slide images (WSIs) stained with Hematoxylin and Eosin (H&E). However, existing methods have yet to fully capitalize on multimodal information provided by H&E images and ST data with spatial location. In this paper, we propose mclSTExp, a multimodal contrastive learning with Transformer and Densenet-121 encoder for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a word, integrating its intrinsic features with spatial context through the self-attention mechanism of a Transformer encoder. This integration is further enriched by incorporating image features via contrastive learning, thereby enhancing the predictive capability of our model. Our extensive evaluation of mclSTExp on two breast cancer datasets and a skin squamous cell carcinoma dataset demonstrates its superior performance in predicting spatial gene expression. Moreover, mclSTExp has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. Our source code is available at https://github.com/shizhiceng/mclSTExp.

7/12/2024