STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics

arXiv:2406.06393. Published 6/21/2024 by Jiawen Chen, Muqing Zhou, Wenrong Wu, Jinwei Zhang, Yun Li, Didong Li

Overview

  • This paper presents a new dataset called STimage-1K4M, which combines histopathology images and gene expression data for spatial transcriptomics research.
  • The dataset contains 1,149 pathology images derived from spatial transcriptomics data, broken down into 4,293,195 sub-tile images that are each paired with gene expression measurements, providing a rich resource for studying the relationship between tissue morphology and underlying molecular profiles.
  • The authors demonstrate the utility of this dataset through several case studies, showcasing its potential to advance spatial biology and precision medicine.

Plain English Explanation

The paper introduces a new dataset called STimage-1K4M, which combines high-resolution microscope images of tissue samples with information about the gene expression patterns in those tissues. This type of dataset is valuable for researchers studying spatial transcriptomics, which is the field of understanding how the activity of genes is organized within the physical structure of tissues.

The STimage-1K4M dataset includes 1,149 tissue images, each cut into many small tiles. Every tile comes with measurements of how active thousands of genes are at that spot, giving more than 4 million image-gene expression pairs. By having both the detailed images and the gene expression data, researchers can explore how the visual appearance of the tissue is related to the underlying molecular biology. This could lead to new insights into how different diseases manifest in the structure and function of tissues, which could in turn inform the development of better diagnostic tools and treatments.

Technical Explanation

The STimage-1K4M dataset combines high-resolution histopathology images with spatially resolved gene expression data: 1,149 images yield 4,293,195 pairs of sub-tile images and gene expression profiles, which is where the "1K4M" in the name comes from. The authors curated this dataset to enable research in spatial transcriptomics, which seeks to understand how the spatial organization of cells and tissues relates to their molecular profiles.

Each image is broken down into smaller sub-tile images, one per spatial spot, and each tile is paired with the 15,000-30,000 dimensional gene expression vector measured at that spot by spot-based spatial transcriptomics platforms such as 10x Genomics Visium. Because the expression labels are attached to individual tiles rather than to the whole slide, researchers can study how local tissue morphology relates to local molecular profiles, for example telling apart cancerous and healthy regions within the same slide.
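The tile-pairing step can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the slide is already loaded as a NumPy array, that spot centres are given as (x, y) pixel coordinates in that image, and that the expression matrix has one row per spot. The function name `make_tile_expression_pairs` and the 224-pixel tile size are hypothetical choices.

```python
import numpy as np


def make_tile_expression_pairs(image, spot_xy, expression, tile_size=224):
    """Hypothetical helper: crop a square tile centred on each spot and pair it
    with that spot's gene expression row.

    Assumes image is (H, W, C), spot_xy is (n_spots, 2) holding (x, y) pixel
    coordinates, and expression is (n_spots, n_genes).
    """
    if spot_xy.shape[0] != expression.shape[0]:
        raise ValueError("need exactly one expression row per spot")
    half = tile_size // 2
    height, width = image.shape[:2]
    tiles, rows, kept = [], [], []
    for i, (x, y) in enumerate(np.rint(spot_xy).astype(int)):
        top, left = y - half, x - half
        # Skip spots whose tile would extend past the slide border
        if top < 0 or left < 0 or top + tile_size > height or left + tile_size > width:
            continue
        tiles.append(image[top:top + tile_size, left:left + tile_size])
        rows.append(expression[i])
        kept.append(i)
    if not kept:
        # Return correctly shaped empty arrays when no spot fits
        empty_tiles = np.empty((0, tile_size, tile_size) + image.shape[2:], dtype=image.dtype)
        empty_rows = np.empty((0, expression.shape[1]), dtype=expression.dtype)
        return empty_tiles, empty_rows, np.array([], dtype=int)
    # Tile k and expression row k describe the same spot; kept maps back to spot indices
    return np.stack(tiles), np.stack(rows), np.array(kept)
```

Spots whose tile would cross the slide border are dropped here; padding the image instead would keep every spot, at the cost of tiles that are partly blank.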

The authors demonstrate the utility of the STimage-1K4M dataset through several case studies, including analyzing spatial gene expression patterns in tumor microenvironments and predicting gene expression from histology images using cross-modal deep learning models.
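One common way to train cross-modal models on tile-expression pairs is CLIP-style contrastive alignment. The sketch below is an illustration of that general technique, not necessarily the authors' method: it assumes an image encoder and a gene-expression encoder (not shown) have already mapped each matched tile and expression vector to embeddings of the same width. The function name `symmetric_info_nce` and the default temperature of 0.07 are illustrative choices.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Scale each row to unit length so dot products become cosine similarities
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable log-softmax along the given axis
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(img_emb: np.ndarray, gene_emb: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Symmetric contrastive loss; row i of each input is assumed to be a matched
    (tile, expression) pair produced by hypothetical upstream encoders."""
    img = l2_normalize(img_emb)
    gene = l2_normalize(gene_emb)
    logits = img @ gene.T / temperature  # (n, n); diagonal holds matched pairs
    idx = np.arange(logits.shape[0])
    # Image -> expression: each tile should pick out its own spot's profile
    loss_img_to_gene = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Expression -> image: each profile should pick out its own tile
    loss_gene_to_img = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((loss_img_to_gene + loss_gene_to_img) / 2)
```

Matched pairs sit on the diagonal of the similarity matrix, so the loss is low when each tile is most similar to its own spot's expression embedding and rises when the pairing is scrambled.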

Critical Analysis

The STimage-1K4M dataset represents a valuable resource for spatial transcriptomics research, but the authors acknowledge some limitations. The tissue samples are primarily from cancer patients, which may limit the generalizability to non-cancer tissues. Additionally, spot-based spatial transcriptomics has inherent resolution and sensitivity constraints: a single spot can cover several cells, so each tile's expression profile is an average over a small neighborhood rather than a single-cell measurement.

Further research is needed to fully understand the relationship between tissue morphology and molecular profiles, and to develop more robust computational methods for integrating these multi-modal data sources. Nonetheless, this dataset provides an important foundation for advancing the field of spatial biology and its applications in precision medicine.

Conclusion

The STimage-1K4M dataset combines high-resolution histopathology images and spatial gene expression data, creating a powerful resource for researchers studying the connections between tissue structure and molecular function. By making this dataset publicly available, the authors have opened up new avenues for advancements in spatial transcriptomics, with potential implications for early disease detection, personalized treatment, and a deeper understanding of the complex interplay between cells and their microenvironment.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics

Jiawen Chen, Muqing Zhou, Wenrong Wu, Jinwei Zhang, Yun Li, Didong Li

Recent advances in multi-modal algorithms have driven and been driven by the increasing availability of large image-text datasets, leading to significant strides in various fields, including computational pathology. However, in most existing medical image-text datasets, the text typically provides high-level summaries that may not sufficiently describe sub-tile regions within a large pathology image. For example, an image might cover an extensive tissue area containing cancerous and healthy regions, but the accompanying text might only specify that this image is a cancer slide, lacking the nuanced details needed for in-depth analysis. In this study, we introduce STimage-1K4M, a novel dataset designed to bridge this gap by providing genomic features for sub-tile images. STimage-1K4M contains 1,149 images derived from spatial transcriptomics data, which captures gene expression information at the level of individual spatial spots within a pathology image. Specifically, each image in the dataset is broken down into smaller sub-image tiles, with each tile paired with 15,000-30,000 dimensional gene expressions. With 4,293,195 pairs of sub-tile images and gene expressions, STimage-1K4M offers unprecedented granularity, paving the way for a wide range of advanced research in multi-modal data analysis and innovative applications in computational pathology, and beyond.

6/21/2024

HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis

Guillaume Jaume, Paul Doucet, Andrew H. Song, Ming Y. Lu, Cristina Almagro-Pérez, Sophia J. Wagner, Anurag J. Vaidya, Richard J. Chen, Drew F. K. Williamson, Ahrong Kim, Faisal Mahmood

Spatial transcriptomics (ST) enables interrogating the molecular composition of tissue with ever-increasing resolution, depth, and sensitivity. However, costs, rapidly evolving technology, and lack of standards have constrained computational methods in ST to narrow tasks and small cohorts. In addition, the underlying tissue morphology as reflected by H&E-stained whole slide images (WSIs) encodes rich information often overlooked in ST studies. Here, we introduce HEST-1k, a collection of 1,108 spatial transcriptomic profiles, each linked to a WSI and metadata. HEST-1k was assembled using HEST-Library from 131 public and internal cohorts encompassing 25 organs, two species (Homo sapiens and Mus musculus), and 320 cancer samples from 25 cancer types. HEST-1k processing enabled the identification of 1.5 million expression-morphology pairs and 60 million nuclei. HEST-1k is tested on three use cases: (1) benchmarking foundation models for histopathology (HEST-Benchmark), (2) biomarker identification, and (3) multimodal representation learning. HEST-1k, HEST-Library, and HEST-Benchmark can be freely accessed via https://github.com/mahmoodlab/hest.

6/26/2024

Multimodal contrastive learning for spatial gene expression prediction using histology images

Wenwen Min, Zhiceng Shi, Jun Zhang, Jun Wan, Changmiao Wang

In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for delving into the complexities of gene expression patterns within intricate biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effective strategy involves employing artificial intelligence to predict gene expression levels using readily accessible whole-slide images (WSIs) stained with Hematoxylin and Eosin (H&E). However, existing methods have yet to fully capitalize on multimodal information provided by H&E images and ST data with spatial location. In this paper, we propose mclSTExp, a multimodal contrastive learning with Transformer and DenseNet-121 encoder for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a word, integrating its intrinsic features with spatial context through the self-attention mechanism of a Transformer encoder. This integration is further enriched by incorporating image features via contrastive learning, thereby enhancing the predictive capability of our model. Our extensive evaluation of mclSTExp on two breast cancer datasets and a skin squamous cell carcinoma dataset demonstrates its superior performance in predicting spatial gene expression. Moreover, mclSTExp has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. Our source code is available at https://github.com/shizhiceng/mclSTExp.

7/12/2024

High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

Zhiceng Shi, Shuailin Xue, Fangfang Zhu, Wenwen Min

Spatial transcriptomics (ST) is a groundbreaking genomic technology that enables spatial localization analysis of gene expression within tissue sections. However, it is significantly limited by high costs and sparse spatial resolution. An alternative, more cost-effective strategy is to use deep learning methods to predict high-density gene expression profiles from histological images. However, existing methods struggle to capture rich image features effectively or rely on low-dimensional positional coordinates, making it difficult to accurately predict high-resolution gene expression profiles. To address these limitations, we developed HisToSGE, a method that employs a Pathology Image Large Model (PILM) to extract rich image features from histological images and utilizes a feature learning module to robustly generate high-resolution gene expression profiles. We evaluated HisToSGE on four ST datasets, comparing its performance with five state-of-the-art baseline methods. The results demonstrate that HisToSGE excels in generating high-resolution gene expression profiles and performing downstream tasks such as spatial domain identification. All code and public datasets used in this paper are available at https://github.com/wenwenmin/HisToSGE and https://zenodo.org/records/12792163.

7/31/2024