HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis

Read original: arXiv:2406.16192 - Published 6/26/2024 by Guillaume Jaume, Paul Doucet, Andrew H. Song, Ming Y. Lu, Cristina Almagro-Pérez, Sophia J. Wagner, Anurag J. Vaidya, Richard J. Chen, Drew F. K. Williamson, Ahrong Kim, Faisal Mahmood

Overview

  • This paper introduces HEST-1k, a dataset for spatial transcriptomics and histology image analysis.
  • The dataset contains 1,108 spatial transcriptomic profiles, each linked to an H&E-stained whole slide image and metadata, covering 25 organs and two species (human and mouse).
  • The authors aim to enable research on integrating spatial gene expression and histology data for applications in biology and medicine.

Plain English Explanation

The HEST-1k dataset provides researchers with a valuable tool for studying the connections between gene expression and tissue structure. Spatial transcriptomics is a technique that allows scientists to measure the activity of genes in specific locations within a tissue sample. By pairing this gene expression data with high-resolution histology images, HEST-1k enables researchers to explore how the physical organization of cells and tissues relates to their underlying molecular profiles.

This type of integrated analysis could lead to important insights in fields like computational pathology and regenerative medicine. For example, researchers might be able to identify unique gene expression signatures associated with different disease states or tissue repair processes. The dataset could also support the development of AI models that can automatically analyze and interpret complex histology images in the context of underlying molecular data.

Overall, the HEST-1k dataset represents an important resource for advancing our understanding of the relationships between tissue structure and function at the genomic level.

Technical Explanation

The HEST-1k dataset contains 1,108 spatial transcriptomic profiles, each linked to an H&E-stained whole slide image (WSI) and metadata. The samples were assembled from 131 public and internal cohorts covering 25 organs, including the brain, kidney, liver, and lung, and two species (Homo sapiens and Mus musculus); 320 of them are cancer samples from 25 cancer types. Spatial transcriptomics platforms such as 10x Genomics Visium measure gene expression at known spatial coordinates within a tissue section, so each expression profile can be matched to the image region it came from. Processing the collection in this way yielded 1.5 million expression-morphology pairs and identified 60 million nuclei.
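
The pairing of spot coordinates with image regions can be illustrated with a minimal sketch. It is not the HEST-Library API: the function name extract_spot_patches, the array layout, and the patch size are assumptions made for illustration, and a small NumPy array stands in for a whole slide image.

```python
import numpy as np


def extract_spot_patches(image, spot_xy, patch_size):
    """Crop a square patch centred on each spot (hypothetical helper).

    image:      (H, W, C) array standing in for a slide region.
    spot_xy:    (N, 2) array of spot centres as (x, y) pixel coordinates.
    patch_size: even side length of each patch, in pixels.
    Returns the stacked patches and the indices of the spots that were kept.
    """
    if patch_size % 2 != 0:
        raise ValueError("patch_size must be even")
    half = patch_size // 2
    height, width = image.shape[:2]
    centres = np.rint(np.asarray(spot_xy)).astype(int)

    patches, kept = [], []
    for idx, (x, y) in enumerate(centres):
        # Skip spots whose patch would extend past the image border.
        if x < half or y < half or x + half > width or y + half > height:
            continue
        patches.append(image[y - half:y + half, x - half:x + half])
        kept.append(idx)

    if not patches:
        empty = np.empty((0, patch_size, patch_size) + image.shape[2:], image.dtype)
        return empty, np.array([], dtype=int)
    return np.stack(patches), np.array(kept, dtype=int)


# Toy data: a 20x30 RGB "slide", three spots, and five genes per spot.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 30, 3), dtype=np.uint8)
spot_xy = np.array([[10, 10], [1, 1], [25, 15]])
expression = rng.poisson(2.0, size=(3, 5))

patches, kept = extract_spot_patches(image, spot_xy, patch_size=8)
paired_expression = expression[kept]  # expression rows aligned with patches
print(patches.shape, paired_expression.shape)
```

Returning the kept indices alongside the patches keeps the expression matrix aligned with the image patches when border spots are dropped.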

The dataset is structured to support several research applications. The authors test it on three use cases: benchmarking foundation models for histopathology (HEST-Benchmark), biomarker identification, and multimodal representation learning. HEST-1k, the HEST-Library used to assemble it, and HEST-Benchmark are freely available at https://github.com/mahmoodlab/hest.
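
One common way to compare image encoders on paired data of this kind is to regress gene expression on patch embeddings and report the per-gene Pearson correlation on held-out spots. The sketch below illustrates that idea with closed-form ridge regression on synthetic data. Whether HEST-Benchmark uses this exact protocol is not stated in this summary; the function names, the regulariser strength, and the random "embeddings" are all assumptions.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ Yc)
    bias = y_mean - x_mean @ weights
    return weights, bias


def pearson_per_gene(y_true, y_pred):
    """Pearson correlation for each column (gene); 0 where undefined."""
    t = y_true - y_true.mean(axis=0)
    p = y_pred - y_pred.mean(axis=0)
    denom = np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    safe = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, (t * p).sum(axis=0) / safe, 0.0)


# Synthetic stand-ins: 200 patches, 16-dim embeddings, 5 genes.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))
true_weights = rng.normal(size=(16, 5))
expression = embeddings @ true_weights + 0.1 * rng.normal(size=(200, 5))

# Hold out the last 50 patches for evaluation.
train, test = slice(0, 150), slice(150, 200)
weights, bias = fit_ridge(embeddings[train], expression[train], alpha=1.0)
predicted = embeddings[test] @ weights + bias

scores = pearson_per_gene(expression[test], predicted)
print("mean Pearson across genes:", scores.mean())
```

With real data, the embeddings would come from a frozen image encoder, and the train/test split would normally be made by patient or slide rather than by patch to avoid leakage.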

Critical Analysis

The HEST-1k dataset is a valuable resource for the research community, but it has limitations. Although it spans 25 organs, it is pooled from 131 separate cohorts, so tissues, species, and disease states may not be evenly represented, and it may not capture the full diversity of spatial gene expression and histology patterns across the body. Additionally, the spatial resolution of spot-based gene expression data is relatively coarse: a 10x Genomics Visium spot is about 55 µm in diameter, so each measurement typically aggregates several cells rather than a single cell.

Future work could explore ways to further improve the spatial and molecular resolution of the dataset, potentially through the integration of emerging single-cell sequencing technologies or higher-magnification histology imaging. Additionally, the dataset could be expanded to include more diverse tissue types, disease states, and experimental conditions to better capture the full complexity of the human body.

Conclusion

The HEST-1k dataset provides a robust platform for studying the relationships between spatial gene expression and tissue morphology, with the potential to drive progress in a wide range of biological and medical applications. By integrating these rich datasets, researchers can gain new insights into the molecular mechanisms underlying tissue structure and function, paving the way for more targeted and effective therapies for a variety of diseases.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis

Guillaume Jaume, Paul Doucet, Andrew H. Song, Ming Y. Lu, Cristina Almagro-Pérez, Sophia J. Wagner, Anurag J. Vaidya, Richard J. Chen, Drew F. K. Williamson, Ahrong Kim, Faisal Mahmood

Spatial transcriptomics (ST) enables interrogating the molecular composition of tissue with ever-increasing resolution, depth, and sensitivity. However, costs, rapidly evolving technology, and lack of standards have constrained computational methods in ST to narrow tasks and small cohorts. In addition, the underlying tissue morphology as reflected by H&E-stained whole slide images (WSIs) encodes rich information often overlooked in ST studies. Here, we introduce HEST-1k, a collection of 1,108 spatial transcriptomic profiles, each linked to a WSI and metadata. HEST-1k was assembled using HEST-Library from 131 public and internal cohorts encompassing 25 organs, two species (Homo sapiens and Mus musculus), and 320 cancer samples from 25 cancer types. HEST-1k processing enabled the identification of 1.5 million expression-morphology pairs and 60 million nuclei. HEST-1k is tested on three use cases: (1) benchmarking foundation models for histopathology (HEST-Benchmark), (2) biomarker identification, and (3) multimodal representation learning. HEST-1k, HEST-Library, and HEST-Benchmark can be freely accessed via https://github.com/mahmoodlab/hest.

6/26/2024

STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics

Jiawen Chen, Muqing Zhou, Wenrong Wu, Jinwei Zhang, Yun Li, Didong Li

Recent advances in multi-modal algorithms have driven and been driven by the increasing availability of large image-text datasets, leading to significant strides in various fields, including computational pathology. However, in most existing medical image-text datasets, the text typically provides high-level summaries that may not sufficiently describe sub-tile regions within a large pathology image. For example, an image might cover an extensive tissue area containing cancerous and healthy regions, but the accompanying text might only specify that this image is a cancer slide, lacking the nuanced details needed for in-depth analysis. In this study, we introduce STimage-1K4M, a novel dataset designed to bridge this gap by providing genomic features for sub-tile images. STimage-1K4M contains 1,149 images derived from spatial transcriptomics data, which captures gene expression information at the level of individual spatial spots within a pathology image. Specifically, each image in the dataset is broken down into smaller sub-image tiles, with each tile paired with 15,000-30,000 dimensional gene expressions. With 4,293,195 pairs of sub-tile images and gene expressions, STimage-1K4M offers unprecedented granularity, paving the way for a wide range of advanced research in multi-modal data analysis and innovative applications in computational pathology and beyond.

6/21/2024

High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

Zhiceng Shi, Shuailin Xue, Fangfang Zhu, Wenwen Min

Spatial transcriptomics (ST) is a groundbreaking genomic technology that enables spatial localization analysis of gene expression within tissue sections. However, it is significantly limited by high costs and sparse spatial resolution. An alternative, more cost-effective strategy is to use deep learning methods to predict high-density gene expression profiles from histological images. However, existing methods struggle to capture rich image features effectively or rely on low-dimensional positional coordinates, making it difficult to accurately predict high-resolution gene expression profiles. To address these limitations, we developed HisToSGE, a method that employs a Pathology Image Large Model (PILM) to extract rich image features from histological images and utilizes a feature learning module to robustly generate high-resolution gene expression profiles. We evaluated HisToSGE on four ST datasets, comparing its performance with five state-of-the-art baseline methods. The results demonstrate that HisToSGE excels in generating high-resolution gene expression profiles and performing downstream tasks such as spatial domain identification. All code and public datasets used in this paper are available at https://github.com/wenwenmin/HisToSGE and https://zenodo.org/records/12792163.

7/31/2024

HistoSPACE: Histology-Inspired Spatial Transcriptome Prediction And Characterization Engine

Shivam Kumar, Samrat Chatterjee

Spatial transcriptomics (ST) enables the visualization of gene expression within the context of tissue morphology. This emerging discipline has the potential to serve as a foundation for developing tools to design precision medicines. However, due to the higher costs and expertise required for such experiments, its translation into regular clinical practice might be challenging. Despite the implementation of modern deep learning to enhance information obtained from histological images using AI, efforts have been constrained by limitations in the diversity of information. In this paper, we developed a model, HistoSPACE, that explores the diversity of histological images available with ST data to extract molecular insights from tissue images. Our proposed study built an image encoder derived from a universal image autoencoder. This image encoder was connected to convolution blocks to build the final model. It was further fine-tuned with the help of ST data. This model is notably lightweight compared to traditional histological models. Our developed model demonstrates significant efficiency compared to contemporary algorithms, revealing a correlation of 0.56 in leave-one-out cross-validation. Finally, its robustness was validated through an independent dataset, showing a well-matched prediction with predefined disease pathology.

8/9/2024