MCAD: Multi-modal Conditioned Adversarial Diffusion Model for High-Quality PET Image Reconstruction

2406.13150

Published 6/21/2024 by Jiaqi Cui, Xinyi Zeng, Pinxian Zeng, Bo Liu, Xi Wu, Jiliu Zhou, Yan Wang

Abstract

Radiation hazards associated with standard-dose positron emission tomography (SPET) images remain a concern, whereas the quality of low-dose PET (LPET) images fails to meet clinical requirements. Therefore, there is great interest in reconstructing SPET images from LPET images. However, prior studies focus solely on image data, neglecting vital complementary information from other modalities, e.g., patients' clinical tabular data, resulting in compromised reconstruction with limited diagnostic utility. Moreover, they often overlook the semantic consistency between real SPET and reconstructed images, leading to distorted semantic contexts. To tackle these problems, we propose a novel Multi-modal Conditioned Adversarial Diffusion model (MCAD) to reconstruct SPET images from multi-modal inputs, including LPET images and clinical tabular data. Specifically, our MCAD incorporates a Multi-modal conditional Encoder (Mc-Encoder) to extract multi-modal features, followed by a conditional diffusion process that blends noise with the multi-modal features and gradually maps the blended features to the target SPET images. To balance the multi-modal inputs, the Mc-Encoder embeds Optimal Multi-modal Transport co-Attention (OMTA) to narrow the heterogeneity gap between image and tabular data while capturing their interactions, providing sufficient guidance for reconstruction. In addition, to mitigate semantic distortions, we introduce Multi-Modal Masked Text Reconstruction (M3TRec), which leverages semantic knowledge extracted from denoised PET images to restore the masked clinical tabular data, thereby compelling the network to maintain accurate semantics during reconstruction. To expedite the diffusion process, we further introduce an adversarial diffusive network with a reduced number of diffusion steps. Experiments show that our method achieves state-of-the-art performance both qualitatively and quantitatively.
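The abstract does not give MCAD's noise schedule or step count. As a minimal sketch, assuming the standard DDPM-style forward rule x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, the code below contrasts a typical 1000-step schedule with a hypothetical 4-step one. Both end close to pure noise, but the short schedule adds far more noise per step; one common motivation for adversarial diffusion is that such large reverse steps are poorly modeled by a Gaussian denoiser, so a conditional generator and discriminator learn them instead. How MCAD injects the Mc-Encoder features into the reverse process is not shown here.

```python
import numpy as np


def make_schedule(num_steps: int, beta_min: float, beta_max: float) -> np.ndarray:
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray,
             rng: np.random.Generator) -> np.ndarray:
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


rng = np.random.default_rng(0)
spet = rng.random((64, 64))  # stand-in for a normalized SPET slice (assumption)

# Illustrative schedules only; MCAD's actual values are not given in the summary.
long_schedule = make_schedule(1000, 1e-4, 0.02)  # DDPM-style: many small steps
short_schedule = make_schedule(4, 0.1, 0.9)      # hypothetical: few large steps

for name, alpha_bar in [("1000 steps", long_schedule), ("4 steps", short_schedule)]:
    x_last = q_sample(spet, len(alpha_bar) - 1, alpha_bar, rng)
    print(f"{name}: final alpha_bar = {alpha_bar[-1]:.2e}, x_T std = {x_last.std():.2f}")
```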

Overview

Positron emission tomography (PET) is a medical imaging technique that can provide valuable insights for clinical diagnosis and treatment. However, there are concerns about the radiation hazards associated with standard-dose PET (SPET) imaging, and the image quality of low-dose PET (LPET) often fails to meet clinical requirements.
To address these issues, researchers have explored ways to reconstruct SPET images from LPET images. However, prior studies have focused solely on image data, neglecting vital complementary information from other modalities, such as patients' clinical tabular data.
This can lead to compromised reconstructions with limited diagnostic utility. Prior methods also often overlook the semantic consistency between real SPET images and reconstructed ones, which distorts the semantic context of the results.

Plain English Explanation

The paper discusses a novel approach to reconstructing high-quality PET images from low-quality PET images and other patient data. PET is a medical imaging technique that can provide important information for diagnosing and treating health conditions. However, the high-quality PET scans that are most useful for doctors expose patients to more radiation. Lower-dose PET scans reduce that exposure, but their image quality is lower and they may be less helpful for doctors.

Researchers have tried to take low-quality PET scans and reconstruct them into higher-quality images that can be used for diagnosis and treatment. But previous efforts have focused only on the PET image data itself, ignoring other useful information about the patient, such as the clinical details recorded in table form. This can lead to reconstructed images that don't fully capture the patient's condition and may be less useful to the doctor. Previous methods also often overlooked whether the reconstructed images kept the same meaningful content as real high-quality scans.

The paper proposes a new approach that uses both the low-quality PET images and the patient's clinical data to reconstruct high-quality PET images. The aim is for the reconstructed images to reflect the patient's condition more accurately, so that doctors can rely on them while patients receive a lower radiation dose.

Technical Explanation

To address the shortcomings of prior approaches, the researchers propose a novel Multi-modal Conditioned Adversarial Diffusion model (MCAD) to reconstruct SPET images from multi-modal inputs, including LPET images and clinical tabular data.

The key components of their MCAD model are:

Multi-modal conditional Encoder (Mc-Encoder): This module extracts features from both the LPET images and the clinical tabular data, using an Optimal Multi-modal Transport co-Attention (OMTA) mechanism to bridge the gap between the heterogeneous data sources and capture their interactions.
Conditional Diffusion Process: Noise is blended with the Mc-Encoder's multi-modal features, and a conditional diffusion process gradually maps the blended features to the target SPET images.
Multi-Modal Masked Text Reconstruction (M3TRec): This component leverages semantic knowledge extracted from the denoised PET images to restore the masked clinical tabular data, which compels the network to maintain accurate semantics during reconstruction.
Adversarial Diffusive Network: An adversarial diffusive network lets the model use fewer diffusion steps, which speeds up the diffusion process.
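OMTA is described only at a high level here. The sketch below illustrates the general idea of using optimal transport as co-attention between image and tabular tokens. It swaps in standard entropic OT solved with Sinkhorn iterations, uniform marginals, and a cosine cost; the token counts, embedding size, and the barycentric mapping of each token onto the other modality are assumptions, not the authors' formulation.

```python
import numpy as np


def sinkhorn(cost: np.ndarray, eps: float = 0.2, n_iters: int = 1000) -> np.ndarray:
    """Entropic OT plan between uniform marginals via Sinkhorn scaling."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    kernel = np.exp(-cost / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)      # rescale rows toward marginal a
        v = b / (kernel.T @ u)    # rescale columns to marginal b
    return u[:, None] * kernel * v[None, :]


def ot_co_attention(img_tokens: np.ndarray, tab_tokens: np.ndarray, eps: float = 0.2):
    """Use the transport plan as co-attention weights in both directions."""
    img_n = img_tokens / np.linalg.norm(img_tokens, axis=1, keepdims=True)
    tab_n = tab_tokens / np.linalg.norm(tab_tokens, axis=1, keepdims=True)
    cost = 1.0 - img_n @ tab_n.T          # cosine distance, in [0, 2]
    plan = sinkhorn(cost, eps)
    n, m = plan.shape
    img_ctx = n * plan @ tab_tokens       # rows of n * plan sum to ~1
    tab_ctx = m * plan.T @ img_tokens     # rows of m * plan.T sum to 1
    return img_ctx, tab_ctx, plan


rng = np.random.default_rng(0)
img_tokens = rng.standard_normal((16, 32))  # hypothetical: 16 LPET patch embeddings
tab_tokens = rng.standard_normal((6, 32))   # hypothetical: 6 embedded clinical fields
img_ctx, tab_ctx, plan = ot_co_attention(img_tokens, tab_tokens)
print(img_ctx.shape, tab_ctx.shape)
```

Compared with softmax attention, the transport plan is constrained on both marginals, so no single tabular field can absorb all of the image tokens' attention; whether OMTA relies on this property specifically is not stated in the summary.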

The researchers' experiments show that their MCAD model achieves state-of-the-art performance in reconstructing SPET images from multi-modal inputs, both qualitatively and quantitatively.

Critical Analysis

The researchers have addressed an important challenge in medical imaging by developing a novel approach to reconstruct high-quality PET images from low-quality inputs and complementary patient data. Their use of multi-modal data and techniques like OMTA and M3TRec to bridge the gap between heterogeneous data sources and maintain semantic consistency is a promising direction.

However, the paper does not discuss the potential limitations or caveats of their approach. For example, the performance and generalization of the model may depend on the specific dataset and clinical settings used in the experiments. Additionally, the computational complexity and training time of the MCAD model could be a concern, especially for real-time clinical applications.

Further research could explore the robustness of the MCAD model to noise, missing data, or variations in the input data distribution. Dose-aware diffusion models could also be integrated so that the reconstruction accounts for the dose level of the low-dose input.

Conclusion

The proposed Multi-modal Conditioned Adversarial Diffusion (MCAD) model is a notable advance in medical image reconstruction. By leveraging both PET image data and complementary clinical information, MCAD reconstructs high-quality PET images from low-dose inputs while maintaining semantic consistency, which could help reduce patients' exposure to radiation.

This research has the potential to improve the diagnostic utility of PET imaging, leading to more accurate diagnoses and better-informed treatment decisions for patients. The innovative techniques used in the MCAD model, such as multi-modal feature extraction and semantic consistency preservation, could also have broader applications in other medical imaging and multi-modal data processing tasks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

Tianqi Chen, Jun Hou, Yinchi Zhou, Huidong Xie, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, James S. Duncan, Chi Liu, Bo Zhou

Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET with high noise and bias. Thus, it is desirable to develop 3D methods to translate the non-attenuation-corrected low-dose PET (NAC-LDPET) into attenuation-corrected standard-dose PET (AC-SDPET). Recently, diffusion models have emerged as a new state-of-the-art deep learning method for image-to-image translation, better than traditional CNN-based methods. However, due to the high computation cost and memory burden, it is largely limited to 2D applications. To address these challenges, we developed a novel 2.5D Multi-view Averaging Diffusion Model (MADM) for 3D image-to-image translation with application on NAC-LDPET to AC-SDPET translation. Specifically, MADM employs separate diffusion models for axial, coronal, and sagittal views, whose outputs are averaged in each sampling step to ensure the 3D generation quality from multiple views. To accelerate the 3D sampling process, we also proposed a strategy to use the CNN-based 3D generation as a prior for the diffusion model. Our experimental results on human patient studies suggested that MADM can generate high-quality 3D translation images, outperforming previous CNN-based and Diffusion-based baseline methods.

6/18/2024

cs.CV cs.AI eess.IV
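The MADM abstract above says separate diffusion models handle the axial, coronal, and sagittal views and their outputs are averaged at each sampling step. A minimal sketch of just that averaging step follows, assuming each view's model is a plain 2D function applied slice by slice; in a real sampler each model would also take the timestep, and which array axis is axial, coronal, or sagittal depends on the volume's orientation. The toy scaling "models" are hypothetical stand-ins for trained networks.

```python
from typing import Callable, Sequence

import numpy as np

Denoiser2D = Callable[[np.ndarray], np.ndarray]  # stand-in for one view's 2D model


def apply_slicewise(volume: np.ndarray, denoise: Denoiser2D, axis: int) -> np.ndarray:
    """Run a 2D model on every slice taken along `axis`, then restore the layout."""
    slices = np.moveaxis(volume, axis, 0)
    out = np.stack([denoise(s) for s in slices], axis=0)
    return np.moveaxis(out, 0, axis)


def multi_view_average(volume: np.ndarray, denoisers: Sequence[Denoiser2D]) -> np.ndarray:
    """Average the per-view predictions, one 2D model per volume axis."""
    preds = [apply_slicewise(volume, d, axis) for axis, d in enumerate(denoisers)]
    return np.mean(preds, axis=0)


vol = np.random.default_rng(0).random((8, 10, 12))
# Toy "models": simple scalings instead of trained diffusion networks (hypothetical).
views = [lambda s: 1.0 * s, lambda s: 2.0 * s, lambda s: 3.0 * s]
fused = multi_view_average(vol, views)
print(fused.shape, np.allclose(fused, 2.0 * vol))  # → (8, 10, 12) True
```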

Two-Phase Multi-Dose-Level PET Image Reconstruction with Dose Level Awareness

Yuchen Fei, Yanmei Luo, Yan Wang, Jiaqi Cui, Yuanyuan Xu, Jiliu Zhou, Dinggang Shen

To obtain high-quality positron emission tomography (PET) while minimizing radiation exposure, a range of methods have been designed to reconstruct standard-dose PET (SPET) from corresponding low-dose PET (LPET) images. However, most current methods merely learn the mapping between single-dose-level LPET and SPET images, but omit the dose disparity of LPET images in clinical scenarios. In this paper, to reconstruct high-quality SPET images from multi-dose-level LPET images, we design a novel two-phase multi-dose-level PET reconstruction algorithm with dose level awareness, containing a pre-training phase and a SPET prediction phase. Specifically, the pre-training phase is devised to explore both fine-grained discriminative features and effective semantic representation. The SPET prediction phase adopts a coarse prediction network utilizing pre-learned dose level prior to generate preliminary result, and a refinement network to precisely preserve the details. Experiments on MICCAI 2022 Ultra-low Dose PET Imaging Challenge Dataset have demonstrated the superiority of our method.

4/11/2024

eess.IV cs.CV

Similarity-aware Syncretic Latent Diffusion Model for Medical Image Translation with Representation Learning

Tingyi Lin, Pengju Lyu, Jie Zhang, Yuqing Wang, Cheng Wang, Jianjun Zhu

Non-contrast CT (NCCT) imaging may reduce image contrast and anatomical visibility, potentially increasing diagnostic uncertainty. In contrast, contrast-enhanced CT (CECT) facilitates the observation of regions of interest (ROI). Leading generative models, especially the conditional diffusion model, demonstrate remarkable capabilities in medical image modality transformation. Typical conditional diffusion models commonly generate images with the guidance of segmentation labels for medical modal transformation. Limited access to authentic guidance and its low cardinality can pose challenges to the practical clinical application of conditional diffusion models. To achieve an equilibrium of generative quality and clinical practices, we propose a novel Syncretic generative model based on the latent diffusion model for medical image translation (S$^2$LDM), which can realize high-fidelity reconstruction without requiring additional conditions during inference. S$^2$LDM enhances the similarity in distinct modal images via syncretic encoding and diffusing, promoting amalgamated information in the latent space and generating medical images with more details in contrast-enhanced regions. However, syncretic latent spaces in the frequency domain tend to favor lower frequencies, which are commonly located in identical anatomic structures. Thus, S$^2$LDM applies adaptive similarity loss and dynamic similarity to guide the generation and supplements the shortfall in high-frequency details throughout the training process. Quantitative experiments confirm the effectiveness of our approach in medical image translation. Our code will be released later.

6/21/2024

eess.IV cs.CV

Multi-Branch Generative Models for Multichannel Imaging with an Application to PET/CT Joint Reconstruction

Noel Jeffrey Pinton, Alexandre Bousse, Catherine Cheze-Le-Rest, Dimitris Visvikis

This paper presents a proof-of-concept approach for learned synergistic reconstruction of medical images using multi-branch generative models. Leveraging variational autoencoders (VAEs) and generative adversarial networks (GANs), our models learn from pairs of images simultaneously, enabling effective denoising and reconstruction. Synergistic image reconstruction is achieved by incorporating the trained models in a regularizer that evaluates the distance between the images and the model, in a similar fashion to multichannel dictionary learning (DiL). We demonstrate the efficacy of our approach on both Modified National Institute of Standards and Technology (MNIST) and positron emission tomography (PET)/computed tomography (CT) datasets, showcasing improved image quality and information sharing between modalities. Despite challenges such as patch decomposition and model limitations, our results underscore the potential of generative models for enhancing medical imaging reconstruction.

4/16/2024

eess.IV cs.CV
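The multi-branch abstract above describes a regularizer that measures how far an image pair lies from what the trained model can generate, in a similar fashion to multichannel dictionary learning. A minimal sketch of that idea follows. It swaps the paper's VAE/GAN decoders for a hypothetical linear decoder (a dictionary), so the inner minimization over the latent code becomes least squares. The single latent code shared by the PET and CT channels is what couples the two images; the sizes and data are toy assumptions.

```python
import numpy as np


def synergistic_regularizer(x_pet: np.ndarray, x_ct: np.ndarray, decoder: np.ndarray) -> float:
    """R(x_pet, x_ct) = min_z ||[x_pet; x_ct] - decoder @ z||^2 with ONE shared code z."""
    x = np.concatenate([x_pet.ravel(), x_ct.ravel()])
    z, *_ = np.linalg.lstsq(decoder, x, rcond=None)  # best shared latent code
    residual = x - decoder @ z
    return float(residual @ residual)


rng = np.random.default_rng(0)
decoder = rng.standard_normal((32, 4))  # 16 PET + 16 CT pixels, 4 latent dims (toy sizes)
paired = decoder @ rng.standard_normal(4)  # an image pair the "model" can represent
x_pet, x_ct = paired[:16].reshape(4, 4), paired[16:].reshape(4, 4)
noisy_pet = x_pet + 0.5 * rng.standard_normal((4, 4))

print(synergistic_regularizer(x_pet, x_ct, decoder))      # close to 0
print(synergistic_regularizer(noisy_pet, x_ct, decoder))  # clearly larger
```

Because the code is shared, noise in the PET channel cannot be absorbed without also changing the predicted CT channel, which is the sense in which information is shared between modalities.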