RadRotator: 3D Rotation of Radiographs with Diffusion Models

Read original: arXiv:2404.13000 - Published 4/22/2024 by Pouria Rouzrokh, Bardia Khosravi, Shahriar Faghani, Kellen L. Mulford, Michael J. Taunton, Bradley J. Erickson, Cody C. Wyles

Overview

Transforming 2D images into 3D volumes is a well-known, challenging problem in computer vision.
Previous medical imaging studies have attempted to convert two or more radiographs into CT volumes.
This paper introduces a diffusion model-based technology that can rotate the anatomical content of a radiograph in 3D space.

Plain English Explanation

This paper explores a new way to view the anatomy in a 2D medical image, such as an X-ray, from different angles. Previous research tried to reconstruct a full 3D CT volume from two or more X-rays; this paper instead rotates the anatomy shown in an input X-ray to produce new views of it.

The key idea is to use a type of machine learning model called a "diffusion model" to rotate the anatomy shown in an X-ray in 3D space. This could let doctors look at the anatomy from viewpoints other than the one in which the X-ray was taken, rather than only the original flat 2D image.

The researchers used CT scans to create "fake" X-rays called Digitally Reconstructed Radiographs (DRRs) to train their diffusion model. They report that diffusion models gave better results than other techniques such as Generative Adversarial Networks, at the cost of slower image generation.
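To make the DRR idea concrete, here is a minimal sketch of how a simulated X-ray can be computed from a CT volume. It is not the authors' pipeline: it assumes a simple parallel-beam geometry, an illustrative attenuation value for water (`mu_water`), and a toy volume, all of which are assumptions introduced here. Projecting the same volume along two different axes gives two views of the same anatomy, which is the kind of input and target pair a rotation model could be trained on.

```python
import numpy as np

def make_drr(ct_hu, axis, voxel_mm=1.0, mu_water=0.02):
    """Parallel-beam DRR: integrate X-ray attenuation along one axis.

    ct_hu    : 3D array of Hounsfield units (HU)
    axis     : axis along which the simulated rays travel
    voxel_mm : voxel size along that axis in mm (assumed isotropic)
    mu_water : assumed attenuation of water in 1/mm (illustrative value)
    """
    # HU definition: HU = 1000 * (mu - mu_water) / mu_water
    mu = mu_water * (1.0 + ct_hu / 1000.0)
    mu = np.clip(mu, 0.0, None)            # air (-1000 HU) attenuates nothing
    path = mu.sum(axis=axis) * voxel_mm    # line integral along each ray
    transmitted = np.exp(-path)            # Beer-Lambert law
    # Radiographs conventionally show dense tissue as bright, so invert.
    return 1.0 - transmitted

# Toy volume: air everywhere, with one dense block standing in for bone.
ct = np.full((32, 32, 32), -1000.0)
ct[8:24, 12:20, 10:22] = 700.0

frontal = make_drr(ct, axis=1)   # rays travel along axis 1
lateral = make_drr(ct, axis=2)   # same anatomy viewed from 90 degrees away
print(frontal.shape, lateral.shape)
```

Arbitrary rotation angles would need the volume to be resampled before projection, and clinical DRR generators typically model a diverging (cone) beam; both are left out of this sketch.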

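They also found a simple way to make the diffusion model work well on real X-rays, not just the fake DRR training data. Instead of relying on a separate style-transfer model to make DRRs look like real X-rays, they randomly altered the distribution of pixel brightness in both the input and target images during training, so that the model learns to ignore intensity differences between real and fake X-rays.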

Technical Explanation

The paper introduces a diffusion model-based approach that rotates the anatomical content of a 2D radiograph in 3D space, so that the anatomy can be visualized from other viewpoints. Unlike previous work that used Generative Adversarial Networks (GANs) to convert radiographs to CT volumes, the authors use conditional diffusion models with classifier-free guidance. According to the authors, this achieves higher mode coverage and improved output image quality, at the cost of slower inference.
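For readers unfamiliar with classifier-free guidance, the sketch below shows how the guided noise estimate is typically formed at each sampling step. It uses one common formulation (extrapolating from the unconditional prediction toward the conditional one) and is not taken from the paper, whose exact formula and guidance scale are not given in this summary. `toy_denoiser` is a hypothetical stand-in for a trained network, and the scale values are arbitrary.

```python
import numpy as np

def guided_noise(eps_cond, eps_uncond, guidance_scale):
    """Combine conditional and unconditional noise predictions.

    guidance_scale = 0 gives the unconditional prediction,
    guidance_scale = 1 gives the conditional prediction,
    larger values push the sample harder toward the condition.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def toy_denoiser(x_t, cond=None):
    """Hypothetical stand-in for a trained noise-prediction network.

    cond=None plays the role of the 'dropped' condition that the model
    also sees during training, so that one network can produce both
    conditional and unconditional predictions.
    """
    eps = 0.1 * x_t
    if cond is not None:
        eps = eps + 0.05 * cond
    return eps

x_t = np.ones((2, 2))            # current noisy image (toy values)
cond = np.full((2, 2), 2.0)      # conditioning signal, e.g. the input radiograph

eps_c = toy_denoiser(x_t, cond)  # conditional prediction
eps_u = toy_denoiser(x_t)        # unconditional prediction
eps_guided = guided_noise(eps_c, eps_u, guidance_scale=3.0)
```

Each sampling step needs two forward passes (with and without the condition), which adds to the inference cost that diffusion models already pay for iterative sampling.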

To train the diffusion model, the authors used Digitally Reconstructed Radiographs (DRRs) created from CT volumes as the training data. Rather than using a style-transfer model such as Cycle-GAN to make DRRs resemble real radiographs, which they describe as unreliable, they developed a simple yet effective training transformation that randomly changes the pixel intensity histograms of both the input DRRs and the ground-truth imaging data during training. This makes the diffusion model agnostic to variations in pixel intensity distribution, allowing it to be trained on DRRs and then applied directly to conventional radiographs during inference.
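The summary does not specify the exact transformation, so the following is only a sketch of what a random histogram change could look like. It assumes a random gamma curve followed by a random monotone piecewise-linear remapping; the function name `random_histogram_shift`, its parameters, and the choice to draw separate random curves for the input and the target are all assumptions made for illustration.

```python
import numpy as np

def random_histogram_shift(image, rng, gamma_range=(0.5, 2.0), n_knots=4):
    """Apply a random monotone intensity mapping to a 2D image.

    The mapping changes the intensity histogram but preserves the
    ordering of pixel values, so anatomical structure stays intact.
    Output values lie in [0, 1].
    """
    lo, hi = float(image.min()), float(image.max())
    x = (image - lo) / (hi - lo + 1e-8)          # normalise to [0, 1]

    # Step 1: random gamma curve (brightens or darkens mid-tones).
    gamma = rng.uniform(*gamma_range)
    x = x ** gamma

    # Step 2: random piecewise-linear curve with sorted knot heights,
    # which keeps the mapping non-decreasing.
    xs = np.linspace(0.0, 1.0, n_knots + 2)
    ys = np.concatenate(([0.0], np.sort(rng.uniform(0.0, 1.0, n_knots)), [1.0]))
    return np.interp(x, xs, ys)

rng = np.random.default_rng(0)
input_drr = np.linspace(0.0, 1.0, 64).reshape(8, 8)   # toy input view
target_drr = input_drr.T.copy()                       # toy target view

# Assumption: input and target each get their own random curve.
aug_input = random_histogram_shift(input_drr, rng)
aug_target = random_histogram_shift(target_drr, rng)
```

Because a new curve is drawn every time a sample is loaded, the model rarely sees the same intensity distribution twice, which is what discourages it from depending on the specific look of DRRs.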

Critical Analysis

The paper presents a promising approach for rotating the anatomical content of 2D radiographs in 3D space, addressing limitations of prior work that used GANs. The authors name slower inference as the only trade-off of the diffusion model and argue that it is often less critical in medical applications; it could still be a drawback for uses that require real-time output.

Additionally, the paper does not provide a detailed evaluation of the clinical utility or accuracy of the 3D visualizations generated by the model. Further research would be needed to assess how well the 3D rotations capture the true 3D anatomy and whether they provide meaningful additional information beyond the original 2D radiograph.

The data augmentation technique used to make the diffusion model robust to pixel intensity variations is an interesting contribution, but its broader applicability to other medical imaging tasks is not explored. Investigating the generalizability of this approach could be a fruitful avenue for future work.

Conclusion

This paper introduces a novel diffusion model-based approach to transform 2D radiographs into 3D visualizations of the underlying anatomy. By leveraging conditional diffusion models and a simple data augmentation technique, the authors have developed a method that can reliably rotate the anatomical content of radiographs in 3D space, potentially enabling new perspectives for medical diagnosis and treatment planning. While the slower inference time may be a limitation, the improved output quality and mode coverage of the diffusion model represent an important step forward in this challenging computer vision task.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

RadRotator: 3D Rotation of Radiographs with Diffusion Models

Pouria Rouzrokh, Bardia Khosravi, Shahriar Faghani, Kellen L. Mulford, Michael J. Taunton, Bradley J. Erickson, Cody C. Wyles

Transforming two-dimensional (2D) images into three-dimensional (3D) volumes is a well-known yet challenging problem for the computer vision community. In the medical domain, a few previous studies attempted to convert two or more input radiographs into computed tomography (CT) volumes. Following their effort, we introduce a diffusion model-based technology that can rotate the anatomical content of any input radiograph in 3D space, potentially enabling the visualization of the entire anatomical content of the radiograph from any viewpoint in 3D. Similar to previous studies, we used CT volumes to create Digitally Reconstructed Radiographs (DRRs) as the training data for our model. However, we addressed two significant limitations encountered in previous studies: 1. We utilized conditional diffusion models with classifier-free guidance instead of Generative Adversarial Networks (GANs) to achieve higher mode coverage and improved output image quality, with the only trade-off being slower inference time, which is often less critical in medical applications; and 2. We demonstrated that the unreliable output of style transfer deep learning (DL) models, such as Cycle-GAN, to transfer the style of actual radiographs to DRRs could be replaced with a simple yet effective training transformation that randomly changes the pixel intensity histograms of the input and ground-truth imaging data during training. This transformation makes the diffusion model agnostic to any distribution variations of the input data pixel intensity, enabling the reliable training of a DL model on input DRRs and applying the exact same model to conventional radiographs (or DRRs) during inference.

4/22/2024

DX2CT: Diffusion Model for 3D CT Reconstruction from Bi or Mono-planar 2D X-ray(s)

Yun Su Jeong, Hye Bin Yoo, Il Yong Chun

Computed tomography (CT) provides high-resolution medical imaging, but it can expose patients to high radiation. X-ray scanners have low radiation exposure, but their resolutions are low. This paper proposes a new conditional diffusion model, DX2CT, that reconstructs three-dimensional (3D) CT volumes from bi or mono-planar X-ray image(s). The proposed DX2CT consists of two key components: 1) modulating feature maps extracted from two-dimensional (2D) X-ray(s) with 3D positions of CT volume using a new transformer and 2) effectively using the modulated 3D position-aware feature maps as conditions of DX2CT. In particular, the proposed transformer can provide conditions with rich information of a target CT slice to the conditional diffusion model, enabling high-quality CT reconstruction. Our experiments with the bi or mono-planar X-ray(s) benchmark datasets show that the proposed DX2CT outperforms several state-of-the-art methods. Our codes and model will be available at: https://www.github.com/intyeger/DX2CT.

9/16/2024

DIFR3CT: Latent Diffusion for Probabilistic 3D CT Reconstruction from Few Planar X-Rays

Yiran Sun, Hana Baroudi, Tucker Netherton, Laurence Court, Osama Mawlawi, Ashok Veeraraghavan, Guha Balakrishnan

Computed Tomography (CT) scans are the standard-of-care for the visualization and diagnosis of many clinical ailments, and are needed for the treatment planning of external beam radiotherapy. Unfortunately, the availability of CT scanners in low- and mid-resource settings is highly variable. Planar x-ray radiography units, in comparison, are far more prevalent, but can only provide limited 2D observations of the 3D anatomy. In this work we propose DIFR3CT, a 3D latent diffusion model, that can generate a distribution of plausible CT volumes from one or few (<10) planar x-ray observations. DIFR3CT works by fusing 2D features from each x-ray into a joint 3D space, and performing diffusion conditioned on these fused features in a low-dimensional latent space. We conduct extensive experiments demonstrating that DIFR3CT is better than recent sparse CT reconstruction baselines in terms of standard pixel-level metrics (PSNR, SSIM) on both the public LIDC and in-house post-mastectomy CT datasets. We also show that DIFR3CT supports uncertainty quantification via Monte Carlo sampling, which provides an opportunity to measure reconstruction reliability. Finally, we perform a preliminary pilot study evaluating DIFR3CT for automated breast radiotherapy contouring and planning, and demonstrate promising feasibility. Our code is available at https://github.com/yransun/DIFR3CT.

8/28/2024

DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays

Xuhui Liu, Zhi Qiao, Runkun Liu, Hong Li, Juan Zhang, Xiantong Zhen, Zhen Qian, Baochang Zhang

Computed tomography (CT) is widely utilized in clinical settings because it delivers detailed 3D images of the human body. However, performing CT scans is not always feasible due to radiation exposure and limitations in certain surgical environments. As an alternative, reconstructing CT images from ultra-sparse X-rays offers a valuable solution and has gained significant interest in scientific research and medical applications. However, it presents great challenges as it is inherently an ill-posed problem, often compromised by artifacts resulting from overlapping structures in X-ray images. In this paper, we propose DiffuX2CT, which models CT reconstruction from orthogonal biplanar X-rays as a conditional diffusion process. DiffuX2CT is established with a 3D global coherence denoising model with a new, implicit conditioning mechanism. We realize the conditioning mechanism by a newly designed tri-plane decoupling generator and an implicit neural decoder. By doing so, DiffuX2CT achieves structure-controllable reconstruction, which enables 3D structural information to be recovered from 2D X-rays, therefore producing faithful textures in CT images. As an extra contribution, we collect a real-world lumbar CT dataset, called LumbarV, as a new benchmark to verify the clinical significance and performance of CT reconstruction from X-rays. Extensive experiments on this dataset and three more publicly available datasets demonstrate the effectiveness of our proposal.

7/19/2024