Conditioning Generative Latent Optimization for Sparse-View CT Image Reconstruction

Read original: arXiv:2307.16670 - Published 5/1/2024 by Thomas Braure, Delphine Lazaro, David Hateau, Vincent Brandon, K'evin Ginsburger

🛠️

Overview

This paper explores the use of unsupervised techniques, specifically the Generative Latent Optimization (GLO) framework, for solving the Computed Tomography (CT) inverse problem.
It highlights the limitations of deep learning approaches that require large supervised datasets, which cannot generalize to new experimental setups.
The paper proposes an unsupervised conditional approach to GLO (cGLO) that leverages the structural bias of a decoder network, while further reinforcing the prior by sharing a likelihood objective across multiple slices being reconstructed simultaneously.
The cGLO approach is tested on full-dose sparse-view CT using varying training dataset sizes and viewing angles.

Plain English Explanation

Computed Tomography (CT) is a widely used medical imaging technique that can produce detailed 3D images of the inside of the body. However, the process of reconstructing these images from the raw data (X-ray projections) is a challenging "inverse problem" that requires sophisticated algorithms.

Traditional deep learning approaches to this problem have often relied on large supervised datasets, which can limit their ability to adapt to new experimental setups. In contrast, the researchers in this paper explore unsupervised techniques that don't require extensive training data.

One such approach is the Generative Latent Optimization (GLO) framework, which uses a decoder network to generate reconstructions without the need for a training dataset. The researchers build on this by proposing an "unsupervised conditional" version of GLO, called cGLO.

The key idea behind cGLO is that the decoder network's structural bias, combined with a shared likelihood objective across multiple slices being reconstructed, can create a stronger prior for the reconstruction process. This means the algorithm can produce high-quality CT images without relying on a large dataset for training.

The researchers test their cGLO approach on sparse-view CT, where the number of X-ray projections used to reconstruct the image is reduced. They evaluate the performance of cGLO with different training dataset sizes and varying numbers of viewing angles, demonstrating its flexibility and effectiveness.

Technical Explanation

The paper explores the use of unsupervised techniques, particularly the Generative Latent Optimization (GLO) framework, for solving the Computed Tomography (CT) inverse problem. Unlike deep learning approaches that rely on large supervised datasets, unsupervised techniques can potentially generalize better to new experimental setups.

The researchers propose an unsupervised conditional approach to GLO, called cGLO, which leverages the structural bias of a decoder network. Similarly to the Deep Image Prior (DIP) method, cGLO does not require a training dataset. However, it further reinforces the prior by sharing a likelihood objective across multiple slices being reconstructed simultaneously through the same decoder network.

Additionally, the researchers explore the possibility of initializing the decoder's parameters on a small, unsupervised training dataset to enhance the reconstruction performance. The cGLO approach is evaluated on full-dose sparse-view CT, where the number of X-ray projections used for reconstruction is reduced. The experiments are conducted using varying training dataset sizes and different numbers of viewing angles.

Critical Analysis

The researchers acknowledge that while cGLO does not require a large supervised dataset, it still relies on a small unsupervised dataset to initialize the decoder's parameters. This may limit its applicability in scenarios where even a small dataset is not available.

Furthermore, the paper does not provide a detailed comparison of cGLO's performance against other state-of-the-art unsupervised methods, such as Space-Variant Total Variation Boosted by Learning, DensePaNet, or GasPCT. Such a comparison could help better contextualize the strengths and limitations of the cGLO approach.

Additionally, the paper does not explore the impact of the number of simultaneously reconstructed slices on the cGLO performance. It would be interesting to see how the shared likelihood objective and the decoder's structural bias scale as the number of slices increases.

Conclusion

This paper presents an unsupervised conditional approach to the Generative Latent Optimization (GLO) framework, called cGLO, for solving the Computed Tomography (CT) inverse problem. The key innovation is the use of a shared likelihood objective across multiple slices being reconstructed simultaneously, which reinforces the structural bias of the decoder network.

The cGLO approach demonstrates promising results on full-dose sparse-view CT reconstruction, showing that it can produce high-quality images without requiring a large supervised dataset for training. This flexibility could make cGLO a valuable tool for adapting CT imaging to new experimental setups or resource-constrained environments.

While the paper highlights the potential of unsupervised techniques like cGLO, further research is needed to fully understand their limitations and compare their performance to other state-of-the-art methods. Exploring the scalability of cGLO with the number of simultaneously reconstructed slices could also provide valuable insights for its practical applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🛠️

Conditioning Generative Latent Optimization for Sparse-View CT Image Reconstruction

Thomas Braure, Delphine Lazaro, David Hateau, Vincent Brandon, K'evin Ginsburger

Computed Tomography (CT) is a prominent example of Imaging Inverse Problem highlighting the unrivaled performances of data-driven methods in degraded measurements setups like sparse X-ray projections. Although a significant proportion of deep learning approaches benefit from large supervised datasets, they cannot generalize to new experimental setups. In contrast, fully unsupervised techniques, most notably using score-based generative models, have recently demonstrated similar or better performances compared to supervised approaches while being flexible at test time. However, their use cases are limited as they need considerable amounts of training data to have good generalization properties. Another unsupervised approach taking advantage of the implicit natural bias of deep convolutional networks, Deep Image Prior, has recently been adapted to solve sparse CT by reparameterizing the reconstruction problem. Although this methodology does not require any training dataset, it enforces a weaker prior on the reconstructions when compared to data-driven methods. To fill the gap between these two strategies, we propose an unsupervised conditional approach to the Generative Latent Optimization framework (cGLO). Similarly to DIP, without any training dataset, cGLO benefits from the structural bias of a decoder network. However, the prior is further reinforced as the effect of a likelihood objective shared between multiple slices being reconstructed simultaneously through the same decoder network. In addition, the parameters of the decoder may be initialized on an unsupervised, and eventually very small, training dataset to enhance the reconstruction. The resulting approach is tested on full-dose sparse-view CT using multiple training dataset sizes and varying numbers of viewing angles.

5/1/2024

Iterative CT Reconstruction via Latent Variable Optimization of Shallow Diffusion Models

Sho Ozaki, Shizuo Kaji, Toshikazu Imae, Kanabu Nawa, Hideomi Yamashita, Keiichi Nakagawa

Image-generative artificial intelligence (AI) has garnered significant attention in recent years. In particular, the diffusion model, a core component of generative AI, produces high-quality images with rich diversity. In this study, we proposed a novel computed tomography (CT) reconstruction method by combining the denoising diffusion probabilistic model with iterative CT reconstruction. In sharp contrast to previous studies, we optimized the fidelity loss of CT reconstruction with respect to the latent variable of the diffusion model, instead of the image and model parameters. To suppress the changes in anatomical structures produced by the diffusion model, we shallowed the diffusion and reverse processes and fixed a set of added noises in the reverse process to make it deterministic during the inference. We demonstrated the effectiveness of the proposed method through the sparse-projection CT reconstruction of 1/10 projection data. Despite the simplicity of the implementation, the proposed method has the potential to reconstruct high-quality images while preserving the patient's anatomical structures and was found to outperform existing methods, including iterative reconstruction, iterative reconstruction with total variation, and the diffusion model alone in terms of quantitative indices such as the structural similarity index and peak signal-to-noise ratio. We also explored further sparse-projection CT reconstruction using 1/20 projection data with the same trained diffusion model. As the number of iterations increased, the image quality improved comparable to that of 1/10 sparse-projection CT reconstruction. In principle, this method can be widely applied not only to CT but also to other imaging modalities.

9/14/2024

GLIMPSE: Generalized Local Imaging with MLPs

AmirEhsan Khorashadizadeh, Valentin Debarnot, Tianlin Liu, Ivan Dokmani'c

Deep learning is the current de facto state of the art in tomographic imaging. A common approach is to feed the result of a simple inversion, for example the backprojection, to a convolutional neural network (CNN) which then computes the reconstruction. Despite strong results on 'in-distribution' test data similar to the training data, backprojection from sparse-view data delocalizes singularities, so these approaches require a large receptive field to perform well. As a consequence, they overfit to certain global structures which leads to poor generalization on out-of-distribution (OOD) samples. Moreover, their memory complexity and training time scale unfavorably with image resolution, making them impractical for application at realistic clinical resolutions, especially in 3D: a standard U-Net requires a substantial 140GB of memory and 2600 seconds per epoch on a research-grade GPU when training on 1024x1024 images. In this paper, we introduce GLIMPSE, a local processing neural network for computed tomography which reconstructs a pixel value by feeding only the measurements associated with the neighborhood of the pixel to a simple MLP. While achieving comparable or better performance with successful CNNs like the U-Net on in-distribution test data, GLIMPSE significantly outperforms them on OOD samples while maintaining a memory footprint almost independent of image resolution; 5GB memory suffices to train on 1024x1024 images. Further, we built GLIMPSE to be fully differentiable, which enables feats such as recovery of accurate projection angles if they are out of calibration.

6/21/2024

🏋️

A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose

Kaiwen Jiang, Yang Fu, Mukund Varma T, Yash Belhe, Xiaolong Wang, Hao Su, Ravi Ramamoorthi

Novel view synthesis from a sparse set of input images is a challenging problem of great practical interest, especially when camera poses are absent or inaccurate. Direct optimization of camera poses and usage of estimated depths in neural radiance field algorithms usually do not produce good results because of the coupling between poses and depths, and inaccuracies in monocular depth estimation. In this paper, we leverage the recent 3D Gaussian splatting method to develop a novel construct-and-optimize method for sparse view synthesis without camera poses. Specifically, we construct a solution progressively by using monocular depth and projecting pixels back into the 3D world. During construction, we optimize the solution by detecting 2D correspondences between training views and the corresponding rendered images. We develop a unified differentiable pipeline for camera registration and adjustment of both camera poses and depths, followed by back-projection. We also introduce a novel notion of an expected surface in Gaussian splatting, which is critical to our optimization. These steps enable a coarse solution, which can then be low-pass filtered and refined using standard optimization methods. We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views, showing significantly better quality than competing methods, including those with approximate camera pose information. Moreover, our results improve with more views and outperform previous InstantNGP and Gaussian Splatting algorithms even when using half the dataset. Project page: https://raymondjiangkw.github.io/cogs.github.io/

6/12/2024