KEYWORDS: Transformers, Cancer detection, Windows, Prostate cancer, Prostate, Magnetic resonance imaging, Education and training, Principal component analysis, Object detection, Deep learning
Prostate multiparametric magnetic resonance imaging (mpMRI) has demonstrated promising results in prostate cancer (PCa) detection using deep learning models based on convolutional neural networks (CNNs). Recently, transformers have achieved competitive performance compared to CNNs in computer vision. Large-scale transformers benefit from training with large-scale annotated data, which are expensive and labor-intensive to obtain in medical imaging. Self-supervised learning can effectively leverage unlabeled data to extract useful semantic representations with no additional annotation cost. This can improve model performance on downstream tasks with limited labeled data and increase model robustness to external data. We present a novel end-to-end cross-shaped transformer model, CSwin UNet, to detect clinically significant prostate cancer (csPCa) in prostate bi-parametric MR imaging (bpMRI). Using a large prostate bpMRI dataset with 1500 patients, our CSwin UNet achieves 0.880±0.013 AUC and 0.790±0.033 pAUC, significantly outperforming state-of-the-art CNN and transformer models.
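Both reported metrics can be computed with scikit-learn. A hedged illustration: the paper's pAUC false-positive-rate cutoff is not stated in this abstract, so `max_fpr=0.1` below is an assumed example value, and the labels and scores are synthetic.

```python
# Toy illustration of AUC and partial AUC (pAUC) at the patient level.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)               # 1 = csPCa, 0 = benign (toy labels)
y_score = rng.random(200)                           # model-predicted probabilities

auc = roc_auc_score(y_true, y_score)                # full AUC
pauc = roc_auc_score(y_true, y_score, max_fpr=0.1)  # standardized pAUC; cutoff is assumed
print(f"AUC={auc:.3f}  pAUC={pauc:.3f}")
```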
This study aims to simplify radiation therapy treatment planning by proposing an MRI-to-CT transformer-based denoising diffusion probabilistic model (CT-DDPM) to generate high-quality synthetic computed tomography (sCT) from magnetic resonance imaging (MRI). The goal is to reduce patient radiation dose and setup uncertainty by eliminating the need for CT simulation and image registration during treatment planning. The CT-DDPM utilizes a diffusion process with a shifted-window transformer network to transform MRI into sCT. The model comprises two processes: a forward process, which adds Gaussian noise to real CT scans to create noisy images, and a reverse process, which denoises the noisy CT scans using a V-shaped network (Vnet) conditioned on the corresponding MRI. With an optimally trained Swin-Vnet, the reverse process generates sCT scans matching the MRI anatomy. The method is evaluated using the mean absolute error (MAE) of Hounsfield units (HU), peak signal-to-noise ratio (PSNR), multi-scale structural similarity index (MS-SSIM), and normalized cross-correlation (NCC) between ground-truth CTs and sCTs. For the brain dataset, CT-DDPM demonstrated state-of-the-art quantitative results, exhibiting an MAE of 45.210±3.807 HU, a PSNR of 26.753±0.861 dB, an MS-SSIM of 0.964±0.005, and an NCC of 0.981±0.004. On the prostate dataset, the model also showed impressive performance, with an MAE of 55.492±8.281 HU, a PSNR of 28.912±2.591 dB, an MS-SSIM of 0.894±0.092, and an NCC of 0.945±0.054. Across both datasets, CT-DDPM significantly outperformed competing networks in most metrics, a finding corroborated by Student's paired t-test. The source code is available at: https://github.com/shaoyanpan/Synthetic-CT-generation-from-MRI-using-3D-transformer-based-denoising-diffusion-model
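The forward process described here has a standard closed form, q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I), which lets the noisy sample at any timestep be drawn in one shot. A minimal sketch under that assumption; the schedule and tensor shapes are illustrative, not the paper's exact configuration:

```python
# Forward (noising) process of a DDPM on 3D CT volumes; the reverse network would
# learn to predict the injected noise, conditioned on the paired MRI.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # linear noise schedule (common default)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative products of (1 - beta_t)

def q_sample(x0: torch.Tensor, t: torch.Tensor):
    """Draw x_t ~ q(x_t | x_0) for a batch of timesteps t."""
    eps = torch.randn_like(x0)
    a = alpha_bar[t].view(-1, 1, 1, 1, 1)      # broadcast over (B, C, D, H, W) volumes
    return a.sqrt() * x0 + (1 - a).sqrt() * eps, eps

ct = torch.randn(2, 1, 32, 64, 64)             # toy "real CT" volumes
t = torch.randint(0, T, (2,))
xt, eps = q_sample(ct, t)
```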
This study proposes an innovative 3D diffusion-based model called the Cycle-consistency Geometric-integrated X-ray to Computed Tomography Denoising Diffusion Probabilistic Model (X-CBCT-DDPM). The X-CBCT-DDPM is designed to effectively reconstruct volumetric cone-beam CTs (CBCTs) from a single X-ray projection taken at any angle, reducing the number of required projections and minimizing patient radiation exposure in acquiring volumetric images. In contrast to traditional DDPMs, the X-CBCT-DDPM utilizes dual DDPMs: one for generating full-view X-ray projections and another for volumetric CBCT reconstruction. These dual networks synergistically enhance each other's learning capabilities, leading to improved reconstructed CBCT quality with high anatomical accuracy. The proposed patient-specific X-CBCT-DDPM was tested using 4D-CBCT data from ten patients, with each patient's dataset comprising ten phases of 3D CBCTs, from which CBCTs and cone-beam X-ray projections were simulated. For model training, eight phases of 3D CBCTs from each patient were utilized, with one phase for validation and the remaining phase reserved for final testing. The X-CBCT-DDPM exhibits superior performance to DDPM, conditional Generative Adversarial Networks (GANs), and Vnet in terms of various metrics, including a Mean Absolute Error (MAE) of 36.36±4.04, Peak Signal-to-Noise Ratio (PSNR) of 32.83±0.98, Structural Similarity Index (SSIM) of 0.91±0.01, and Fréchet Inception Distance (FID) of 0.32±0.02. These results highlight the model's potential for ultra-sparse projection-based CBCT reconstruction.
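At inference the two models compose into a pipeline: the first DDPM expands the single-angle projection into full-view projections, and the second reconstructs the volume conditioned on them. The sketch below only illustrates that composition; the `sample(cond=...)` interfaces are hypothetical placeholders, not the paper's API.

```python
# Illustrative composition of the dual DDPMs; `proj_ddpm` and `cbct_ddpm` stand in
# for trained diffusion samplers with an assumed `sample(cond=...)` method.
import torch

@torch.no_grad()
def reconstruct(proj_ddpm, cbct_ddpm, single_xray: torch.Tensor) -> torch.Tensor:
    full_views = proj_ddpm.sample(cond=single_xray)  # stage 1: synthesize full-view projections
    return cbct_ddpm.sample(cond=full_views)         # stage 2: volumetric CBCT reconstruction
```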
KEYWORDS: Proton therapy, Deep learning, Education and training, Monte Carlo methods, Gallium nitride, Tissues, Energy transfer, Radiotherapy, Prostate cancer, Performance modeling
The advantage of proton therapy over photon therapy lies in the Bragg peak effect, which allows protons to deposit most of their energy precisely at the tumor site, minimizing damage to surrounding healthy tissue. Despite this, the standard approach to clinical treatment planning does not fully consider the differences in biological effectiveness between protons and photons. Currently, a uniform Relative Biological Effectiveness (RBE) value of 1.1 is used in clinical settings to relate protons to photons, despite evidence that proton RBE can vary significantly. This variation underscores the need for more refined proton therapy treatment planning that accounts for the variable RBE. A critical parameter in assessing the RBE of proton therapy is the Dose-Averaged Linear Energy Transfer (LETd), which is instrumental in optimizing proton treatment plans. Accurate LETd distribution calculations require complex physical models and the implementation of sophisticated Monte Carlo (MC) simulation software. These simulations are both computationally intensive and time-consuming. To address these challenges, we propose a Deep Learning (DL)-based framework aimed at predicting the LETd distribution map from the dose distribution map. This framework utilizes Mean Absolute Error (MAE), Peak Signal-to-Noise Ratio (PSNR), and Normalized Cross-Correlation (NCC) to measure discrepancies between MC-derived LETd and the LETd maps generated by our model. Our approach has shown promise in producing synthetic LETd maps from dose maps, potentially enhancing proton therapy planning through the provision of precise LETd information. This development could significantly contribute to more effective and individualized proton therapy treatments, optimizing therapeutic outcomes while further minimizing harm to healthy tissue.
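The three discrepancy measures named here are simple to state. A hedged numpy sketch; normalization conventions, such as the PSNR data range, are assumptions and may differ from the paper's:

```python
# MAE, PSNR, and NCC between an MC-derived LETd map and a model-generated one.
import numpy as np

def mae(pred: np.ndarray, ref: np.ndarray) -> float:
    return float(np.mean(np.abs(pred - ref)))

def psnr(pred: np.ndarray, ref: np.ndarray) -> float:
    mse = np.mean((pred - ref) ** 2)
    data_range = ref.max() - ref.min()         # assumed dynamic-range convention
    return float(10 * np.log10(data_range ** 2 / mse))

def ncc(pred: np.ndarray, ref: np.ndarray) -> float:
    p = (pred - pred.mean()) / pred.std()
    r = (ref - ref.mean()) / ref.std()
    return float(np.mean(p * r))               # zero-normalized cross-correlation
```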
KEYWORDS: Image segmentation, Cardiovascular magnetic resonance imaging, Windows, Transformers, Magnetic resonance imaging, Deep learning, Information fusion, Visualization, Network architectures
In this work, we aimed to develop a deep-learning algorithm for segmentation of cardiac Magnetic Resonance Imaging (MRI) to facilitate contouring of the Left Ventricle (LV), Right Ventricle (RV), and Myocardium (Myo). We proposed a Shifting Block Partition Multilayer Perceptron (SBP-MLP) network built upon a symmetric U-shaped encoder-decoder network. We evaluated the proposed network on a public cardiac MRI dataset, the ACDC training dataset. The network performance was quantitatively evaluated using Hausdorff Distance (HD), Mean Surface Distance (MSD), and Residual Mean Square Distance (RMSD), as well as Dice score coefficient, sensitivity, and precision. The performance of the proposed network was compared with two state-of-the-art networks, dynamic UNet and Swin-UNetr. Our proposed network achieved the following quantitative metrics: HD = 1.521±0.090 mm, MSD = 0.287±0.080 mm, and RMSD = 0.738±0.315 mm, as well as Dice = 0.948±0.020, precision = 0.946±0.017, and sensitivity = 0.951±0.027. The proposed network showed statistically significant improvement over the Swin-UNetr and dynamic UNet algorithms across most metrics for the three segments. The SBP-MLP showed superior segmentation performance, as evidenced by a higher Dice score and lower HD relative to competing methods. This robust method has the potential for implementation in clinical workflows for cardiac segmentation and analysis.
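The volume-based similarities reported here (Dice, sensitivity, precision) reduce to counts of true and false positives between binary masks. An illustrative sketch; the surface metrics (HD, MSD, RMSD) require a dedicated implementation, e.g., from SimpleITK or MedPy, and are omitted:

```python
# Volume-based overlap metrics between a predicted mask and the ground truth.
import numpy as np

def volume_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    tp = np.sum((pred == 1) & (gt == 1))       # true positives
    fp = np.sum((pred == 1) & (gt == 0))       # false positives
    fn = np.sum((pred == 0) & (gt == 1))       # false negatives
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }
```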
This study aims to enhance the resolution of Magnetic Resonance Imaging (MRI) using a cutting-edge diffusion probabilistic Deep Learning (DL) technique, addressing the challenges posed by long image acquisition times and limited scanning dimensions. We propose a novel approach utilizing a probabilistic DL model to synthesize High-Resolution MRI (HR-MRI) images from Low-Resolution (LR) inputs. The proposed model consists of two main steps. In the forward process, Gaussian noise is systematically introduced to LR images through a Markov chain. In the reverse process, a U-Net model is trained using a loss function based on Kullback-Leibler divergence, which maximizes the likelihood of producing ground-truth images. We assess the effectiveness of our method on T2-FLAIR images from 120 patients in the public BraTS2020 database. To gauge performance, we compare our approach with a clinical bicubic model (referred to as Bicubic) and Conditional Generative Adversarial Networks (CGAN). On the BraTS2020 dataset, our framework enhances the Peak Signal-to-Noise Ratio (PSNR) of LR images by 7%, whereas CGAN results in a 3% reduction. The corresponding Multi-Scale Structural Similarity (MS-SSIM) values for the proposed method and CGAN are 0.972±0.017 and 0.966±0.024, respectively. In this study, we have examined the potential of a diffusion probabilistic DL framework to elevate MRI image resolution. Our proposed method demonstrates the capability to generate high-quality HR images while avoiding issues such as mode collapse and poor coverage of multimodal distributions, which are commonly observed in CGAN-based approaches. This framework has the potential to significantly reduce MRI acquisition times for HR imaging, thereby mitigating the risk of motion artifacts and crosstalk.
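In standard DDPM derivations, the KL-based objective mentioned above reduces (up to a weighting term) to a mean-squared error on the predicted noise, which is how such models are usually trained in practice. A hedged sketch of one training step under that simplification; conditioning by channel-wise concatenation of the LR image is an assumption:

```python
# One simplified training step for the conditional reverse process (2D slices assumed).
import torch
import torch.nn.functional as F

def train_step(unet, optimizer, lr_img, hr_img, alpha_bar, T=1000):
    t = torch.randint(0, T, (hr_img.shape[0],), device=hr_img.device)
    eps = torch.randn_like(hr_img)
    a = alpha_bar[t].view(-1, 1, 1, 1)
    xt = a.sqrt() * hr_img + (1 - a).sqrt() * eps   # noised HR image
    x_in = torch.cat([xt, lr_img], dim=1)           # condition on the LR input (assumed)
    loss = F.mse_loss(unet(x_in, t), eps)           # predict the injected noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```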
The purpose of this study is to reduce radiation exposure in PET imaging while preserving high-quality clinical PET images. We propose the PET Consistency Model (PET-CM), an efficient diffusion-model-based approach, to estimate full-dose PET images from low-dose PET images. PET-CM delivers synthetic images of comparable quality to state-of-the-art diffusion-based methods but with significantly higher efficiency. The process involves adding Gaussian noise to full-dose PETs through a forward diffusion process and then using a PET U-shaped network (PET-Unet) for denoising in a reverse diffusion process, conditioned on the corresponding low-dose PETs. In experiments denoising one-eighth-dose images to full-dose images, PET-CM achieved an MAE of 1.321±0.134%, a PSNR of 33.587±0.674 dB, an SSIM of 0.960±0.008, and an NCC of 0.967±0.011. In the one-quarter-dose to full-dose scenario, PET-CM further showcased its capability with an MAE of 1.123±0.112%, a PSNR of 35.851±0.871 dB, an SSIM of 0.975±0.003, and an NCC of 0.990±0.003.
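Consistency models gain their efficiency by learning a function that maps a noisy sample directly back to a clean one, so sampling needs only a few network evaluations rather than hundreds of denoising steps. A heavily hedged sketch of multistep consistency sampling in the style of Song et al. (2023); the noise schedule and the conditioning interface `f(x, sigma, cond)` are illustrative assumptions, not PET-CM's exact procedure:

```python
# Few-step consistency sampling; `f` stands in for a trained consistency function
# conditioned on the low-dose PET.
import torch

@torch.no_grad()
def consistency_sample(f, low_dose, sigmas=(80.0, 20.0, 5.0), sigma_min=0.002):
    x = sigmas[0] * torch.randn_like(low_dose)         # start from pure noise
    x = f(x, sigmas[0], low_dose)                      # one-step estimate of full-dose PET
    for s in sigmas[1:]:                               # optional refinement steps
        z = torch.randn_like(x)
        x = f(x + (s**2 - sigma_min**2) ** 0.5 * z, s, low_dose)
    return x
```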
This work presents GhostMorph, an innovative model for deformable inter-subject registration in medical imaging, inspired by GhostNet's principles. GhostMorph addresses the computational challenges inherent in medical image registration, particularly in deformable registration where complex local and global deformations are prevalent. By integrating Ghost modules and 3D depth-wise separable convolutions into its architecture, GhostMorph significantly reduces computational demands while maintaining high performance. The study benchmarks GhostMorph against state-of-the-art registration methods using the Liver Tumor Segmentation Benchmark (LiTS) dataset, demonstrating its comparable accuracy and improved computational efficiency. GhostMorph emerges as a viable, scalable solution for real-time and resource-constrained clinical scenarios, marking a notable advancement in medical image registration technology.
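The efficiency gain comes from replacing part of each full convolution with a cheap depthwise one: a small set of intrinsic feature maps is computed normally, and "ghost" maps are derived from them at low cost. A minimal 3D sketch in GhostNet's spirit; the ratio and kernel sizes are assumptions, not GhostMorph's exact configuration:

```python
# 3D Ghost module: intrinsic features from a pointwise conv, ghost features from a
# cheap depthwise conv, concatenated to the full output width.
import torch
import torch.nn as nn

class GhostModule3D(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2):
        super().__init__()
        init_ch = out_ch // ratio
        self.primary = nn.Sequential(
            nn.Conv3d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm3d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(                     # depthwise: groups == in channels
            nn.Conv3d(init_ch, out_ch - init_ch, 3, padding=1,
                      groups=init_ch, bias=False),
            nn.BatchNorm3d(out_ch - init_ch), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        intrinsic = self.primary(x)
        return torch.cat([intrinsic, self.cheap(intrinsic)], dim=1)
```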
The advent of computed tomography has significantly improved patient care in diagnosis, prognosis, treatment planning, and image-guided radiotherapy. However, tomographic imaging cannot be performed in real time, and it adds concomitant radiation dose to patients, estimated to increase the risk of secondary cancer by up to 4%. We demonstrate the feasibility of a data-driven approach to synthesize volumetric images from patients' surface images, which can be obtained with a zero-dose surface imaging system. This study includes 500 computed tomography (CT) image sets from 50 patients. Compared to the ground-truth CT, the synthetic images achieve a mean absolute error of 26.9 ± 4.1 Hounsfield units, a peak signal-to-noise ratio of 39.1 ± 1.0 dB, and a structural similarity index measure of 0.97 ± 0.01. This approach provides a data-integration solution that can potentially enable real-time imaging, free of radiation-induced risk, and could be applied to image-guided medical procedures.
In prostate brachytherapy, a focal boost on dominant intraprostatic lesions (DILs) can reduce the recurrence rate while keeping toxicity low. In recent years, ultrasound (US) prostate tissue characterization has demonstrated feasibility in detecting dominant intraprostatic lesions. With recent developments in computer-aided diagnosis (CAD), deep learning-based methods have provided solutions for efficient analysis of US images. In this study, we aim to develop a Shifted-windows (Swin) Transformer-based method for DIL classification. The self-attention layers in the Swin Transformer allow efficient feature discrimination between benign tissues and intraprostatic lesions. We simplified the structure of the Swin Transformer to avoid overfitting on a small dataset. The proposed transformer structure achieved 83% accuracy and 0.86 AUC at the patient level under three-fold cross-validation, demonstrating the feasibility of applying our method to dominant lesion classification from US images, which is of clinical significance for radiotherapy treatment planning.
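The shifted-window design keeps self-attention affordable by computing it within small non-overlapping windows rather than over all image tokens. An illustrative window-partition helper; the window size is an assumed example value, not the paper's setting:

```python
# Split a (B, H, W, C) token map into non-overlapping windows for local attention.
import torch

def window_partition(x: torch.Tensor, ws: int = 7) -> torch.Tensor:
    """(B, H, W, C) -> (num_windows*B, ws*ws, C); H and W must be divisible by ws."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return x.view(-1, ws * ws, C)

tokens = window_partition(torch.randn(1, 56, 56, 96))  # 64 windows of 49 tokens each
```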
In this work, we propose MLP-Vnet, a token-based U-shaped multilayer linear perceptron-mixer (MLP-Mixer) network incorporating a convolutional neural network for multi-structure segmentation on cardiac magnetic resonance imaging (MRI). The proposed MLP-Vnet is composed of an encoder and a decoder. Taking an MRI scan as input, semantic features are extracted by the encoder with one early convolutional block followed by four consecutive MLP-Mixer blocks. The extracted features are then passed to the decoder, which mirrors the architecture of the encoder, to form an N-class segmentation map. We evaluated our proposed network on the Automated Cardiac Diagnosis Challenge (ACDC) dataset. The performance of the network was assessed in terms of the volume- and surface-based similarities between the predicted contours and the manually delineated ground-truth contours, and computational efficiency. The volume-based similarities were measured by the Dice score coefficient (DSC), sensitivity, and precision. The surface-based similarities were measured by Hausdorff distance (HD), mean surface distance (MSD), and residual mean square distance (RMSD). The performance of the MLP-Vnet was compared with four state-of-the-art networks. The proposed network demonstrated statistically superior DSC, and superior sensitivity or precision, on all three structures compared to the competing networks (p-value < 0.05): average DSC of 0.904, sensitivity of 0.908, and precision of 0.902 among all structures. The best surface-based similarities were also demonstrated by the MLP-Vnet: average HD = 3.266 mm, MSD = 0.684 mm, and RMSD = 1.487 mm. Compared to the competing networks, the MLP-Vnet showed the shortest training time (7.32 hours) and inference time per patient (3.12 seconds). The proposed MLP-Vnet is capable of using a reasonable number of trainable parameters to solve the segmentation task on cardiac MRI scans more quickly and accurately than the state-of-the-art networks. This novel network could be a promising tool for accurate and efficient cardiac MRI segmentation to assist cardiac diagnosis and treatment decision-making.
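Each MLP-Mixer block alternates two small MLPs: one mixes information across spatial tokens, the other across channels within each token. A hedged sketch of that standard block; the hidden widths are illustrative, not the paper's values:

```python
# Standard MLP-Mixer block: token-mixing MLP, then channel-mixing MLP, with residuals.
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, n_tokens: int, dim: int, token_hidden: int = 256, ch_hidden: int = 512):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(                 # mixes across spatial tokens
            nn.Linear(n_tokens, token_hidden), nn.GELU(), nn.Linear(token_hidden, n_tokens))
        self.channel_mlp = nn.Sequential(               # mixes across channels per token
            nn.Linear(dim, ch_hidden), nn.GELU(), nn.Linear(ch_hidden, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, n_tokens, dim)
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))
```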
Proton therapy requires highly accurate dose calculation for treatment planning to ensure that the dose is delivered to the tumor precisely. The accuracy of mass density estimation dominates the uncertainty in proton dose calculation. This work proposes a fully connected neural network (FCNN)-based framework to estimate mass density from single-energy computed tomography. The FCNN was designed with 9 hidden layers of 150 hidden units each and nonlinear activation functions. A CIRS 062M electron density phantom was used to train the FCNN, and CIRS M701 and M702 phantoms were used to evaluate model performance. For M701, the FCNN achieved mean absolute percentage errors of mass density of 0.39%, 0.92%, 0.68%, 1.57%, and 0.92% over brain, spinal cord, soft tissue, lung, and bone, respectively. For M702, the corresponding mean absolute percentage errors of mass density estimation by the FCNN are 0.89%, 1.09%, 0.70%, 1.52%, and 3.19%.
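The described architecture is simple enough to state directly. A hedged sketch; the network's input features are not specified in the abstract, so a single HU value per voxel is assumed purely for illustration:

```python
# FCNN with 9 hidden layers of 150 units mapping CT-derived input(s) to mass density.
import torch
import torch.nn as nn

def build_fcnn(in_features: int = 1, hidden: int = 150, n_hidden: int = 9) -> nn.Sequential:
    layers, width = [], in_features
    for _ in range(n_hidden):
        layers += [nn.Linear(width, hidden), nn.ReLU()]  # nonlinear activation
        width = hidden
    layers.append(nn.Linear(width, 1))                   # predicted mass density (g/cm^3)
    return nn.Sequential(*layers)

model = build_fcnn()
density = model(torch.tensor([[40.0]]))                  # e.g., one single-energy CT HU value
```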
This work proposes a novel U-shaped neural network, Shifted-window MLP (Swin-MLP), that incorporates a Convolutional Neural Network (CNN) and a Multilayer Linear Perceptron-Mixer (MLP-Mixer) for automatic CT multi-organ segmentation. The network has a V-net-like structure: 1) a Shifted-window MLP-Mixer encoder learns semantic features from the input CT scans, and 2) a decoder, which mirrors the architecture of the encoder, reconstructs segmentation maps from the encoder's features. Novel to the proposed network, we apply a Shifted-window MLP-Mixer rather than convolutional layers to better model both global and local representations of the input scans. We evaluate the proposed network using an institutional pelvic dataset comprising 120 CT scans and a public abdomen dataset containing 30 scans. The network's segmentation accuracy is evaluated in two domains: 1) volume-based accuracy is measured by Dice Similarity Coefficient (DSC), segmentation sensitivity, and precision; 2) surface-based accuracy is measured by Hausdorff Distance (HD), Mean Surface Distance (MSD), and Residual Mean Square Distance (RMSD). The average DSC achieved by the Swin-MLP on the pelvic dataset is 0.866; sensitivity is 0.883, precision is 0.856, HD is 11.523 millimeters (mm), MSD is 3.926 mm, and RMSD is 6.262 mm. The average DSC on the public abdomen dataset is 0.903, and HD is 5.275 mm. The proposed Swin-MLP demonstrates significant improvement over CNN-based networks. The automatic multi-organ segmentation tool may potentially facilitate the current radiotherapy treatment planning workflow.
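The "shifted" half of the design comes from cyclically rolling the feature map between blocks so that successive windowed layers mix information across window boundaries. An illustrative snippet; a shift of half the window size is the usual convention, assumed here:

```python
# Cyclic shift before windowed mixing, reversed afterwards.
import torch

x = torch.randn(1, 56, 56, 96)                          # (B, H, W, C) token map
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))   # shift by window_size // 2 = 3
# ... windowed MLP mixing applied to `shifted` ...
restored = torch.roll(shifted, shifts=(3, 3), dims=(1, 2))
```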
In this work, we propose an adversarial attack-based data augmentation method to improve deep-learning-based segmentation of Organs-At-Risk (OAR) in abdominal Computed Tomography (CT) to facilitate radiation therapy. We introduce Adversarial Feature Attack for Medical Image (AFA-MI) augmentation, which forces the segmentation network to learn out-of-distribution statistics, improving generalization and robustness to noise. AFA-MI augmentation consists of three steps: 1) generate adversarial noise with the Fast Gradient Sign Method (FGSM) on the intermediate features of the segmentation network's encoder; 2) inject the generated adversarial noise into the network, intentionally compromising performance; 3) optimize the network with both clean and adversarial features. The effectiveness of the AFA-MI augmentation was validated on nnUnet. Experiments were conducted segmenting the heart, left and right kidneys, liver, left and right lungs, spinal cord, and stomach in an institutional dataset collected from 60 patients. We first evaluate the AFA-MI augmentation using nnUnet and Token-based Transformer Vnet (TT-Vnet) on test data from a public abdominal dataset and the institutional dataset. In addition, we validate how AFA-MI affects the networks' robustness by evaluating the networks on the institutional dataset with added Gaussian noise of varying magnitudes. Network performance is quantitatively evaluated using Dice Similarity Coefficient (DSC) for volume-based accuracy and Hausdorff Distance (HD) for surface-based accuracy. On the public dataset, nnUnet with AFA-MI achieves DSC = 0.85 and HD = 6.16 millimeters (mm); TT-Vnet achieves DSC = 0.86 and HD = 5.62 mm. On the robustness experiment with the institutional data, AFA-MI is observed to improve the segmentation DSC score by 0.010 to 0.055 across all organs relative to clean inputs. AFA-MI augmentation further improves contour accuracy by up to 0.527 in DSC when tested on images with added Gaussian noise. AFA-MI augmentation is therefore demonstrated to improve segmentation performance and robustness in CT multi-organ segmentation.
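Step 1 applies FGSM not to the input image but to an intermediate feature map: the sign of the loss gradient with respect to those features gives the direction that locally increases the loss. A hedged sketch; the perturbation budget and injection point are assumptions:

```python
# FGSM-style adversarial noise on intermediate encoder features.
import torch

def fgsm_feature_noise(features: torch.Tensor, loss: torch.Tensor, eps: float = 0.01):
    """Adversarial perturbation for `features`, given a loss computed from them."""
    grad, = torch.autograd.grad(loss, features, retain_graph=True)
    return eps * grad.sign()                 # step in the loss-ascending direction

# Sketch of use inside a training step:
#   feats = encoder(ct)
#   loss = seg_loss(decoder(feats), target)
#   adv_feats = feats + fgsm_feature_noise(feats, loss)
#   loss_adv = seg_loss(decoder(adv_feats), target)   # optimize on clean + adversarial
```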
In this work, we propose a convolutional vision transformer V-net (CVT-Vnet) for multi-organ segmentation in 3-dimensional CT images of head and neck cancer patients for radiotherapy treatment planning. Organs include the brainstem, chiasm, mandible, optic nerves (left and right), parotids (left and right), and submandibular glands (left and right). The proposed CVT-Vnet has a U-shaped encoder-decoder architecture. A CVT is first deployed as the encoder to capture global characteristics while preserving precise local details, and a convolutional decoder is utilized to assemble the segmentation from the features learned by the CVT. We evaluated the network using a dataset of 32 patients undergoing radiotherapy treatment. We present a quantitative evaluation of the performance of our proposed CVT-Vnet in terms of segmentation volume similarity (Dice score, sensitivity, precision, and absolute percentage volume difference (AVD)) and surface similarity (Hausdorff distance (HD), mean surface distance (MSD), and residual mean square distance (RMSD)), using the physicians' manual contours as the ground truth. The volume similarities averaged over all organs were 0.79 as Dice score, 0.83 as sensitivity, and 0.78 as precision. The average surface similarities were 13.41 mm as HD, 0.39 mm as MSD, and 1.01 mm as RMSD. The proposed network performed significantly better than Vnet and DV-net, two state-of-the-art methods. The proposed CVT-Vnet can be a promising tool for multi-organ delineation in head and neck radiotherapy treatment planning.
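What distinguishes a convolutional vision transformer from a plain ViT is that the token projections come from convolutions over the 2D token map, so attention operates on features that already encode local context. A hedged sketch of such a convolutional projection; the layer choices are illustrative, not CVT-Vnet's exact design:

```python
# Depthwise-separable convolutional projection producing attention tokens.
import torch
import torch.nn as nn

class ConvProjection(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim, bias=False),  # depthwise, local
            nn.BatchNorm2d(dim),
            nn.Conv2d(dim, dim, 1))                                     # pointwise, mixing

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, dim, H, W)
        return self.proj(x).flatten(2).transpose(1, 2)   # (B, H*W, dim) tokens for attention
```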
Automatic multi-organ segmentation is a cost-effective tool for generating organ contours from computed tomography (CT) images. This work proposes a deep-learning algorithm for multi-organ (bladder, prostate, rectum, left and right femoral heads) segmentation in pelvic CT images for prostate radiation treatment planning. We propose an encoder-decoder network with a V-net backbone for local feature extraction and contour reconstruction. Novel to our network, we utilize a token-based transformer, which encourages long-range dependency, to forward more informative high-resolution feature maps from the encoder to the decoder. In addition, a knowledge distillation strategy was applied to improve the network's generalization. We evaluate the network using a dataset collected from 50 patients with prostate cancer. A quantitative evaluation of the proposed network's performance was performed on each organ based on: 1) volume similarity between the segmented contours and ground truth using Dice score, segmentation sensitivity, precision, and absolute percentage volume difference (AVD); 2) surface similarity evaluated by Hausdorff distance (HD), mean surface distance (MSD), and residual mean square distance (RMSD). The performance was then compared against other state-of-the-art methods. The average volume similarities achieved by the network over all organs were: Dice score = 0.83, sensitivity = 0.84, and precision = 0.83; the average surface similarities were HD = 5.77 mm, MSD = 0.93 mm, and RMSD = 2.77 mm, with AVD = 12.85%. The proposed method performed significantly better than competing methods in most evaluation metrics. The proposed network may be a promising segmentation approach for use in routine prostate radiation treatment planning.
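The abstract notes a knowledge distillation strategy without detailing its form; the conventional recipe combines a temperature-scaled KL term against a teacher's soft predictions with the usual supervised loss. A hedged sketch under that assumption:

```python
# Conventional distillation loss: soften teacher/student logits, match with KL,
# and blend with the hard cross-entropy term.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, target, T: float = 2.0, alpha: float = 0.5):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T   # T^2 compensates the gradient scale
    hard = F.cross_entropy(student_logits, target)   # supervised term on ground truth
    return alpha * soft + (1 - alpha) * hard
```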
Realistic lesion generation is a useful tool for system evaluation and optimization. Generated lesions can serve as realistic imaging tasks for task-based image quality assessment, as well as targets in virtual clinical trials. In this work, we investigate a data-driven approach for categorical lung lesion synthesis using public lung CT databases. We propose a generative adversarial network with a Wasserstein discriminator and gradient penalty to stabilize training. We further include conditional inputs such that the network can generate user-specified lesion categories. Novel to our network, we directly incorporate radiomic features in an intermediate supervision step to encourage similar textures between generated and real lesions. We trained the network using lung lesions from the Lung Image Database Consortium (LIDC) database. Lesions are divided into two categories: solid vs. non-solid. We performed a quantitative evaluation of network performance based on four criteria: 1) overfitting, in terms of structural and morphological similarity to the training data; 2) diversity of generated lesions, in terms of similarity to other generated data; 3) similarity to real lesions, in terms of the distribution of example radiomics features; and 4) conditional consistency, in terms of classification accuracy using a classifier trained on the training lesions. We imposed a quantitative threshold for similarity based on visual inspection. The percentages of non-solid and solid lesions satisfying low overfitting and high diversity are 87.1% and 70.2%, respectively. The distributions of example radiomics features are similar between the generated and real lesions, indicated by low Kullback–Leibler divergence scores: 1.62 for non-solid lesions and 1.13 for solid lesions. Classification accuracy for the generated lesions is comparable with that for the real lesions. The proposed network presents a promising approach for generating realistic lesions with clinically relevant features crucial for the comprehensive assessment of novel medical imaging systems.
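The gradient penalty stabilizes Wasserstein GAN training by pushing the critic's gradient norm toward 1 on random interpolates between real and generated samples (Gulrajani et al., 2017). A sketch with the commonly used defaults; the penalty weight of 10 is an assumption:

```python
# WGAN-GP gradient penalty term for the critic loss.
import torch

def gradient_penalty(critic, real: torch.Tensor, fake: torch.Tensor, lambda_gp: float = 10.0):
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(scores, interp,
                                 grad_outputs=torch.ones_like(scores),
                                 create_graph=True)    # keep graph so the penalty is trainable
    norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((norm - 1) ** 2).mean()
```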