Open Access Paper
17 October 2022 S2MS: Self-supervised learning driven multi-spectral CT image enhancement
Proceedings Volume 12304, 7th International Conference on Image Formation in X-Ray Computed Tomography; 1230425 (2022) https://doi.org/10.1117/12.2647001
Event: Seventh International Conference on Image Formation in X-Ray Computed Tomography (ICIFXCT 2022), 2022, Baltimore, United States
Abstract
Photon counting spectral CT (PCCT) can produce reconstructed attenuation maps in different energy channels, reflecting the energy-dependent properties of the scanned object. Because of the limited photon count and the non-ideal detector response in each energy channel, the reconstructed images usually contain substantial noise. With the development of deep learning (DL) techniques, various DL-based models have been proposed for noise reduction. However, most of these models require clean data sets as training labels, which are not always available in medical imaging. Inspired by the similarity between the reconstructed images of the individual channels, we propose a self-supervised learning based PCCT image enhancement framework via multi-spectral channels (S2MS). In the S2MS framework, both the inputs and the output labels are noisy images. Specifically, one single-channel image is used as the output, while the images of the other single channels and the channel-sum image are used as the input to train the network, which makes full use of the spectral information at no extra cost. Simulation results based on the AAPM Low-dose CT Challenge database show that the proposed S2MS model suppresses noise and preserves details more effectively than traditional DL models, and thus has the potential to improve PCCT image quality in clinical applications.

1. INTRODUCTION

Photon counting spectral CT (PCCT) can separately collect the incident photons in different energy bins, which provides high energy resolution and enables more accurate material decomposition [1], [2]. Nevertheless, as the number of energy bins increases, the number of photons counted in each individual channel is limited, which results in a relatively low signal-to-noise ratio (SNR). Moreover, the non-ideal detector response introduces complicated noise through effects such as fluorescence x-rays, K-escape, charge sharing, and pulse pileup [3]. Noise in the reconstructed CT images can seriously affect diagnosis.

To reduce noise in CT images, deep learning (DL) techniques have recently been widely developed for CT image denoising and show potential in applications. Yang et al. used a generative adversarial network (GAN) with Wasserstein distance and perceptual similarity to reduce the noise in CT images [4]. Chang et al. proposed a CNN-based algorithm to reduce ring artifacts in CT images [5] and a spectrum estimation-guided algorithm for dual energy CT reconstruction [6].

However, traditional DL methods require high-quality clean images as training labels to achieve high performance, and such images are difficult to obtain, especially in medical imaging. To solve this problem, Lehtinen et al. introduced the Noise2Noise (N2N) model, in which the network is trained to map one noisy realization to another noisy realization [7]. In addition, PCCT produces reconstructed attenuation maps in different energy channels, which reflect the energy-dependent properties of the scanned object. Using the similarity of the images in different energy bins, we propose a self-supervised learning driven PCCT image denoising method via multi-spectral channels based on the N2N network model (S2MS). In our framework, the inputs are the images from multiple channels and a channel-sum image, while the output is the image of one single channel. Both the inputs and the output are noisy PCCT images.

Compared with N2N, the proposed S2MS uses all the reconstructed images in the different energy bins at the same time, rather than processing the images of each channel separately. Simulated experiments were carried out, and the results show that the S2MS model is effective and accurate in noise reduction and detail preservation.

2. MATERIALS AND METHODS

2.1 Basic Principle of PCCT

Compared with the traditional energy-integrating detector, a photon counting detector (PCD) can separately count the number of photons in each energy channel, which yields the projections in different energy bins. Fig. 1 shows an example of PCCT images in four energy channels. The similarity between the images in different energy bins can be used for noise reduction.
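The per-bin counting performed by a PCD can be illustrated with a minimal NumPy sketch. The bin edges are the four channels used in the experiments of this paper (30-45, 45-60, 60-80, and 80-100 keV); the photon energies and the function name `count_photons_per_bin` are hypothetical and serve only as an illustration.

```python
import numpy as np

# Bin edges in keV, matching the four energy channels used in the experiments.
BIN_EDGES_KEV = np.array([30.0, 45.0, 60.0, 80.0, 100.0])


def count_photons_per_bin(photon_energies_kev, bin_edges_kev=BIN_EDGES_KEV):
    """Return the number of detected photons that fall into each energy bin.

    Photons outside the overall range covered by the edges are ignored.
    """
    counts, _ = np.histogram(photon_energies_kev, bins=bin_edges_kev)
    return counts


# Hypothetical photon energies (keV) recorded along one x-ray path.
energies = np.array([32.0, 44.9, 45.0, 59.0, 61.5, 79.9, 80.0, 95.0, 25.0])
counts = count_photons_per_bin(energies)
print(counts)
```

Each entry of `counts` corresponds to one energy channel; splitting the same photon budget over more bins leaves fewer photons per bin, which is the source of the low per-channel SNR discussed in the introduction.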

Figure 1. An example of reconstructed PCCT images in different energy bins. The display window is [0, 0.4] cm⁻¹.

2.2 Deep-Learning Based Denoising

In deep learning based denoising, the input of the network is generally modeled as:

$$x_i = y_i + n_i$$

where $x_i$ is the corrupted input, $y_i$ is the clean target, and $n_i$ denotes the corresponding noise. According to the type of training target, DL-based denoising can be divided into the following two types:

1) Supervised Learning-N2C

Traditionally, supervised learning is used for deep learning based CT image denoising, which means training a regression model with pairs $(x_i, y_i)$ and minimizing the function:

$$\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \left\| f_\theta(x_i) - y_i \right\|_2^2$$

where $f_\theta$ is the denoising convolutional neural network (CNN), $\theta$ denotes its weights, and $N$ is the total number of training samples. In this paper, image denoising based on supervised learning is referred to as Noise2Clean (N2C).

2) Self-Supervised Learning-N2N

In contrast to Noise2Clean, Noise2Noise (N2N) is a self-supervised learning framework in which the input and the target are both corrupted. It can be expressed as:

$$\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \left\| f_\theta(y_i + n_{i1}) - (y_i + n_{i2}) \right\|_2^2$$

where $n_{i1}$ and $n_{i2}$ are two independent noise realizations. It has been shown that Noise2Noise training is equivalent to Noise2Clean training under certain mild conditions [7], [8]:

  • 1. $N \to \infty$;

  • 2. Conditional expectation $E\{n_{i2} \mid y_i\} = 0$;

  • 3. $n_{i1}$ and $n_{i2}$ are independent;

  • 4. $\forall i,\ |f_\theta(y_i + n_i)| < \infty$.

Since the filters in a convolutional neural network are shift-invariant, different parts of an image can serve as multiple training samples. Even if the training data set is small, the effective number of training samples is large enough to approximately satisfy condition 1. After reconstruction, the noise in the image domain is zero-mean and independent across energy channels [9], which means that conditions 2 and 3 are both satisfied in our method. Condition 4 can easily be satisfied by choosing meaningful network parameters.
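The role of the conditions above can be checked numerically on a toy problem. The following sketch is not the network used in this paper: it replaces the CNN with a hypothetical one-parameter model f(x) = theta * x that has a closed-form least-squares solution, and the signal range and noise levels are arbitrary assumptions. It fits the model once against clean targets (N2C) and once against independently corrupted, zero-mean targets (N2N).

```python
import numpy as np


def fit_scale(inputs, targets):
    """Closed-form least-squares fit of the one-parameter model f(x) = theta * x."""
    return float(np.dot(inputs, targets) / np.dot(inputs, inputs))


rng = np.random.default_rng(0)
num_samples = 200_000                                    # large N (condition 1)

clean = rng.uniform(0.0, 1.0, size=num_samples)          # y_i
noise_in = rng.normal(0.0, 0.1, size=num_samples)        # n_i1
noise_target = rng.normal(0.0, 0.1, size=num_samples)    # n_i2: zero mean, independent of n_i1

noisy_input = clean + noise_in
theta_n2c = fit_scale(noisy_input, clean)                 # clean target (N2C)
theta_n2n = fit_scale(noisy_input, clean + noise_target)  # noisy target (N2N)
gap = abs(theta_n2c - theta_n2n)

print(theta_n2c, theta_n2n, gap)
```

The two estimates differ only by the term sum(x_i * n_i2) / sum(x_i^2), which shrinks as the number of samples grows when the target noise is zero-mean and independent of the input.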

2.3 Noise2Noise Network for PCCT Image Denoising

By exploiting the similarity of the reconstructed images in different energy bins, we propose a Noise2Noise network-based PCCT image denoising framework with self-supervised learning via multi-spectral channels (S2MS). The basic process of the framework is shown in Fig. 2.

Figure 2. The Noise2Noise network-based PCCT image denoising framework with self-supervised learning via multi-spectral channels (S2MS). The attenuation images are divided by the mass attenuation coefficient to convert them into density images.

In S2MS, the L-1 reconstructed single-channel images and a channel-sum image (linear attenuation maps) are divided by the mass attenuation coefficient of each channel (Attenuation2Density) to convert them into density images before training, and together they form the L input channels of the network. The remaining single-channel image is also converted into a density image and used as the target. The S2MS network can be described as:

$$\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \left\| f_\theta\!\left( y_i^{\mathrm{sum}} + n_i^{\mathrm{sum}},\ \{ y_{il} + n_{il} \}_{l \neq k} \right) - (y_{ik} + n_{ik}) \right\|_2^2$$

where $f_\theta$ is the denoising convolutional neural network (CNN) with $L$ inputs, $k$ is the index of the target channel, $y_i^{\mathrm{sum}}$ is the clean channel-sum image, $n_i^{\mathrm{sum}}$ is the noise in the channel-sum image, $y_i = \{y_{i1}, y_{i2}, \cdots, y_{iL}\}$ are the clean reconstructed images in the individual energy channels, and $n_i = \{n_{i1}, n_{i2}, \cdots, n_{iL}\}$ are the corresponding noise images.
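The assembly of one S2MS training sample can be sketched as follows, assuming the density images are already available as NumPy arrays. The function name `build_s2ms_sample`, the array layout (channel, height, width), and the choice of placing the channel-sum image last are illustrative assumptions and are not specified in the paper.

```python
import numpy as np


def build_s2ms_sample(single_channel_density, channel_sum_density, target_index):
    """Assemble one (input, target) pair for a chosen target channel.

    single_channel_density: array of shape (L, H, W), one noisy density image per channel.
    channel_sum_density:    array of shape (H, W), the noisy channel-sum density image.
    target_index:           index of the single channel used as the training target.

    Returns the L-channel network input (the L-1 other channels followed by the
    channel-sum image; this ordering is an assumption) and the target image.
    """
    single = np.asarray(single_channel_density, dtype=np.float64)
    channel_sum = np.asarray(channel_sum_density, dtype=np.float64)
    num_channels = single.shape[0]
    if not 0 <= target_index < num_channels:
        raise IndexError("target_index is out of range")
    others = np.delete(single, target_index, axis=0)              # shape (L-1, H, W)
    inputs = np.concatenate([others, channel_sum[None]], axis=0)  # shape (L, H, W)
    target = single[target_index]                                 # shape (H, W)
    return inputs, target


# Toy data: L = 4 channels of 8 x 8 pixels with placeholder values.
single = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8)
channel_sum = np.full((8, 8), -1.0)
inputs, target = build_s2ms_sample(single, channel_sum, target_index=2)
print(inputs.shape, target.shape)
```

Calling the function once per `target_index` gives one training pair for each single channel while the target image itself never appears among the single-channel inputs.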

After training, S2MS can denoise the single-channel PCCT image. The denoised density image is multiplied by the mass attenuation coefficient (Density2Attenuation) to convert it back into a linear attenuation image (Fig. 3).
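The Attenuation2Density and Density2Attenuation steps amount to a per-channel division and multiplication. A minimal sketch is given below; the coefficient values are hypothetical placeholders rather than the values used in this study, and the function names are illustrative.

```python
import numpy as np

# Hypothetical per-channel mass attenuation coefficients in cm^2/g
# (placeholders, NOT values from the paper), ordered channel 1 to channel 4.
MASS_ATTENUATION_CM2_PER_G = np.array([0.27, 0.22, 0.19, 0.17])


def attenuation_to_density(mu_images, mass_attenuation):
    """Convert (L, H, W) linear attenuation images in cm^-1 to density images in g/cm^3."""
    return mu_images / mass_attenuation[:, None, None]


def density_to_attenuation(density_images, mass_attenuation):
    """Convert (L, H, W) density images in g/cm^3 back to linear attenuation in cm^-1."""
    return density_images * mass_attenuation[:, None, None]


mu = np.full((4, 2, 2), 0.2)                      # toy attenuation maps, cm^-1
density = attenuation_to_density(mu, MASS_ATTENUATION_CM2_PER_G)
restored = density_to_attenuation(density, MASS_ATTENUATION_CM2_PER_G)
print(density[:, 0, 0], restored[:, 0, 0])
```

Because the same object has the same density in every channel, the converted images share a common value range, which is what makes the other channels informative inputs for predicting the target channel.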

Figure 3. The denoising process of the trained S2MS network.

2.4 Experiment Setup

1) Dataset Establishment

In this study, CT images from the 2016 Low-dose CT Grand Challenge dataset [10] were used to simulate the PCCT images. A total of 1000 slices from 7 patients were randomly divided into training, validation, and test datasets in a ratio of 8:1:1. An equispaced fan-beam geometry was assumed to simulate the projection process. The distance from the source to the system origin was 142 cm, the distance from the source to the detector was 180 cm, and there were 512 detector elements, each 0.1 cm wide. A total of 512 projections were acquired over an angular range of 360 degrees. The projection data were collected in four energy bins: 30-45 keV (channel 1), 45-60 keV (channel 2), 60-80 keV (channel 3), and 80-100 keV (channel 4). In each energy channel, 1000 PCCT images of 512×512 pixels were acquired. Poisson noise was introduced in the simulation. A total of 1×10⁵ photons were emitted along each x-ray path, and the number of photons in each energy channel was proportional to the normalized spectrum of that channel. Finally, the PCCT images were reconstructed with the filtered back-projection (FBP) algorithm. Before training, the reconstructed images were divided by the mass attenuation coefficient to convert them into density images, which were used as the input and target of S2MS.
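The Poisson noise model described above can be sketched in a few lines. The total of 1×10⁵ photons per path and the 512 × 512 sinogram size follow the text; the spectrum fractions, the constant toy line integral (shared by all channels here for simplicity, although real line integrals are energy dependent), and the clipping of zero counts are assumptions made only for this illustration.

```python
import numpy as np


def simulate_noisy_projection(line_integrals, incident_photons, rng):
    """Apply Poisson counting noise to noise-free line integrals.

    The expected counts follow the Beer-Lambert law, I0 * exp(-p); the noisy
    line integral is recovered with the usual log transform.
    """
    expected_counts = incident_photons * np.exp(-line_integrals)
    counts = rng.poisson(expected_counts)
    counts = np.maximum(counts, 1)            # assumption: avoid log(0) on photon-starved rays
    return -np.log(counts / incident_photons)


TOTAL_PHOTONS = 1e5                                        # photons per x-ray path (from the text)
channel_fractions = np.array([0.30, 0.30, 0.25, 0.15])     # hypothetical normalized spectrum
incident_per_channel = TOTAL_PHOTONS * channel_fractions

rng = np.random.default_rng(0)
clean_sinogram = np.full((512, 512), 2.0)                  # 512 views x 512 detector elements, toy value
noisy_sinograms = np.stack(
    [simulate_noisy_projection(clean_sinogram, i0, rng) for i0 in incident_per_channel]
)
print(noisy_sinograms.shape, noisy_sinograms.std(axis=(1, 2)))
```

Channels that receive a smaller share of the spectrum have fewer incident photons and therefore noisier line integrals, which carries over to the reconstructed images.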

2) Network Implementation

The U-Net architecture in [7] was used in our study (Fig. 2). The encoder-decoder network consists of a contracting multi-scale decomposition path and a symmetric expanding path, with skip connections at each level. The Adam optimizer was used with a learning rate of 0.0003. The loss function was based on the mean squared error (MSE):

$$L_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left\| f_\theta(x_i) - y_i \right\|_2^2$$

where $x_i$ is the input and $y_i$ is the target of the network. Training was performed on a server with an Intel Xeon Silver 4214 CPU and a GeForce RTX 3090 GPU with 24 GB of memory. The network was implemented in PyTorch 1.9.1 on Ubuntu 20.04.
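To make the optimization step concrete, the sketch below evaluates an MSE loss and applies one Adam update in plain NumPy. It is not the PyTorch implementation used in this work: the U-Net is replaced by a hypothetical one-parameter model, and the Adam hyperparameters other than the learning rate (beta1, beta2, eps) are commonly used defaults assumed here, since the paper does not state them.

```python
import numpy as np

LEARNING_RATE = 3e-4  # learning rate stated in the text


def mse_loss(prediction, target):
    """Mean squared error between prediction and target."""
    return float(np.mean((prediction - target) ** 2))


def adam_step(theta, grad, state, lr=LEARNING_RATE, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; beta1, beta2, and eps are assumed default values."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * grad ** 2
    m_hat = state["m"] / (1.0 - beta1 ** state["t"])   # bias-corrected first moment
    v_hat = state["v"] / (1.0 - beta2 ** state["t"])   # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)


# Stand-in "network": prediction = theta * x (a placeholder, not a U-Net).
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 1.5, size=(16, 16))
y = 2.0 * x
theta = np.array(0.0)
state = {"t": 0, "m": np.zeros_like(theta), "v": np.zeros_like(theta)}

loss_before = mse_loss(theta * x, y)
grad = np.mean(2.0 * (theta * x - y) * x)              # d(MSE)/d(theta)
theta_new = adam_step(theta, grad, state)
loss_after = mse_loss(theta_new * x, y)
print(float(theta_new), loss_before, loss_after)
```

On the first step the bias-corrected moments reduce the update to approximately the learning rate times the sign of the gradient, so the parameter moves by about 0.0003 regardless of the gradient magnitude.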

3) Comparison Study

To evaluate the performance of the proposed method, Noise2Clean (N2C) and conventional Noise2Noise (N2N) were used for comparison. In the N2C network, the PCCT image reconstructed from noise-free projections was used as the target. In the N2N network, two independent noisy projection data sets were simulated, and the corresponding reconstructed images were used as the input and the target, respectively. Since our study focuses on the denoising method rather than the network structure, the U-Net architecture in Fig. 2 was also used for N2C and N2N.

4) Evaluation Metrics

In our study, the structural similarity index (SSIM) and the root mean squared error (RMSE) were used as evaluation metrics. SSIM measures structural similarity by comparing the means, variances, and correlation of the denoised image and the reference, and RMSE measures the L2-norm error between the estimated image and the ground truth.
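Both metrics can be written compactly in NumPy. The sketch below is a simplification: it evaluates the SSIM formula once over the whole image with global statistics, whereas common implementations average the index over local windows, so the numbers will not match the values reported in this paper. The constants 0.01 and 0.03 are the commonly used defaults, and the data range of 0.4 and the synthetic images are assumptions for illustration.

```python
import numpy as np


def rmse(estimate, reference):
    """Root mean squared error between an estimate and its reference."""
    return float(np.sqrt(np.mean((estimate - reference) ** 2)))


def global_ssim(estimate, reference, data_range):
    """SSIM computed from global image statistics (single window, simplified)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = estimate.mean(), reference.mean()
    var_x, var_y = estimate.var(), reference.var()
    cov_xy = np.mean((estimate - mu_x) * (reference - mu_y))
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


# Synthetic 200 x 200 region with an assumed value range of [0, 0.4].
rng = np.random.default_rng(0)
reference_roi = rng.uniform(0.0, 0.4, size=(200, 200))
noisy_roi = reference_roi + rng.normal(0.0, 0.02, size=(200, 200))

ssim_same = global_ssim(reference_roi, reference_roi, data_range=0.4)
ssim_noisy = global_ssim(noisy_roi, reference_roi, data_range=0.4)
rmse_noisy = rmse(noisy_roi, reference_roi)
print(ssim_same, ssim_noisy, rmse_noisy)
```

An image compared with itself gives an SSIM of 1 and an RMSE of 0; added noise lowers the SSIM and raises the RMSE, which is the direction of improvement used when comparing the denoising methods.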

3. RESULTS

The denoised images generated by the proposed S2MS, N2N, and N2C are shown in Fig. 4. The proposed method effectively reduces noise in each PCCT channel image. Especially in channels 3 and 4, it retains richer structural information while suppressing most of the noise. Four regions of interest (ROIs) were selected (red rectangles) to show the detail preservation performance (Fig. 5). Compared with the other methods, the proposed S2MS removes more noise while preserving more details.

Figure 4. Four example reconstructed slices in four energy channels (30-45 keV, 45-60 keV, 60-80 keV, 80-100 keV). The display windows for linear attenuation from the top to the bottom rows are [0, 0.4] cm⁻¹, [0, 0.4] cm⁻¹, [0, 0.35] cm⁻¹, and [0, 0.35] cm⁻¹, respectively.

Figure 5. Details of the reconstructed images in Fig. 4. The first and second rows are PCCT images in channel 3 (60-80 keV), the third and fourth rows are PCCT images in channel 4 (80-100 keV), and the display window for all images is [0, 0.35] cm⁻¹.

We selected a fixed-size (200×200) ROI (blue rectangle in Fig. 4) and calculated the SSIM and RMSE (Table 1). The proposed S2MS achieved the highest SSIM and the lowest RMSE in every energy channel, which indicates that S2MS performs better than N2N and N2C in PCCT image denoising.

Table 1. Denoising results of different methods on the test dataset.

Energy        Method    SSIM      RMSE
30-45 keV     N2N       0.9754    0.0131
30-45 keV     N2C       0.9765    0.0130
30-45 keV     S2MS      0.9800    0.0117
45-60 keV     N2N       0.9738    0.0087
45-60 keV     N2C       0.9750    0.0085
45-60 keV     S2MS      0.9853    0.0061
60-80 keV     N2N       0.9697    0.0072
60-80 keV     N2C       0.9719    0.0070
60-80 keV     S2MS      0.9825    0.0048
80-100 keV    N2N       0.9442    0.0081
80-100 keV    N2C       0.9479    0.0077
80-100 keV    S2MS      0.9737    0.0046

To further validate the performance of the proposed method for clinical PCCT applications, CT images containing a lesion are shown in Fig. 6. The denoised reconstructed images in channel 3 (60-80 keV) obtained with the different methods are illustrated, and the lesion region (red arrow) is magnified. The lesion is hardly observable in the noisy image, whereas it can be clearly seen in the image denoised by S2MS. The lesion area is blurred in the images denoised by the other methods. This result indicates that our method has potential for clinical application.

Figure 6. A reconstructed PCCT image in channel 3: the noisy image and the outputs of the different networks. The shaded area in the red rectangle is the lesion region, which is magnified and shown. The display window is [0, 0.35] cm⁻¹.

4. DISCUSSION AND CONCLUSION

Notably, in the outputs of N2N and S2MS, a faint gray circular shadow appears in the air region (Fig. 7). There is no anatomical structure in this region, so its noise distribution and background signal are completely different from those in the body region. Therefore, the network may output incorrect values in regions outside the body. The shadow is barely visible and has little influence on diagnosis.

Figure 7. The magnified image of the ROI (green rectangle in Fig. 3). The display window is [0, 0.1] cm⁻¹.

In conclusion, we have proposed a Noise2Noise-based PCCT image denoising framework via multi-spectral channels (S2MS). In this study, noisy PCCT images were used as both the input and the target to train the network. To make full use of the spectral data in the L channels, the reconstructed images of L-1 single channels and the channel-sum image were used as the input, and the remaining single-channel image was used as the target. Compared with traditional DL denoising methods, the simulation results show that the proposed method yields high-quality reconstructed images: noise is markedly reduced and detail features are well preserved. Because no clean images are needed, the proposed S2MS has potential for practical application. In future work, S2MS will be used as prior information in combination with a material decomposition framework, and experimental data will be used to test the network.

REFERENCES

[1] S. Leng, L. Yu, J. G. Fletcher, C. A. Mistretta, and C. H. McCollough, “Noise reduction in spectral CT: Reducing dose and breaking the trade-off between image noise and energy bin selection,” Med. Phys., 38 (9), 4946-4957 (2011). https://doi.org/10.1118/1.3609097

[2] X. Wu, P. He, Z. Long, X. Guo, M. Chen, X. Ren, P. Chen, L. Deng, K. An, P. Li, B. Wei, and P. Feng, “Multimaterial decomposition of spectral CT images via fully convolutional DenseNets,” J. X-Ray Sci. Technol., 27 (3), 461-471 (2019).

[3] K. Taguchi, C. Polster, O. Lee, K. Stierstorfer, and S. Kappler, “Spatio-energy cross talk in photon counting detectors: Detector model and correlated Poisson data generator,” Med. Phys., 43 (12), 6386-6404 (2016). https://doi.org/10.1118/1.4966699

[4] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, “Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss,” IEEE Trans. Med. Imag., 37 (6), 1348-1357 (2018). https://doi.org/10.1109/TMI.2018.2827462

[5] S. Chang, X. Chen, J. Duan, and X. Mou, “A hybrid ring artifact reduction algorithm based on CNN in CT images,” in Proc. 15th Int. Meeting Fully Three-Dimensional Image Reconstruction Radiol. Nucl. Med., (2019). https://doi.org/10.1117/12.2534726

[6] S. Chang et al., “Spectrum estimation-guided iterative reconstruction algorithm for dual energy CT,” IEEE Trans. Med. Imag., 39 (1), 246-258 (2020). https://doi.org/10.1109/TMI.42

[7] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila, “Noise2Noise: Learning image restoration without clean data,” in Proc. Int. Conf. Mach. Learn., 2965-2974 (2018).

[8] D. Wu, K. Gong, K. Kim, X. Li, and Q. Li, “Consensus neural network for medical imaging denoising with only noisy training samples,” in Int. Conf. MICCAI, 741-749.

[9] W. Fang, D. Wu, K. Kim, M. K. Kalra, R. Singh, L. Li, and Q. Li, “Iterative material decomposition for spectral CT using self-supervised Noise2Noise prior,” Phys. Med. Biol., 66 (15), 5013-5030 (2021). https://doi.org/10.1088/1361-6560/ac0afd

[10] C. McCollough, “TU-FG-207A-04: Overview of the low dose CT grand challenge,” Med. Phys., 43 (6), 3759-3760 (2016).
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Chaoyang Zhang, Shaojie Chang, Ti Bai, and Xi Chen "S2MS: Self-supervised learning driven multi-spectral CT image enhancement", Proc. SPIE 12304, 7th International Conference on Image Formation in X-Ray Computed Tomography, 1230425 (17 October 2022); https://doi.org/10.1117/12.2647001
KEYWORDS
Denoising

X-ray computed tomography

Image denoising

Computed tomography

Signal attenuation

Image enhancement

Sensors
