Purpose: We propose a super-resolution (SR) method, named SR-CycleGAN, for the SR of clinical computed tomography (CT) images to the micro-focus x-ray CT (μCT) level. Due to the resolution limitations of clinical CT (about 500 × 500 × 500 μm³/voxel), it is challenging to obtain sufficient pathological information. On the other hand, μCT scanning allows the imaging of lung specimens at significantly higher resolution (about 50 × 50 × 50 μm³/voxel or higher), which allows us to obtain and analyze detailed anatomical information. As a way to obtain detailed information, such as cancer invasion and bronchioles, from preoperative clinical CT images of lung cancer patients, SR of clinical CT images to the μCT level is desired. Approach: Typical SR methods require aligned pairs of low-resolution (LR) and high-resolution (HR) images for training, but it is infeasible to obtain precisely aligned paired clinical CT and μCT images. To solve this problem, we propose an unpaired SR approach that can perform SR of clinical CT to the μCT level. We modify a conventional image-to-image translation network named CycleGAN into an inter-modality translation network named SR-CycleGAN. The modifications consist of three parts: (1) an innovative loss function named multi-modality super-resolution (MMSR) loss, (2) optimized SR network structures for enlarging the input LR image k times (k = 8 in our experiments) in width and height to obtain the SR output, and (3) sub-pixel shuffling layers for reducing computing time. Results: Experimental results demonstrated that our method successfully performed SR of lung clinical CT images. The SSIM and PSNR scores of our method were 0.54 and 17.71, higher than the conventional CycleGAN's scores of 0.05 and 13.64, respectively. Conclusions: The proposed SR-CycleGAN is usable for the SR of lung clinical CT to the μCT scale, while conventional CycleGAN produces output images of low qualitative and quantitative quality. More lung micro-anatomy information, such as the shape of bronchiole walls, could be observed to aid diagnosis.
1. Introduction

Currently, lung cancer is the most common cancer among men,1 and the most common cause of cancer death worldwide.2 In 2020, following the level of female breast cancer diagnoses, an estimated 2.2 million cases of lung cancer were newly diagnosed (11.4% of total new cancer cases). Lung cancer remains the leading cause of cancer death, with an estimated 1.8 million deaths (18% of total cancer deaths).3 Most lung cancers are not found in their early stage, and clinical computed tomography [clinical CT (we use the term "clinical CT image" for CT images conventionally taken at hospitals, the term "CT volumes" for volumetric images acquired by CT scanning, and the term "CT images" for two-dimensional (2D) images cropped from CT volumes)] by volumetric image scanning is offered to patients considered to be at high risk of contracting the disease.4 Clinical CT of lung cancer patients is also used for planning surgery, radiotherapy, and chemotherapy.5 Clinical CT of lung cancer patients provides more detailed images than chest x-rays and is better at finding small abnormal areas in the lungs.6 However, the resolution of clinical CT is still not high enough to observe some micro-anatomical structures. We cannot observe enough pathological information, such as the invasion of cancer and thin bronchioles, from clinical CT due to its limited resolution (about 500 × 500 × 500 μm³/voxel).7 To acquire more detailed pathological information for preoperative diagnosis, it is important to enhance the resolution of clinical CT images. Micro-focus x-ray CT (μCT) is another CT modality, and it can take images at a much higher resolution than clinical CT. Although μCT cannot scan living human bodies,8 it can scan small targets, e.g., a surgically dissected human lung, the entire body of a mouse, or a rabbit heart. The isotropic resolution of μCT volumes is typically about 50 × 50 × 50 μm³/voxel or higher.
μCT volumes obtained by scanning resected lung cancer specimens can capture their detailed anatomical structures and surroundings.9 A comparison of clinical CT images with μCT images is shown in Fig. 1. We can clearly observe the tumor's outline and the bronchus in μCT, while the tumor outline and the bronchus are jagged in clinical CT. If we could enhance the resolution of lung cancer patients' clinical CT images, we would be able to observe detailed anatomical structures, such as thin bronchioles, and then use the resolution-enhanced clinical CT to guide surgeries and treatment plans for lung cancer. Furthermore, a better resolution may substantially improve automatic detection and image segmentation results.11 Super-resolution (SR) is a term for a set of methods for enhancing the resolution of video or images.12 Our goal is to perform SR of the clinical CT images of lung cancer patients. Deep learning (DL)-based methods for medical image analysis have become active in recent years.13 DL-based methods have achieved state-of-the-art (SOTA) accuracy14–18 over traditional methods in segmentation. DL-based methods have also achieved SOTA results in medical image denoising.19,20 Following this trend, we also use DL-based methods for performing SR in this paper. Previous SR methods based on DL21–25 commonly needed aligned pairs of low-resolution (LR) and high-resolution (HR) images to train a fully convolutional network26 for SR. Dong et al.21 proposed a deep neural network-based method for single-image SR. Ledig et al.22 proposed a generative adversarial network (GAN) for photorealistic SR. Lim et al.23 proposed an enhanced deep residual network27 for SR. Haris et al.24 proposed a network that exploits iterative up- and down-sampling layers for SR. Wang et al.25 proposed a dual-stream network for SR. There are also several approaches to the SR of CT images.28–30 Yu et al.28 proposed a single-slice and multi-slice SR method for CT images.
Georgescu et al.30 proposed a two-stage network for the SR of CT and MRI images. However, a common disadvantage of the above methods21–25,28–30 is that they require paired LR-HR images for training, in which the LR images are acquired by downsampling the HR images using interpolation algorithms such as bicubic interpolation.31 It is difficult to perform the SR of lung clinical CT images using these previous methods. Given a clinical CT image (regarded as the LR image here) with a resolution of around 500 × 500 × 500 μm³/voxel, we cannot acquire its corresponding HR image because it is difficult to scan a living human body at a higher resolution. On the other hand, we can obtain μCT images having a micro-level resolution by scanning resected lung specimens. We can use μCT images of lung specimens to guide the SR of lung clinical CT images. Since lung clinical CT and μCT are acquired with different imaging devices, image registration of lung clinical CT and μCT images would be needed to obtain paired LR (clinical CT)-HR (μCT) images of the lung. However, registration between clinical CT and μCT is challenging because the shape and inflation status of lung specimens in μCT images are very different from those of a living lung. Therefore, an unsupervised method that does not require pairs of clinical CT and μCT images is desired. However, there are very few unsupervised SR methods that do not require paired LR and HR images. Yuan et al.32 proposed an unsupervised method for single-image SR. However, this method is ill-suited to processing medical images due to its unstable training process and excessive training time. Ravì et al.33 proposed an unsupervised SR method for endomicroscopy; however, this method requires certain hardware parameters of the endomicroscopy imaging device. Accordingly, there is demand for a stable, time-efficient, and highly versatile unsupervised SR method. This paper proposes SR-CycleGAN, an unsupervised SR method that does not require paired LR-HR images, to perform the SR of lung clinical CT images.
First, we introduce a novel loss function named multi-modality super-resolution (MMSR) loss for preventing intensity variation of an SR image from the original domain (clinical CT) into the HR domain (μCT). Second, we design an optimal and time-saving network structure for SR. To prove our method's effectiveness, we built a clinical CT–μCT database for our experiments and evaluated our method using this database. To the best of our knowledge, our method is the first approach to perform the SR of clinical CT using μCT. The contributions of our method are: (1) a novel loss function named MMSR loss for cross-modality SR from clinical CT to the μCT scale, (2) a specially designed SR network structure for shortening training time and enhancing accuracy, and (3) a newly built clinical CT–μCT dataset for verifying the feasibility of our proposed cross-modality SR method. Our code is available at https://github.com/zhuofeng/SR-cycleGAN.

2. Method

2.1. Overview

We propose an unsupervised method for performing the SR of clinical CT to the μCT scale, using unpaired clinical CT and μCT images for training. We call our method SR-CycleGAN, since its structure is based on CycleGAN. The novelty of SR-CycleGAN consists of three aspects: (1) a network for SR, in which the image-to-image translation networks of conventional CycleGAN are replaced by SR networks; the output SR image is k times (k = 8) larger than the input LR image in width and height. (2) A loss function named MMSR loss, which ensures that the output SR image has the same structure as that of the input LR image. (3) An optimized network structure for reducing training time and achieving better quantitative/qualitative results. For training, our method requires clinical CT images and μCT images. The inputs of the network are 2D CT images (LR images) cropped from clinical CT volumes. The outputs are the corresponding SR images. It is noteworthy that the height and width of the SR images are k times (k = 8) larger than those of the LR images.
2.2. Conventional CycleGAN

This section explains conventional CycleGAN to better understand our SR-CycleGAN. CycleGAN34 is an unsupervised image-to-image translation method based on deep generative models. It can learn to translate an image from a source domain to a target domain in the absence of paired examples. The mathematical idea of CycleGAN is to obtain a generator $G_A: X \rightarrow Y$ and another generator $G_B: Y \rightarrow X$. At the training stage of CycleGAN, the generators $G_A$ and $G_B$ are trained simultaneously, and a loss named cycle-consistency loss is adopted to maintain the cycle-consistency $G_B(G_A(x)) \approx x$ and $G_A(G_B(y)) \approx y$. Here, $x$ and $y$ are the images from domain $X$ and domain $Y$, respectively. The cycle-consistency loss is formulated as
$$L_{\mathrm{cyc}}(G_A, G_B) = \mathbb{E}_{x \sim X}\left[\left\| G_B(G_A(x)) - x \right\|_1\right] + \mathbb{E}_{y \sim Y}\left[\left\| G_A(G_B(y)) - y \right\|_1\right],$$
where $\|\cdot\|_1$ is the $L_1$-norm. Furthermore, to generate more realistic images, a CNN-based discriminator $D_B$ is used to distinguish generated images $G_A(x)$ from real images $y$. In addition, another discriminator $D_A$ is used to distinguish generated images $G_B(y)$ from real images $x$. Accordingly, generators $G_A$ and $G_B$ are trained to fool the discriminators $D_B$ and $D_A$. Moreover, $D_B$ and $D_A$ help generators $G_A$ and $G_B$ to generate images that are closer to the target domain. Achieving this objective of generating more realistic images involves loss terms named adversarial losses. The adversarial losses are formulated as
$$L_{\mathrm{GAN}}(G_A, D_B) = \mathbb{E}_{y \sim Y}\left[\log D_B(y)\right] + \mathbb{E}_{x \sim X}\left[\log\left(1 - D_B(G_A(x))\right)\right],$$
$$L_{\mathrm{GAN}}(G_B, D_A) = \mathbb{E}_{x \sim X}\left[\log D_A(x)\right] + \mathbb{E}_{y \sim Y}\left[\log\left(1 - D_A(G_B(y))\right)\right].$$
The combination of the adversarial losses and the cycle-consistency loss is used for unpaired image-to-image translation in CycleGAN.

2.3. SR-CycleGAN

The conventional CycleGAN is not designed for SR. Since CycleGAN is an image-to-image translation network, its output and input images are of the same size. However, in performing the SR of a given image, the output image is larger than the input image, since the output image's resolution is higher than that of the input. Furthermore, CycleGAN faces problems such as producing diverse outputs.35 In the SR of medical images, we desire an output image that has the same anatomical structures as the input image: the SR result of a bronchus should still have the shape of a bronchus.
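To make the cycle-consistency constraint of Sec. 2.2 concrete, the following NumPy sketch computes the cycle-consistency loss with toy stand-in generators. The functions `g_a` and `g_b` are illustrative placeholders, not the paper's CNNs; they happen to invert each other exactly, so the loss is zero.

```python
import numpy as np

def g_a(x):
    """Toy stand-in for generator G_A (the real one is a CNN): double intensities."""
    return x * 2.0

def g_b(y):
    """Toy stand-in for generator G_B: halve intensities (inverse of g_a)."""
    return y / 2.0

def cycle_consistency_loss(x, y):
    """L_cyc = E[|G_B(G_A(x)) - x|_1] + E[|G_A(G_B(y)) - y|_1]."""
    forward = np.mean(np.abs(g_b(g_a(x)) - x))   # x -> y' -> x''
    backward = np.mean(np.abs(g_a(g_b(y)) - y))  # y -> x' -> y''
    return forward + backward

x = np.random.rand(32, 32)  # clinical CT-like patch (domain X)
y = np.random.rand(32, 32)  # micro-CT-like patch (domain Y)
print(cycle_consistency_loss(x, y))  # 0.0: the toy generators invert each other
```

In training, this loss pushes the two learned generators toward being mutual inverses, which is what licenses unpaired translation.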
Due to such constraints, we propose an SR network based on CycleGAN, which we name SR-CycleGAN. The structures of CycleGAN and SR-CycleGAN are shown in Fig. 2. The input and output sizes of CycleGAN are the same, whereas the output is larger than the input in SR-CycleGAN.

2.3.1. Network structure of SR-CycleGAN

The specific network structure of SR-CycleGAN is shown in Fig. 3. As shown in Fig. 3(a), we modified conventional CycleGAN's image-to-image translation neural network (generator) into an SR neural network by removing downblocks/upblocks (definitions of downblocks/upblocks are given in Fig. 3) and adding pixel-shuffling layers. In conventional CycleGAN, the input and output of generator $G_A$ are of the same size: inputting an image into $G_A$ of CycleGAN yields an output image of the same size. By contrast, inputting an LR image into $G_A$ of SR-CycleGAN yields an output image whose width and height are 8 times those of the input. The original network structure of generator $G_A$ has three "downblocks" at the network's beginning, as shown in Fig. 3. Each downblock contains a convolution layer that scales down the image to 1/2 of its original size, followed by a batch normalization layer and an activation layer. If we input an image into the three downblocks, we would obtain feature maps of 1/8 the original width and height. Such small feature maps would wash away the spatial features of the given image. Therefore, we remove the downblocks of generator $G_A$. In generator $G_A$ of CycleGAN, upblocks consist of deconvolution layers that scale the feature maps back up to their original size. Since we remove the downblocks in SR-CycleGAN, the feature maps are no longer scaled down, and thus we also remove the upblocks. Finally, since SR-CycleGAN is an SR network, we need to scale up the feature maps at the end of the network to obtain the SR image.
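This final scaling-up can be done with sub-pixel shuffling (depth-to-space), which rearranges channel values into spatial positions instead of learning a deconvolution. A minimal NumPy sketch of the operation, matching the layout of PyTorch's `nn.PixelShuffle` (the array sizes below are illustrative):

```python
import numpy as np

def pixel_shuffle(feat, r):
    """Sub-pixel shuffling (depth-to-space): rearrange a (C*r^2, H, W)
    feature map into a (C, H*r, W*r) image, as in PyTorch's nn.PixelShuffle."""
    c_r2, h, w = feat.shape
    c = c_r2 // (r * r)
    out = feat.reshape(c, r, r, h, w)    # split channels into (C, r, r)
    out = out.transpose(0, 3, 1, 4, 2)   # reorder to (C, H, r, W, r)
    return out.reshape(c, h * r, w * r)  # interleave into (C, H*r, W*r)

feat = np.random.rand(64, 32, 32)  # 64 channels = 1 output channel * 8^2 sub-pixel positions
sr = pixel_shuffle(feat, 8)
print(sr.shape)  # (1, 256, 256): 8x larger in width and height
```

The operation is a pure memory rearrangement, which is why it is cheaper than deconvolution with the same output size.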
The use of a sub-pixel shuffling layer has been proven to reduce computational complexity, save computing time, and perform significantly better than a deconvolution layer in SR.36 Therefore, we add sub-pixel shuffling layers at the end of the network for scaling up the feature maps to obtain the SR image, as shown in Fig. 3(a). In SR-CycleGAN, generator $G_B$ is an inverse function of generator $G_A$. Since generator $G_A$ scales up an input image to an SR image, we modified generator $G_B$ to scale down an HR image to an LR image. In conventional CycleGAN, $G_B$ outputs an image of the same size as its input. In $G_B$ of SR-CycleGAN, by contrast, we obtain an output image whose width and height are 1/8 of those of the input image. We added downblocks consisting of downsampling layers at the end of generator $G_B$ to scale down the feature maps, as shown in Fig. 3(b).

2.3.2. Multi-modality super-resolution loss in SR-CycleGAN

There are two important factors in the SR of clinical CT images: one is anatomical structure, and the other is intensity distribution. Here, we explain the relationship between anatomical structure and intensity distribution. Structures such as arteries, bronchi, and alveoli are anatomical structures. Intensity distribution describes how a certain tissue has a certain intensity (grayscale). The intensity of clinical CT is described by the Hounsfield scale, and a specific substance such as bone falls within a specific intensity range.37 On the other hand, the intensity of μCT changes with every scan, so the intensity of a specific substance varies slightly at each scan. The same anatomical structures have totally different intensity distributions in clinical CT and μCT. For instance, in clinical CT images, the intensities of blood vessels and bronchus walls are around 0 Hounsfield units (H.U.). In μCT images, the intensities of blood vessels and bronchus walls are around 15,000 and 11,000, respectively, in the scanner used in our experiments.
The intensity distribution of μCT concentrates in a range of about [2000, 15,000], as shown in Fig. 4(b), while the intensity of a lung's clinical CT is distributed relatively uniformly in the range of about [-1000, 2500] H.U., as shown in Fig. 4(a). Even if we normalize the intensities of both μCT and clinical CT to the range [0, 1], the histograms of the two intensity distributions are still very different. For the SR of medical images, a drastic change in image appearance may mislead clinicians. We need anatomical structures such as blood vessels and bronchi in clinical CT images (LR images) to maintain their original size and shape after SR. In addition, we have to ensure that the intensity distribution of the clinical CT's SR result stays close to that of the original clinical CT image. The loss function used in conventional CycleGAN does not ensure that the input LR and output SR images have the same anatomical structures and intensity distribution. If we only modify the network structure of CycleGAN as described in Sec. 2.3.1, the modified network outputs SR images with totally different intensities and anatomical structures from the input LR image. The objective of conventional CycleGAN is to output images close to the target domain rather than the source domain. In clinical CT image SR, the source domain is the LR domain (clinical CT) and the target domain is the SR domain (μCT). Therefore, CycleGAN with conventional loss terms outputs SR images with no similarity to the input LR image. Loss terms that guarantee that the output SR image has the same anatomical structures and intensity distribution as the input LR image are desired. We propose a novel loss function named MMSR loss, as shown in Fig. 5. The MMSR loss contains the following terms: (1) structural similarity (SSIM) loss, (2) downsample loss, and (3) upsample loss. As shown in Fig.
5, the downsample loss and upsample loss ensure that the SR image has an intensity distribution similar to that of the input LR image, and the SSIM loss ensures that the SR image has anatomical structures similar to those of the input LR image. Consequently, we use the MMSR loss to train SR-CycleGAN.

SSIM loss. The first loss term we propose is named SSIM loss. SSIM38 is an indicator that evaluates the structural similarity of two images. The SSIM between two images $a$ and $b$ is defined as
$$\mathrm{SSIM}(a, b) = \frac{(2\mu_a \mu_b + c_1)(2\sigma_{ab} + c_2)}{(\mu_a^2 + \mu_b^2 + c_1)(\sigma_a^2 + \sigma_b^2 + c_2)},$$
where $\mu_a$ and $\mu_b$ are the average intensities of the given images $a$ and $b$, respectively; $\sigma_a^2$ and $\sigma_b^2$ are the variances of $a$ and $b$, respectively; $\sigma_{ab}$ is the covariance of $a$ and $b$; and $c_1$ and $c_2$ are constants included to avoid instability. Based on this equation, we set the loss term named SSIM loss as
$$L_S(x, f_{\downarrow}(x_{\mathrm{SR}})) = \mathbb{E}_{x \sim X}\left[1 - \mathrm{SSIM}(x, f_{\downarrow}(x_{\mathrm{SR}}))\right],$$
where $x$ is an input clinical CT image, $x_{\mathrm{SR}}$ is the SR image, $X$ is the domain of clinical CT images, and $f_{\downarrow}$ is the average pooling39 function. Average pooling calculates the average value for patches of a feature map and uses it to create a downsampled (pooled) feature map.40 $f_{\downarrow}$ rescales a given image to 1/8 of its original size in width and height. We use $1 - \mathrm{SSIM}$ as the basis of this loss term, since we desire the SSIM of $x$ and $f_{\downarrow}(x_{\mathrm{SR}})$ to be close to 1.

Downsample loss. To prevent a change of intensity in the CT image after SR, we propose another loss term named the downsample loss, which is written as
$$L_{\mathrm{down}}(x, x_{\mathrm{SR}}) = \left\| x - f_{\downarrow}(x_{\mathrm{SR}}) \right\|_2^2,$$
where $\|\cdot\|_2^2$ is the square of the $L_2$-norm, $x$ is the input clinical CT (LR) image, and $x_{\mathrm{SR}}$ is the SR image. We call this the downsample loss because it is calculated from the downsampled SR image $f_{\downarrow}(x_{\mathrm{SR}})$ and the input LR image $x$. Since the downsample loss calculates the pixel-wise loss between the SR and LR images, it can prevent the SR image from deforming and from changing its intensity relative to the LR image.

Upsample loss. The third proposed loss term is named the upsample loss. As shown in Fig. 5(b), in SR-CycleGAN, there is another generator $G_B$ that can translate a given μCT image $y$ into a clinical CT-like image $y_{\mathrm{LR}}$.
By the same principle as the downsample loss, to prevent a change in intensity between $y$ and $y_{\mathrm{LR}}$, the upsample loss is formulated as
$$L_{\mathrm{up}}(y, y_{\mathrm{LR}}) = \left\| y - f_{\uparrow}(y_{\mathrm{LR}}) \right\|_2^2,$$
where $f_{\uparrow}$ is the nearest upsampling function. The nearest upsampling function selects the value of the nearest pixel of a feature map and assigns this value to new pixels to create an upsampled feature map. $f_{\uparrow}$ rescales a given image to 8 times its original size in width and height, and $Y$ is the domain of μCT images $y$. We call this the upsample loss because it is calculated from the norm between the upsampled fake clinical CT image $f_{\uparrow}(y_{\mathrm{LR}})$ and the original $y$.

Adding the MMSR loss to SR-CycleGAN. The MMSR loss consists of the SSIM loss, downsample loss, and upsample loss. The MMSR loss is formulated as
$$L_{\mathrm{MMSR}} = \lambda_1 L_S(x, f_{\downarrow}(x_{\mathrm{SR}})) + \lambda_2 L_S(y, f_{\uparrow}(y_{\mathrm{LR}})) + \lambda_3 L_{\mathrm{down}}(x, x_{\mathrm{SR}}) + \lambda_4 L_{\mathrm{up}}(y, y_{\mathrm{LR}}),$$
where $L_S(x, f_{\downarrow}(x_{\mathrm{SR}}))$ is the SSIM loss between the input clinical CT image $x$ and the output SR image $x_{\mathrm{SR}}$, and $L_S(y, f_{\uparrow}(y_{\mathrm{LR}}))$ is the SSIM loss between the μCT image $y$ and the generated clinical CT-like image $y_{\mathrm{LR}}$. $L_{\mathrm{down}}$ is the downsample loss of $x$ and $x_{\mathrm{SR}}$, and $L_{\mathrm{up}}$ is the upsample loss of $y$ and $y_{\mathrm{LR}}$. $f_{\downarrow}$ is the average pooling function that scales down a given image, and $f_{\uparrow}$ is the nearest upsampling function that scales up a given image. $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are weights. We add the proposed MMSR loss as an additional loss term of the proposed SR-CycleGAN. We formulate the total loss function of SR-CycleGAN as
$$L_{\mathrm{total}} = \lambda_{\mathrm{GAN}}\left(L_{\mathrm{GAN}}(G_A, D_B) + L_{\mathrm{GAN}}(G_B, D_A)\right) + \lambda_{\mathrm{cyc}} L_{\mathrm{cyc}}(G_A, G_B) + \lambda_{\mathrm{MMSR}} L_{\mathrm{MMSR}},$$
where $L_{\mathrm{GAN}}(G_A, D_B)$ and $L_{\mathrm{GAN}}(G_B, D_A)$ are the GAN losses and $L_{\mathrm{cyc}}$ is the cycle-consistency loss proposed in the conventional CycleGAN, as described in Sec. 2.2. $\lambda_{\mathrm{GAN}}$, $\lambda_{\mathrm{cyc}}$, and $\lambda_{\mathrm{MMSR}}$ are weights. By adding the MMSR loss to CycleGAN, we successfully performed the SR of clinical CT of lung cancer patients to the μCT level, whereas conventional CycleGAN failed to perform SR.

2.4. Training and Inference of SR-CycleGAN

In the training phase, the input of generator $G_A$ is a clinical CT image, which we denote as $x$. Generator $G_A$ generates an SR image $x_{\mathrm{SR}}$ whose width and height are 8 times those of $x$. On the other hand, a μCT image $y$ is input into generator $G_B$, which generates a clinical CT-like image $y_{\mathrm{LR}}$ whose width and height are 1/8 of those of $y$. The loss of the entire SR-CycleGAN is calculated from $x$, $x_{\mathrm{SR}}$, $y$, and $y_{\mathrm{LR}}$.
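The loss computation over these four quantities can be sketched in NumPy. This is an illustrative sketch, not the training implementation: it uses a single-window (global) SSIM, 8× block-average pooling as the downsampling function, 8× nearest-neighbor repetition as the upsampling function, and toy data with hypothetical unit weights.

```python
import numpy as np

R = 8  # SR factor

def avg_pool(img, r=R):
    """Downsampling function: average pooling, shrinks width/height by r."""
    h, w = img.shape
    return img.reshape(h // r, r, w // r, r).mean(axis=(1, 3))

def nearest_up(img, r=R):
    """Upsampling function: nearest-neighbor, enlarges width/height r times."""
    return np.repeat(np.repeat(img, r, axis=0), r, axis=1)

def ssim(a, b, c1=1e-4, c2=9e-4):
    """Single-window SSIM for images scaled to [0, 1]."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def mmsr_loss(x, x_sr, y, y_lr, lam=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of SSIM losses 1/2, downsample loss, and upsample loss."""
    l_s1 = 1.0 - ssim(x, avg_pool(x_sr))          # SSIM loss 1
    l_s2 = 1.0 - ssim(y, nearest_up(y_lr))        # SSIM loss 2
    l_down = np.mean((x - avg_pool(x_sr)) ** 2)   # downsample loss
    l_up = np.mean((y - nearest_up(y_lr)) ** 2)   # upsample loss
    return lam[0] * l_s1 + lam[1] * l_s2 + lam[2] * l_down + lam[3] * l_up

x = np.random.rand(32, 32)    # clinical CT (LR) patch
y = np.random.rand(256, 256)  # micro-CT (HR) patch
x_sr = nearest_up(x)          # toy "SR output" consistent with x
y_lr = avg_pool(y)            # toy "clinical CT-like" output
print(mmsr_loss(x, x_sr, y, y_lr) >= 0.0)  # True; 0 only for a perfect match
```

Because every term is non-negative and vanishes only when the rescaled outputs match the inputs, minimizing this loss ties the SR output's intensities and structures to the LR input.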
Then the loss is used to optimize the network. For inference, we only use the trained generator $G_A$: we extract images from clinical CT volumes and input them into the trained network $G_A$. The outputs are SR images whose width and height are 8 times those of the inputs.

3. Experiments and Results

3.1. Datasets

In our experiments, we newly built a dataset containing ten μCT volumes and eight clinical CT volumes. The clinical CT volumes were scanned by a clinical CT scanner (SOMATOM Definition Flash, Siemens Inc., Munich, Germany) at a resolution of about 625 μm/voxel. The μCT volumes were scanned by a μCT scanner (inspeXio SMX-90CT Plus, Shimadzu Inc., Kyoto, Japan), as shown in Fig. 6(a). The lung cancer specimens were fixed by Heitzman's method,41 as shown in Fig. 6(b). Lung specimens were scanned at isotropic resolutions of about 52 μm/voxel. We trained SR-CycleGAN using five clinical CT volumes and five corresponding μCT volumes of lung cancer specimens. We evaluated SR-CycleGAN qualitatively on three clinical CT volumes and quantitatively on five μCT volumes. These clinical CT and μCT volumes were not used for training.

3.2. Preprocessing

Chest clinical CT images contain various tissues outside the lungs that are not appropriate for our experiments, such as bones, muscles, and the esophagus. We first segmented the lung regions from clinical CT chest images: we conducted region growing42 to obtain a coarse segmentation mask of the lung and performed morphological operations to fill the holes in the coarse segmentation mask. μCT images also require a target-region restriction. In our experiments, lung specimens were placed in a plastic cylinder and put into the scanner for scanning. Therefore, parts of the plastic cylinder appear in the μCT images. Since the plastic cylinder is not suitable for our experiment, we manually cropped lung regions from the μCT images and only used the lung regions for the experiment.
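The coarse lung segmentation by region growing described above can be illustrated with a simple 2D sketch. The seed point, threshold, and toy image below are hypothetical, and the actual pipeline operates on 3D chest CT volumes; the idea is only that the region grows from a seed into connected pixels darker than a threshold (lung parenchyma is darker than the surrounding tissue).

```python
from collections import deque
import numpy as np

def region_growing(img, seed, threshold):
    """Grow a mask from `seed` into 4-connected neighbors with intensity
    below `threshold` (2D illustration of the 3D lung segmentation step)."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] \
                    and img[nr, nc] < threshold:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask

# Toy image: a dark "lung" region (0.1) inside brighter tissue (0.9).
img = np.full((8, 8), 0.9)
img[2:6, 2:6] = 0.1
mask = region_growing(img, seed=(3, 3), threshold=0.5)
print(mask.sum())  # 16: the 4x4 dark region is fully recovered
```

In the actual pipeline, the resulting coarse mask is then cleaned up with morphological hole filling, as described above.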
In addition, normalization of the intensities of both clinical CT and μCT images was required. We normalized the intensities of both μCT and clinical CT to the range [0, 1]. In clinical CT, the intensity of a tissue is represented using the Hounsfield scale, with water having a value of 0 H.U., tissues denser than water having positive values, and tissues less dense than water having negative values.43 In μCT, the intensity is not represented by the Hounsfield scale. The intensity range of the clinical CT volumes was about 3500 H.U. (the intensity of air is around -1000 H.U. and the intensity of bone is around 2500 H.U.), while the scale of the μCT volumes was about 16,000 (the intensity of air is around 0, and that of cancer is around 15,000). For clinical CT, we normalized the intensity in this way: we set intensities larger than 2500 H.U. (larger than the bone intensity) to 2500 H.U., and we set voxels with intensities smaller than -1000 H.U. to -1000 H.U. For μCT, we set voxels with intensities higher than 15,000 (higher than cancer) to 15,000 and voxels with intensities smaller than 0 to 0. Finally, the intensities of both clinical CT and μCT images were compressed to [0, 1].

3.3. Parameter Settings

3.3.1. SR rate and training patch numbers

We conducted 8× SR, which means the SR image was 8 times larger than the LR image in width and height. This factor was chosen considering the resolutions of the clinical CT volumes (625 μm) and the μCT volumes (52 μm). In the training phase, we randomly extracted 2000 patches from each clinical CT case. We also randomly extracted 2000 patches from each μCT case. Since we had five cases for training, the total numbers of clinical CT and μCT patches were both 10,000.

3.3.2. Parameters for network training

We used Adam44 for stochastic optimization of the network. The learning rate remained constant from epoch 1 to epoch 100 and decayed linearly to 0 between epochs 100 and 200. The mini-batch size for training was 4.
Training continued for 200 epochs. We manually chose the weight of each loss term that obtained the best qualitative results on the training dataset. The weights of the loss terms are listed in Table 1. All networks were implemented in PyTorch.

Table 1. Parameters of each loss term.
3.3.3. Evaluation methods

For qualitative evaluation, we utilized three clinical CT volumes. We cropped clinical CT images from the three clinical CT volumes and input them into generator $G_A$ of the trained SR-CycleGAN. We then obtained SR images 8 times larger than the inputs in width and height. To demonstrate the effectiveness of the network modification and MMSR loss of SR-CycleGAN, we compared SR-CycleGAN with conventional CycleGAN. Since the input and output of CycleGAN are of the same size, CycleGAN cannot be applied directly for SR. Therefore, we added upblocks to CycleGAN's generator to ensure that the output of $G_A$ is eight times larger than the input (in width and height). We call this network "CycleGAN with upblocks." We also conducted ablation experiments to verify the effectiveness of the network modification. For quantitative evaluation, we propose a novel quantitative evaluation method. In previous supervised SR studies,45 quantitative evaluations were often conducted by comparing the output SR image with its HR counterpart. Therefore, paired LR images (clinical CT images) and HR images (μCT images) would be required for quantitative evaluation. Since we could not obtain paired clinical CT and μCT images, we took an alternative approach: first, we used bicubic interpolation31 to downsample μCT images to 1/8 of their original size to simulate clinical CT images (in image processing, bicubic interpolation is used for interpolating data points on a 2D regular grid; it considers the 16 pixels (4 × 4) around the pixel to be interpolated and calculates a weighted sum of these 16 pixels as the new pixel). For a given μCT image, we performed bicubic downsampling to obtain an image of 1/8 the original width and height, and then input it into the trained $G_A$ to obtain an SR output. We compared the SR output with the original μCT image using evaluation metrics such as the peak signal-to-noise ratio (PSNR).46 It is noteworthy that $G_A$ was trained on clinical CT and μCT images, as explained in Sec. 3.3.1.
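The evaluation protocol can be sketched as follows. In this illustrative NumPy version, block-average downsampling and a nearest-neighbor "SR network" stand in for bicubic interpolation and the trained generator, and all sizes and data are toy values; only the PSNR formula itself is standard.

```python
import numpy as np

R = 8  # SR factor

def downsample(img, r=R):
    """Stand-in for bicubic downsampling: 8x block averaging."""
    h, w = img.shape
    return img.reshape(h // r, r, w // r, r).mean(axis=(1, 3))

def fake_sr_network(img, r=R):
    """Placeholder for the trained generator: nearest-neighbor upsampling."""
    return np.repeat(np.repeat(img, r, axis=0), r, axis=1)

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = np.mean((ref - test) ** 2)
    return 10 * np.log10(max_val ** 2 / mse)

micro_ct = np.random.rand(256, 256)  # ground-truth HR image
lr = downsample(micro_ct)            # simulated clinical CT input (32x32)
sr = fake_sr_network(lr)             # SR output (256x256)
print(round(psnr(micro_ct, sr), 2))  # dB; higher is better
```

The point of the protocol is that the simulated LR input has a known HR counterpart, so PSNR (and SSIM) against the original image becomes well-defined despite the unpaired training setting.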
We used five cases (1544 μCT images in total) for quantitative evaluation. We compared the following networks. Network 1: CycleGAN with upblocks (no MMSR loss, no network modification, only upblocks for a larger output image). Network 2: CycleGAN with network modification (sub-pixel shuffling layers but no MMSR loss). Network 3: SR-CycleGAN with downblocks (with MMSR loss, no sub-pixel shuffling layers). Network 4: the proposed SR-CycleGAN (with MMSR loss and sub-pixel shuffling). We also quantitatively evaluated how sub-pixel shuffling layers reduce training time. Before adding sub-pixel shuffling layers to generator $G_A$, we used upblocks to upscale the feature maps to a larger size. Figure 7 shows a comparison of the network with and without pixel-shuffling layers. For training, we used 2000 patches cropped from clinical CT images and 2000 patches cropped from μCT images.

3.4. Comparison of Results

The SR results of SR-CycleGAN are compared with those of CycleGAN with upblocks in Fig. 8. Furthermore, to evaluate the effectiveness of removing downblocks and introducing sub-pixel shuffling layers, we also evaluated SR-CycleGAN with and without removing downblocks and with and without sub-pixel shuffling layers, as shown in Fig. 9.

3.4.1. Qualitative evaluation

We show cropped parts of the SR images obtained by SR-CycleGAN in Fig. 8(c). The results of CycleGAN with upblocks are shown in Fig. 8(b). In the SR results of SR-CycleGAN, lung anatomies such as the bronchus appear more clearly than in the original clinical CT images, as indicated by the red arrows in Fig. 8(c). CycleGAN with upblocks (no network modification except adding upblocks, and no MMSR loss) only produced results that have no similarity to the input LR image (clinical CT image). Important anatomical structures such as the blood vessels and bronchus disappeared, as indicated by the red arrows in Fig. 8(b). The results demonstrate that the proposed SR-CycleGAN is suitable for the SR of clinical CT images.
The results of "SR-CycleGAN with downblocks"47 (SR-CycleGAN with MMSR loss but without network modification) are shown in Fig. 9(b); they appear noisy, and the edges of the blood vessels and bronchus show many artifacts, as indicated by the red arrows. The results of SR-CycleGAN are shown in Fig. 9(c); they are clearer and less noisy than those in Fig. 9(b). To observe the SR results at a larger scale, we illustrate clinical CT images of the whole lung region and images cropped from the lung region, before and after SR, in Fig. 10.

3.4.2. Quantitative evaluation

The SR results and quantitative evaluation results are shown in Fig. 11 and Table 2. We used PSNR and SSIM46 for quantitative evaluation. Table 2 shows that the proposed SR-CycleGAN performed quantitatively better than the other methods, with the highest PSNR and SSIM.

Table 2. Quantitative evaluation of our methods. Network 1: CycleGAN with upblocks (no MMSR loss, no network modification, only upblocks for a larger output image). Network 2: CycleGAN with network modification (sub-pixel shuffling layers but no MMSR loss). Network 3: SR-CycleGAN with downblocks (with MMSR loss, no sub-pixel shuffling layers). Network 4: the proposed SR-CycleGAN (with MMSR loss and sub-pixel shuffling). Bold values are the highest.
We also evaluated how sub-pixel shuffling layers reduce training time. SR-CycleGAN without sub-pixel shuffling layers needs 491 s of training per epoch (2000 patches per epoch). After replacing the upblocks with sub-pixel shuffling layers, the entire network needs 353 s of training per epoch. Thus, training time was significantly reduced. The network was trained on an Nvidia Tesla V100 (32 GB memory).

3.5. Ablation Studies

To assess the effectiveness of the different components of our method, we performed ablation studies. On top of the baseline (CycleGAN with upblocks), we progressively added the network modification and the MMSR loss function. Further, to clarify the effectiveness of each component of the MMSR loss, we also analyzed each term of the MMSR loss separately. Experiments showed that our method with all proposed components performed best quantitatively and qualitatively.

3.5.1. Effectiveness of network modification

We first analyzed the effect of the network modification. As the network modification, we removed downblocks and added pixel-shuffling layers to conventional CycleGAN's generator $G_A$. The network modification avoids encoding the input image into smaller feature maps, thus preserving spatial information while performing SR. Additionally, it also reduces training and inference time. With the network modification, PSNR increased by 1.75 dB and SSIM increased by 0.32 compared to the baseline (CycleGAN with upblocks). The qualitative results of the baseline and the baseline with network modification are shown as condition A and condition C, respectively, in Fig. 12; images of the latter were qualitatively better than those of the former. The quantitative results of the network modification are shown in Table 3: the PSNR and SSIM scores of condition C (baseline with network modification) are higher than those of condition A (baseline). Therefore, the network modification is required in our method.

Table 3. Ablation studies and quantitative results.
SSIM loss 1 is LS(x, f↓(xSR)) and SSIM loss 2 is LS(y, f↑(yLR)). Applying the network modification and all loss terms simultaneously obtains the highest PSNR and SSIM. An empty cell means such a component is not utilized, and a check mark (✓) means such a component is utilized. Bold values are the highest.
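The sub-pixel shuffling (pixel shuffle) operation used by the network modification rearranges channels into space, as introduced by Shi et al.36 A minimal NumPy sketch for a single feature map follows; the 4-channel toy input is illustrative only:

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r^2, H, W) feature map into (C, H*r, W*r).

    This is the depth-to-space operation of a sub-pixel shuffling layer:
    channels are traded for spatial resolution, so no transposed
    convolution (upblock) is needed to enlarge the feature map.
    """
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    assert c * r * r == c_r2, "channel count must be divisible by r^2"
    x = x.reshape(c, r, r, h, w)    # split out the two upscaling factors
    x = x.transpose(0, 3, 1, 4, 2)  # reorder to (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

feat = np.arange(4 * 2 * 2, dtype=np.float32).reshape(4, 2, 2)
out = pixel_shuffle(feat, r=2)
print(out.shape)  # (1, 4, 4)
```

Because the rearrangement is a pure memory operation, it is much cheaper than the learned upsampling in an upblock, which is consistent with the reduced per-epoch training time reported above.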
3.5.2. Effectiveness of MMSR loss

We analyzed the effectiveness of the proposed MMSR loss. The MMSR loss ensures that the output SR image has a pixel-wise intensity distribution similar to that of the input LR image. The MMSR loss also prevents the network from generating arbitrary outputs. With the MMSR loss, PSNR increased by 2.84 dB and SSIM increased by 0.39 compared to the method without the MMSR loss. We further studied the effectiveness of each loss term in the MMSR loss. The MMSR loss contains the following components: (1) SSIM loss (containing two loss terms), (2) downsample loss, and (3) upsample loss. The upsample loss and downsample loss ensure that the output SR image has a higher pixel-wise similarity with the input image. The SSIM loss ensures that the output image has a higher structural similarity48 with the input image. We studied various combinations of loss terms and show their quantitative results in Table 3. Each loss term in the MMSR loss brought an increase in PSNR and SSIM, and the SSIM loss (containing two loss terms) brought more improvement than the other loss terms (condition I in Table 3). We chose four combinations of loss terms (conditions A, H, I, and M in Table 3) whose qualitative results differ considerably. The qualitative evaluation results of these combinations are shown in Fig. 12, which shows that our method's output (condition M) has the highest similarity with the HR image (ground truth), compared with the other combinations of loss terms (conditions A, C, H, and I).

3.6. Comparison with Recent Baselines

We compared our method with three recent SR methods. We first compared our method with a recent unsupervised baseline named CinCGAN.32 CinCGAN first utilizes a cycle-in-cycle network structure to map a noisy and blurry LR image to a noise-free LR image. Then the noise-free LR image is upsampled with a pre-trained deep SR model. CinCGAN is trained with LR-HR images in an end-to-end manner.
The trained CinCGAN is used for performing SR of a given LR image.32 We also compared our method with a newly proposed state-of-the-art unsupervised SR method named pseudo-SR49 and a widely used supervised SR method named ESRGAN.51 Pseudo-SR is an SR method consisting of an unpaired kernel/noise correction network and a pseudo-paired SR network. The correction network removes noise and adjusts the blurring kernel50 of the input LR image. Then the pseudo-paired SR network upscales the corrected clean LR image.49 ESRGAN is a supervised SR method that introduces loss terms such as an adversarial loss and a perceptual loss, as well as the residual-in-residual dense block, into the SR network.51 We did not have paired clinical CT (LR) and μCT (HR) images. Therefore, we trained ESRGAN with unpaired LR-HR images. The results of our method and these recent baselines are shown in Fig. 13. As shown in the red boxes in Fig. 13, our method output SR images close to the HR images (ground truth), whereas the recent SR baselines output SR images quite different from the HR images (ground truth). The PSNR and SSIM of our method were the highest among all methods, as shown in Table 4. We also compared our method's inference time, training time, and parameter size with the recent baselines in Table 5. As shown in Table 5, the training time for one epoch was the shortest with our method, and the number of network parameters was the smallest.

Table 4. Quantitative comparison between our method and recent baselines. Our method has the highest PSNR and SSIM scores. These results were computed on the clinical CT−μCT dataset. Bold values are the highest.
Table 5. Comparison of the training time, inference time, and number of parameters between our method and recent baselines. Our method has the shortest average training time and the fewest parameters compared to the recent SR baselines ESRGAN,51 pseudo-SR,49 and CinCGAN.32 Bold values are the best.
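To make the structure of the MMSR loss concrete, the sketch below combines the two SSIM terms LS(x, f↓(xSR)) and LS(y, f↑(yLR)) from the Table 3 caption with pixel-wise downsample and upsample losses. The choice of average pooling for f↓, nearest-neighbor upsampling for f↑, the single-window SSIM, and the weights are illustrative assumptions, not the operators or values used in our implementation:

```python
import numpy as np

def f_down(img: np.ndarray, k: int) -> np.ndarray:
    """Assumed downsampler f_down: k x k average pooling."""
    h, w = img.shape
    return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def f_up(img: np.ndarray, k: int) -> np.ndarray:
    """Assumed upsampler f_up: nearest-neighbor replication."""
    return img.repeat(k, axis=0).repeat(k, axis=1)

def ssim_global(a, b, c1=1e-4, c2=9e-4):
    """Single-window SSIM for intensities in [0, 1]; the standard SSIM is windowed."""
    ma, mb = a.mean(), b.mean()
    cov = ((a - ma) * (b - mb)).mean()
    num = (2 * ma * mb + c1) * (2 * cov + c2)
    den = (ma ** 2 + mb ** 2 + c1) * (a.var() + b.var() + c2)
    return num / den

def mmsr_loss(x_lr, x_sr, y_hr, y_lr_syn, k=8, w_ssim=1.0, w_pix=0.1):
    """Sketch of the MMSR loss: pixel-wise downsample/upsample losses plus
    the two SSIM terms; weights w_ssim and w_pix are illustrative."""
    l_down = np.abs(x_lr - f_down(x_sr, k)).mean()       # downsample loss
    l_up = np.abs(y_hr - f_up(y_lr_syn, k)).mean()       # upsample loss
    l_ssim = (1 - ssim_global(x_lr, f_down(x_sr, k))) \
           + (1 - ssim_global(y_hr, f_up(y_lr_syn, k)))  # two SSIM terms
    return w_ssim * l_ssim + w_pix * (l_down + l_up)

# Toy usage: a consistent LR/SR pair scores lower than a mismatched one.
rng = np.random.default_rng(0)
x_lr = rng.uniform(0.0, 1.0, (4, 4))     # toy stand-in for a clinical CT patch
x_sr = f_up(x_lr, 8)                     # a perfectly LR-consistent SR output
y_hr = rng.uniform(0.0, 1.0, (32, 32))   # toy stand-in for an HR patch
y_lr = f_down(y_hr, 8)                   # synthesized LR from the HR patch
print(mmsr_loss(x_lr, x_sr, y_hr, y_lr)
      < mmsr_loss(x_lr, rng.uniform(0.0, 1.0, (32, 32)), y_hr, y_lr))  # True
```

The key property, regardless of the exact operators, is that the loss is computed between images of different sizes by mapping one to the resolution of the other.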
3.7. Experimental Results on the COVID-19 Lung CT Segmentation Challenge—2020 Dataset

We also performed an experiment on an additional benchmark CT dataset to examine whether our method could perform SR of commonly used medical images (such as CT images). We chose the COVID-19 Lung CT Segmentation Challenge—2020 dataset.52 This dataset has 249 cases collected from patients of different hospitals, countries, ages, and genders. Here, 199 cases were used for training and 50 cases for testing. We performed 4× SR (the width and height of an output image are four-times those of an input image). We compared our method with recent baselines: the unsupervised SR methods CinCGAN32 and pseudo-SR49 and the supervised method ESRGAN.51 Qualitative results are shown in Fig. 14, and quantitative results are shown in Table 6. Our method outperformed these recent baselines quantitatively, as shown in Table 6. It output clear images and reconstructed important anatomical structures such as vessels and bronchi. The results of the recent baselines are blurred (CinCGAN and pseudo-SR) or unreasonable (ESRGAN) in Fig. 14. These experimental results show that our method is effective on commonly used medical images.

Table 6. Quantitative comparison between our method and recent baselines on the COVID-19 Lung CT Segmentation Challenge—2020 dataset.52 Bold values are the highest.
4. Discussions

4.1. Unsupervised SR of Clinical CT Utilizing μCT Data

To the best of our knowledge, our method is the first to perform SR on clinical CT to the μCT scale without a corresponding HR image as ground truth. The method is also the first to perform SR of clinical CT utilizing μCT data. The MMSR loss and the modification of the networks enabled SR-CycleGAN to perform SR by forcing SR images to have the same anatomical structures as the input clinical CT (LR) images. We believe the MMSR loss is more important than the network modification, since in Fig. 8(b), CycleGAN with upblocks (no MMSR loss, no network modification, only upblocks for a larger output image) output results that bear no similarity to the input images. As shown in Fig. 9(b), SR-CycleGAN with downblocks (with MMSR loss, no network modification) performed SR of clinical CT images. However, these results were not as good as those of SR-CycleGAN with sub-pixel shuffling (with both MMSR loss and network modification) in Fig. 9(c). The MMSR loss enabled SR of clinical CT images, and the modification of the network enhanced the qualitative and quantitative results.

4.2. Effect of Hyperparameter Adjustment

We performed further experiments to address the effect of different hyperparameters on the final result. Specifically, we changed the number of Resblocks, the convolution kernel size, and the patch size for training. The number of Resblocks, the convolution kernel size, and the patch size utilized in our method are shown in Fig. 15. First, we changed the number of Resblocks. The number of Resblocks in the generator of our method was 9. Since we built our method based on CycleGAN, whose numbers of Resblocks were 6 (for small patches) and 9 (for large patches), we performed an experiment with the smaller number of Resblocks, 6. In addition, since the difference between 9 (the number of Resblocks in our method) and 6 (the smaller number of Resblocks) was 3, we further performed an experiment with a larger number of Resblocks, 12.
Furthermore, we performed experiments with larger and smaller convolution kernels. The two Conv+BN+ReLU blocks in the generator of our method utilized convolution kernels of sizes 3 × 3 and 7 × 7 (Table 7). We enlarged the first Conv+BN+ReLU block's convolution kernel to test the effect of a larger convolution kernel and, correspondingly, reduced the second Conv+BN+ReLU block's convolution kernel to test the effect of a smaller one. The patch size for training was also adjusted. The input patch size in our method was 32 × 32 pixels; we tried smaller and larger patch sizes to investigate the impact of patch size on the results. Table 7 shows that using 9 Resblocks, 3 × 3 and 7 × 7 convolution kernel sizes, and a 32 × 32 pixels patch size led to the highest PSNR and SSIM scores; using more or fewer Resblocks, a larger or smaller convolution kernel, or a larger or smaller patch size resulted in lower PSNR and SSIM scores. Qualitative results of the different hyperparameters were similar, as shown in Fig. 16; the parts enclosed in the red boxes in Fig. 16 do not have significant differences. In conclusion, our method's number of Resblocks, convolution kernel sizes, and patch size resulted in the best quantitative result, as shown in Table 7, while the choice of these hyperparameters had little effect on the qualitative results, as shown in Fig. 16.

Table 7. Different hyperparameters result in different experimental results. Experimental results showed that using nine Resblocks, 3 × 3 and 7 × 7 convolution kernel sizes, and a 32 × 32 pixels patch size results in the best PSNR and SSIM scores. The red characters in each condition indicate its difference from condition 1. Bold values are the highest.
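One reason kernel size matters is simple parameter arithmetic: the cost of a convolution layer grows quadratically with the kernel size k. The sketch below uses an assumed channel width of 64 purely for illustration; it is not a value taken from our network:

```python
# Parameter count of one 2D convolution layer:
#   params = C_in * C_out * k * k + C_out   (weights + biases)
def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    return c_in * c_out * k * k + c_out

p3 = conv2d_params(64, 64, 3)   # 3 x 3 kernel
p7 = conv2d_params(64, 64, 7)   # 7 x 7 kernel
print(p3, p7)                   # 36928 200768
```

At the same channel width, a 7 × 7 kernel costs roughly 5.4 times the parameters of a 3 × 3 kernel, which affects both model capacity and training cost; this is why kernel size is a hyperparameter worth ablating.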
4.3. Novelty of Our Method and Difference from Recent CT SR Methods

Our method has three novel components: (1) a lightweight network equipped with sub-pixel shuffling layers,36 (2) novel loss terms named the upsample and downsample losses, and (3) a novel loss term named the SSIM loss. We modified components (1), (2), and (3) in applying them to our task. We added component (1) to CycleGAN to apply it in unsupervised scenarios. Although components (2) and (3) have been used as loss terms in some SR methods,53 they were never used to measure the similarities of images of different sizes. We modified components (2) and (3) to measure the similarities of differently sized images and utilized these similarities as loss terms to optimize our proposed network. No existing CT SR method utilizes components (1), (2), and (3) at the same time. By combining components (1), (2), and (3), we successfully implemented unsupervised SR with a relatively lightweight network. As a result, our method achieved SR on a clinical dataset, which cannot be attained by recent CT SR methods. Here, we compare the MMSR loss with the loss terms proposed in previous methods and discuss the necessity of the MMSR loss. A relevant work named GAN-CIRCLE29 used an adversarial loss, a cycle-consistency loss, an identity loss, and a joint sparsifying transform loss to indirectly promote consistency between the input LR and output SR images. In contrast, our method imposes the MMSR loss to directly constrain the input LR and output SR images to have higher SSIM and pixel-wise similarity. In our newly built clinical dataset, the LR and HR images have huge intensity and structural differences. Therefore, if we train SR methods on our clinical dataset without direct constraints between the input LR and output SR images, the trained network tends to output SR images that are totally different from the input LR images, such as the results of pseudo-SR in Fig. 13.
In contrast, using the MMSR loss, our method obtained satisfying qualitative and quantitative results. Another relevant network named CinCGAN32 uses a modified identity loss and a modified TV loss to ensure that the SR network's output has higher pixel-wise similarity with the input. However, CinCGAN only calculates the modified identity loss between the input LR and output SR images. On the other hand, our method calculates the MMSR loss from (1) the input LR and output SR images and (2) the HR image and the corresponding synthesized LR image. Moreover, our MMSR loss is based on two evaluation metrics: MSE and SSIM. Our method showed better performance than CinCGAN on both the MSE-based (PSNR) and SSIM-based evaluation metrics. We can further differentiate our method from recent supervised and unsupervised CT SR methods. Recent supervised CT SR methods, such as ESRGAN for CT SR,54 require pairs of LR-HR images for training. In contrast, our method does not need any paired LR-HR images for training. Some image denoising methods could be applied to SR.55 A GAN with a network-in-network structure embedded with skip connections, named deep convolutional generative adversarial network (DCSWGAN),20 was proven to be effective in CT image denoising. The generator of DCSWGAN consists of convolutional blocks, and each convolutional block consists of a convolutional layer, a bias, and a leaky rectified linear unit, which is similar to our method's generator. The generator of DCSWGAN uses a cascade structure containing two subnetworks: one is a feature extraction network, and the other is a reconstruction network. In contrast, our method only uses one network for SR. A disadvantage of DCSWGAN is that it still needs paired images for training. You et al. proposed an unsupervised SR method for CT and MRI images named GAN-CIRCLE,29,56 which was further applied to bone microstructure reconstruction57 and brain MRI reconstruction.58 GAN-CIRCLE performed 2× SR (the resolution of the output SR image is two times that of the input LR image).
On the other hand, we desire an SR method that performs SR of clinical CT images to the μCT scale. Our method achieved 8× SR (from 32 × 32 pixels to 256 × 256 pixels). Moreover, unsupervised SR methods such as CinCGAN32 and GAN-CIRCLE29 can only perform SR between images of the same modality (e.g., LR MRI images to HR MRI images); consequently, their LR and HR images do not have huge differences aside from resolution. Therefore, recent SR methods performed poorly on our clinical dataset, since our HR (μCT) and LR (clinical CT) images are from totally different modalities.

4.4. Analysis of Parameter Selection of Loss Terms

Here, we analyze the parameter selection of each loss term and discuss how assigning weights to each loss term leads to the best results. The overall loss function is composed of three terms: (1) SSIM loss, (2) downsample loss, and (3) upsample loss. Various combinations of loss terms lead to different quantitative results, as shown in Table 3, which shows that each loss function contributes to the final result. The SSIM loss (containing two loss terms) brings the highest PSNR and SSIM score improvement. While the method is already equipped with the SSIM loss, the downsample loss and upsample loss can still improve the PSNR and SSIM scores slightly. Therefore, we believe that a higher weight for the SSIM loss together with smaller weights for the downsample loss and upsample loss brings the highest PSNR and SSIM scores.

4.5. Effect of Downblocks in SR-CycleGAN

We performed experiments to verify the effectiveness of removing the downblocks and adding sub-pixel shuffling layers in the generator. As shown in Fig. 9, the SR results obtained by the generator with downblocks and without sub-pixel shuffling layers [Fig. 17(a)] look blurred and noisy, while the SR results obtained by the generator without downblocks and with sub-pixel shuffling layers [Fig. 17(b)] look clearer. This is because the downblocks scale down the input images to a smaller size.
The input images have 32 × 32 pixels; the downblocks scale them down into much smaller feature maps, and such small feature maps destroy spatial information in the input image. Furthermore, the generator with downblocks [Fig. 17(a)] is deeper than the generator without downblocks [Fig. 17(b)]. Previous research affirmed that deeper stages of neural networks are more semantic but spatially coarser.59 Thus, the shapes of essential anatomical structures such as the bronchus are likely to deform in the SR result, as shown in Fig. 9(b).

4.6. Effect of Reducing Computing Time Using Sub-Pixel Shuffling Layers

The sub-pixel shuffling layers were proven to shorten computing time compared with upblocks.36 We replaced the upblocks with sub-pixel shuffling layers in the proposed SR-CycleGAN. In Fig. 7, the two kinds of network structures for the generator are compared. The experimental results show that training time was significantly reduced from 491 to 353 s per epoch (2000 patches). For handling large-scale networks such as CycleGAN, reducing computing time is an important issue. Introducing sub-pixel shuffling layers saved computing resources without loss of accuracy.

4.7. Difficulty of Quantitative Evaluation

In conventional SR methods, quantitative evaluation is typically conducted by comparing SR and HR image pairs. However, it is infeasible to obtain such pairs of clinical CT and μCT images, as mentioned in Sec. 1. To perform quantitative evaluation, we used downsampled μCT images instead of clinical CT images. We input a downsampled μCT image into the trained generator and obtained its SR result. Next, we compared the SR result with the original μCT image using PSNR. Since μCT images and clinical CT images contain the same anatomical structures (bronchi and arteries), downsampled μCT images can simulate clinical CT images to a certain extent.
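The evaluation protocol just described can be sketched end-to-end. Since the trained generator is not reproduced here, nearest-neighbor upsampling stands in for it, and the image content and intensity range are synthetic stand-ins rather than real μCT data:

```python
import numpy as np

def downsample(img: np.ndarray, k: int) -> np.ndarray:
    """Average-pool an HR image by k to simulate a clinical-CT-resolution input."""
    h, w = img.shape
    return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def psnr(ref: np.ndarray, test: np.ndarray, data_range: float) -> float:
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

rng = np.random.default_rng(1)
hr = rng.uniform(6000.0, 14000.0, size=(64, 64))  # synthetic stand-in for an HR patch
lr = downsample(hr, 8)                            # simulated LR input
# A trained generator would map `lr` to an SR image; nearest-neighbor
# upsampling stands in for it here so the protocol is runnable.
sr_stub = lr.repeat(8, axis=0).repeat(8, axis=1)
score = psnr(hr, sr_stub, data_range=8000.0)      # compare SR result with the original
print(round(score, 2))
```

With a real generator, the same comparison is simply run with the network's SR output in place of `sr_stub`.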
However, downsampled μCT images cannot simulate clinical CT images perfectly because the imaging conditions of μCT and clinical CT are different. For a specific tissue such as the bronchus, the intensity in clinical CT is around −1000 to 200 H.U., whereas the intensity of the bronchus in μCT is around 6000 to 14,000 H.U. Furthermore, the lung specimens for μCT scanning are resected from part of the lung, so the μCT images of lung specimens do not contain anatomical information of the whole lung. Hence, we cannot simulate clinical CT perfectly by downsampling μCT images to the clinical CT scale. Therefore, in the future, we plan to propose a new evaluation metric for the evaluation of SR-CycleGAN.

5. Conclusion and Future Work

We proposed an unsupervised SR method named SR-CycleGAN. We also proposed an innovative MMSR loss to ensure that the SR image has anatomical structures and an intensity distribution similar to those of the input LR image. Additionally, we improved the network structure to obtain both quantitatively and qualitatively better results. Experimental results demonstrate that our method is suitable for the SR of a lung's clinical CT to the μCT scale, while conventional CycleGAN (without the proposed loss terms) outputs SR images with low qualitative and quantitative values. Future work includes a more precise quantitative evaluation of our method. In addition, while our method focused on the SR of clinical CT to the μCT scale, it is not limited to the specific SR task of handling clinical CT for the lungs. Our method can also be applied to other SR tasks using medical images as a processing target. Therefore, applying our method to new data will also be among our future works. Since it is often difficult to register images from modalities with different resolutions, we believe that SR methods trained with unpaired LR and HR images will be essential and widely used in the near future.

Acknowledgments

Parts of this research were supported by MEXT/JSPS KAKENHI (Grant Nos.
26108006, 17H00867, and 17K20099), the JSPS Bilateral International Collaboration Grants, the Japan Agency for Medical Research and Development (Grant Nos. 18lk1010028s0401 and 19lk1010036h0001), and the Hori Sciences & Arts Foundation. The authors state no conflict of interest and have nothing to disclose.

References

1. H. Rafiemanesh et al., "Epidemiology, incidence and mortality of lung cancer and their relationship with the development index in the world," J. Thorac. Dis., 8(6), 1094 (2016). https://doi.org/10.21037/jtd.2016.03.91
2. L. A. Torre, R. L. Siegel and A. Jemal, Lung Cancer and Personalized Medicine, 1–19, Springer, New York (2016).
3. H. Sung et al., "Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries," CA Cancer J. Clin., 71, 209–249 (2021). https://doi.org/10.3322/caac.21660
4. E. F. Patz Jr., P. C. Goodman and G. Bepler, "Screening for lung cancer," N. Engl. J. Med., 343(22), 1627–1633 (2000). https://doi.org/10.1056/NEJM200011303432208
5. A. van der Wel et al., "Increased therapeutic ratio by 18FDG-PET CT planning in patients with clinical CT stage N2-N3M0 non–small-cell lung cancer: a modeling study," Int. J. Radiat. Oncol. Biol. Phys., 61(3), 649–655 (2005). https://doi.org/10.1016/j.ijrobp.2004.06.205
6. R. Wender et al., "American Cancer Society lung cancer screening guidelines," CA Cancer J. Clin., 63(2), 106–117 (2013). https://doi.org/10.3322/caac.21172
7. E. Lin and A. Alessio, "What are the basic concepts of temporal, contrast, and spatial resolution in cardiac CT?," J. Cardiovasc. Comput. Tomogr., 3(6), 403–408 (2009). https://doi.org/10.1016/j.jcct.2009.07.003
8. A. Sombke et al., "Potential and limitations of x-ray micro-computed tomography in arthropod neuroanatomy: a methodological and comparative survey," J. Comp. Neurol., 523(8), 1281–1295 (2015). https://doi.org/10.1002/cne.23741
9. P. Bidola et al., "A step towards valid detection and quantification of lung cancer volume in experimental mice with contrast agent-based x-ray microtomography," Sci. Rep., 9, 1325 (2019). https://doi.org/10.1038/s41598-018-37394-w
10. I. J. Fidler, D. M. Gersten and I. R. Hart, "The biology of cancer invasion and metastasis," Adv. Cancer Res., 28, 149–250 (1978). https://doi.org/10.1016/s0065-230x(08)60648-x
11. J. S. Isaac and R. Kulkarni, "Super resolution techniques for medical image processing," in Int. Conf. Technol. Sustainable Dev., 1–6 (2015). https://doi.org/10.1109/ICTSD.2015.7095900
12. M. Irani and S. Peleg, "Super resolution from image sequences," in Proc. 10th Int. Conf. Pattern Recognit., 115–120 (1990). https://doi.org/10.1109/ICPR.1990.119340
13. D. Shen, G. Wu and H.-I. Suk, "Deep learning in medical image analysis," Annu. Rev. Biomed. Eng., 19, 221–248 (2017). https://doi.org/10.1146/annurev-bioeng-071516-044442
14. O. Ronneberger, P. Fischer and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," Lect. Notes Comput. Sci., 9351, 234–241 (2015). https://doi.org/10.1007/978-3-319-24574-4_28
15. C. You et al., "SimCVD: simple contrastive voxel-wise representation distillation for semi-supervised medical image segmentation," (2021).
16. L. Yang et al., "NuSeT: a deep learning tool for reliably separating and analyzing crowded cells," PLoS Comput. Biol., 16(9), e1008193 (2020). https://doi.org/10.1371/journal.pcbi.1008193
17. C. You et al., "Unsupervised Wasserstein distance guided domain adaptation for 3D multi-domain liver segmentation," Lect. Notes Comput. Sci., 12446, 155–163 (2020). https://doi.org/10.1007/978-3-030-61166-8_17
18. C. You et al., "Momentum contrastive voxel-wise representation learning for semi-supervised volumetric medical image segmentation," (2021).
19. C. You et al., "Structurally-sensitive multi-scale deep neural network for low-dose CT denoising," IEEE Access, 6, 41839–41855 (2018). https://doi.org/10.1109/ACCESS.2018.2858196
20. C. You et al., "Low-dose CT via deep CNN with skip connection and network-in-network," Proc. SPIE, 11113, 111131W (2019). https://doi.org/10.1117/12.2534960
21. C. Dong et al., "Image super-resolution using deep convolutional networks," IEEE Trans. Pattern Anal. Mach. Intell., 38(2), 295–307 (2016). https://doi.org/10.1109/TPAMI.2015.2439281
22. C. Ledig et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 4681–4690 (2017). https://doi.org/10.1109/CVPR.2017.19
23. B. Lim et al., "Enhanced deep residual networks for single image super-resolution," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. Workshops, 136–144 (2017). https://doi.org/10.1109/CVPRW.2017.151
24. M. Haris, G. Shakhnarovich and N. Ukita, "Deep back-projection networks for super-resolution," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 1664–1673 (2018). https://doi.org/10.1109/CVPR.2018.00179
25. L. Wang et al., "Dual super-resolution learning for semantic segmentation," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 3774–3783 (2020). https://doi.org/10.1109/CVPR42600.2020.00383
26. J. Long, E. Shelhamer and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 3431–3440 (2015). https://doi.org/10.1109/CVPR.2015.7298965
27. K. He et al., "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
28. H. Yu et al., "Computed tomography super-resolution using convolutional neural networks," in IEEE Int. Conf. Image Process., 3944–3948 (2017). https://doi.org/10.1109/ICIP.2017.8297022
29. C. You et al., "CT super-resolution GAN constrained by the identical, residual, and cycle learning ensemble (GAN-CIRCLE)," IEEE Trans. Med. Imaging, 39(1), 188–203 (2020). https://doi.org/10.1109/TMI.2019.2922960
30. M.-I. Georgescu, R. T. Ionescu and N. Verga, "Convolutional neural networks with intermediate loss for 3D super-resolution of CT and MRI scans," IEEE Access, 8, 49112–49124 (2020). https://doi.org/10.1109/ACCESS.2020.2980266
31. R. Keys, "Cubic convolution interpolation for digital image processing," IEEE Trans. Acoust. Speech Signal Process., 29(6), 1153–1160 (1981). https://doi.org/10.1109/TASSP.1981.1163711
32. Y. Yuan et al., "Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. Workshops, 701–710 (2018). https://doi.org/10.1109/CVPRW.2018.00113
33. D. Ravì et al., "Adversarial training with cycle consistency for unsupervised super-resolution in endomicroscopy," Med. Image Anal., 53, 123–131 (2019). https://doi.org/10.1016/j.media.2019.01.011
34. J.-Y. Zhu et al., "Unpaired image-to-image translation using cycle-consistent adversarial networks," in IEEE Int. Conf. Comput. Vision, 2242–2251 (2017). https://doi.org/10.1109/ICCV.2017.244
35. J.-Y. Zhu et al., "Toward multimodal image-to-image translation," in Adv. Neural Inf. Process. Syst., 465–476 (2017).
36. W. Shi et al., "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 1874–1883 (2016). https://doi.org/10.1109/CVPR.2016.207
37. R. A. Brooks, "A quantitative theory of the Hounsfield unit and its application to dual energy scanning," J. Comput. Assist. Tomogr., 1(4), 487–493 (1977). https://doi.org/10.1097/00004728-197710000-00016
38. Z. Wang et al., "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process., 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861
39. M. Sun et al., "Learning pooling for convolutional neural network," Neurocomputing, 224, 96–104 (2017). https://doi.org/10.1016/j.neucom.2016.10.049
40. M. Yani et al., "Application of transfer learning using convolutional neural network method for early detection of terry's nail," J. Phys. Conf. Ser., 1201(1), 012052 (2019). https://doi.org/10.1088/1742-6596/1201/1/012052
41. E. Heitzman, The Lung: Radiologic-Pathologic Correlations, 3rd ed., Mosby Inc., Maryland Heights, Missouri (1993).
42. N. R. Pal and S. K. Pal, "A review on image segmentation techniques," Pattern Recognit., 26(9), 1277–1294 (1993). https://doi.org/10.1016/0031-3203(93)90135-J
43. J. Broder and R. Preston, "Imaging the head and brain," in Diagnostic Imaging for the Emergency Physician, 1–45, W.B. Saunders, Saint Louis (2011).
44. D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," (2015).
45. Z. Wang, J. Chen and S. C. Hoi, "Deep learning for image super-resolution: a survey," IEEE Trans. Pattern Anal. Mach. Intell., 43, 3365–3387 (2021). https://doi.org/10.1109/TPAMI.2020.2982166
46. A. Hore and D. Ziou, "Image quality metrics: PSNR vs. SSIM," in 20th Int. Conf. Pattern Recognit., 2366–2369 (2010). https://doi.org/10.1109/ICPR.2010.579
47. T. Zheng et al., "Multi-modality super-resolution loss for GAN-based super-resolution of clinical CT images using micro CT image database," Proc. SPIE, 11313, 1131305 (2020). https://doi.org/10.1117/12.2548929
48. Z. Wang et al., "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process., 13, 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861
49. S. Maeda, "Unpaired image super-resolution using pseudo-supervision," in Proc. IEEE/CVF Conf. Comput. Vision and Pattern Recognit., 291–300 (2020). https://doi.org/10.1109/CVPR42600.2020.00037
50. T. S. Cho et al., "Blur kernel estimation using the radon transform," in Proc. IEEE/CVF Conf. Comput. Vision and Pattern Recognit., 241–248 (2011). https://doi.org/10.1109/CVPR.2011.5995479
51. X. Wang et al., "ESRGAN: enhanced super-resolution generative adversarial networks," Lect. Notes Comput. Sci., 11133, 63–79 (2018). https://doi.org/10.1007/978-3-030-11021-5_5
52. H. Roth et al., "Rapid artificial intelligence solutions in a pandemic: the COVID-19-20 lung CT lesion segmentation challenge," (2021).
53. H. Zhao et al., "Loss functions for image restoration with neural networks," IEEE Trans. Comput. Imaging, 3(1), 47–57 (2017). https://doi.org/10.1109/TCI.2016.2644865
54. K. Yamashita and K. Markov, "Medical image enhancement using super resolution methods," Lect. Notes Comput. Sci., 12141, 496–508 (2020). https://doi.org/10.1007/978-3-030-50426-7_37
55. W. Xing and K. Egiazarian, "End-to-end learning for joint image demosaicing, denoising and super-resolution," in Proc. IEEE/CVF Conf. Comput. Vision and Pattern Recognit., 3507–3516 (2021). https://doi.org/10.1109/CVPR46437.2021.00351
56. Q. Lyu et al., "Super-resolution MRI and CT through GAN-circle," Proc. SPIE, 11113, 111130X (2019). https://doi.org/10.1117/12.2530592
57. I. Guha et al., "Deep learning based high-resolution reconstruction of trabecular bone microstructures from low-resolution CT scans using GAN-circle," Proc. SPIE, 11317, 113170U (2020). https://doi.org/10.1117/12.2549318
58. Q. Lyu et al., "Super-resolution MRI through deep learning," (2018).
59. F. Yu et al., "Deep layer aggregation," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 2403–2412 (2018). https://doi.org/10.1109/CVPR.2018.00255
Biography

Tong Zheng is a PhD student at Nagoya University. He received his MEng degree from Nagoya University in 2020. He is currently a research fellow of the Japan Society for the Promotion of Science (JSPS DC2). His research interests are machine learning, medical imaging, and image processing for chest computed tomography.

Hirohisa Oda is a designated assistant professor at Nagoya University. He received his PhD from Nagoya University in 2021. After working in industry, he started a PhD program at Nagoya University in 2015. His research interests include image processing for micro-focus x-ray CT and computer-aided diagnosis for CT.

Masaki Mori is a director in the Department of Respiratory Medicine, Sapporo Kosei-General Hospital. He received his MD and PhD degrees in medicine from Sapporo Medical University in 1979 and 1989, respectively. His research interests are medical image processing and computer-aided diagnosis.

Hiroshi Natori has been a professor emeritus of the Sapporo Medical University School of Medicine since 2005. He received his MD and PhD degrees in medicine from Sapporo Medical University. His major is respiratory medicine. His research interest is the analysis of the three-dimensional architecture and function of the lung. He served as an honorary director of Nishioka Hospital of the Keiwakai Social Medical Corporation, Sapporo.

Masahiro Oda is an associate professor at Nagoya University, Japan. He received his PhD from Nagoya University in 2009. His research is in medical image processing and mainly concerns computer-aided diagnosis and computer-assisted surgery in many application areas. He has (co-)authored more than 200 peer-reviewed full papers in international conferences and journals and is the recipient of RSNA Certificate of Merit (2009, 2014, and 2019) awards.

Kensaku Mori is a professor at the Graduate School of Information Science, Nagoya University, and a director at the Information Technology Center, Nagoya University. He is a MICCAI fellow. He received his MEng degree in information engineering and PhD in information engineering from Nagoya University in 1994 and 1996, respectively. He has also been involved in the organization of many international conferences, including SPIE Medical Imaging, CARS, and MICCAI, as a general chair or program committee member.