Purpose: This paper presents a deep learning (DL) based method called TextureWGAN. It is designed to preserve image texture while maintaining high pixel fidelity for computed tomography (CT) inverse problems. Over-smoothed images produced by postprocessing algorithms have been a well-known problem in the medical imaging industry. Our method therefore aims to solve the over-smoothing problem without compromising pixel fidelity.
Approach: TextureWGAN extends the Wasserstein GAN (WGAN). The WGAN can create an image that looks like a genuine image, which helps preserve image texture. However, an output image from the WGAN is not correlated with the corresponding ground truth image. To solve this problem, we introduce a multitask regularizer (MTR) into the WGAN framework to make the generated image highly correlated with the corresponding ground truth image, so that TextureWGAN achieves high pixel fidelity. The MTR can combine multiple objective functions: in this research, we adopt a mean squared error (MSE) loss to maintain pixel fidelity and a perceptual loss to improve the look and feel of the result images. Furthermore, the regularization parameters in the MTR are trained along with the generator network weights to maximize the performance of the TextureWGAN generator.
Results: The proposed method was evaluated in CT image reconstruction applications, in addition to super-resolution and image-denoising applications. We conducted extensive qualitative and quantitative evaluations, using PSNR and SSIM for pixel-fidelity analysis and first-order and second-order statistical texture analysis for image texture. The results show that TextureWGAN is more effective in preserving image texture than other well-known methods such as the conventional CNN and the nonlocal means filter (NLM). In addition, we demonstrate that TextureWGAN achieves competitive pixel-fidelity performance compared with the CNN and NLM; the CNN with MSE loss can attain high pixel fidelity, but it often damages image texture.
Conclusions: TextureWGAN can preserve image texture while maintaining pixel fidelity. The MTR not only helps stabilize the training of the TextureWGAN generator but also maximizes the generator's performance.
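As a rough illustration of how the multitask regularizer described above might combine the adversarial, MSE, and perceptual terms with trainable regularization parameters, here is a minimal numpy sketch. The function names and the softplus parameterization of the weights are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softplus(x):
    # smooth positivity constraint so learned regularization weights stay > 0
    return np.log1p(np.exp(x))

def mtr_generator_loss(adv_loss, mse_loss, perc_loss, rho_mse, rho_perc):
    """Illustrative multitask-regularized generator loss.

    adv_loss  : Wasserstein critic term (texture realism)
    mse_loss  : pixel-fidelity term
    perc_loss : perceptual (feature-space) term
    rho_*     : unconstrained parameters trained jointly with the generator;
                softplus maps them to positive regularization weights.
    """
    lam_mse = softplus(rho_mse)
    lam_perc = softplus(rho_perc)
    return adv_loss + lam_mse * mse_loss + lam_perc * perc_loss

# toy evaluation with all raw weights at zero (softplus(0) = log 2)
loss = mtr_generator_loss(adv_loss=1.0, mse_loss=0.5, perc_loss=0.2,
                          rho_mse=0.0, rho_perc=0.0)
```

Because the rho parameters enter the loss smoothly, gradient-based training can adjust the balance between fidelity and realism alongside the generator weights, as the abstract describes.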
Metal Artifact Reduction (MAR) has been one of the most challenging problems in Computed Tomography (CT) imaging since the technology was invented 50 years ago. Metal implants in patient bodies prevent a CT system from accurately measuring patient anatomy because of the beam-hardening effect, changes in the statistical properties of the X-ray beams, and the shapes of the metallic objects. Although this problem has been researched for many years, conventional model-based methods still have limitations. Recently, many researchers have started using Deep Learning (DL) for the CT MAR problem, which should work better than traditional methods because of its powerful representation capability. In this paper, a new DL-based method is proposed. Our technique is based upon a convex optimization method called primal-dual optimization, and a Recurrent Neural Network is adopted to implement the primal-dual optimization for CT MAR. Because the optimization involves multiple variables, this neural network conducts dual-domain learning. In addition, the proposed method replaces the metal trace commonly used in CT MAR methods with a novel sinogram confidence map. This floating-point map works better than a binary metal trace because of its smoother boundary pixels. Results from extensive experiments indicate that our proposed method outperforms conventional model-based methods and DL-based methods in terms of the Root Mean Squared Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity (SSIM), as well as in visual appearance.
Many algorithms and methods have been proposed for inverse problems, particularly with the recent surge of interest in machine learning and deep learning. Among them, the most popular and effective is the convolutional neural network (CNN) trained with a mean squared error (MSE) loss. This method has proven effective in super-resolution, image denoising, and image reconstruction. However, it is known to over-smooth images due to the nature of MSE: MSE-based methods minimize the Euclidean distance over all pixels between a baseline image and the image generated by the CNN, ignoring the spatial information of the pixels such as image texture. In this paper, we propose a new method based on the Wasserstein GAN (WGAN) for inverse problems. We show that the WGAN-based method is effective in preserving image texture. It also uses a maximum likelihood estimation (MLE) regularizer to preserve pixel fidelity. Maintaining image texture and pixel fidelity is a critical requirement for medical imaging. We used the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) to evaluate the proposed method quantitatively, and conducted first-order and second-order statistical image texture analysis to assess image texture.
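The over-smoothing behavior of MSE noted above can be seen in a toy calculation: when a pixel could plausibly come from either of two textures, the MSE-optimal estimate is their average, which matches neither. The values below are purely illustrative.

```python
import numpy as np

# A noisy pixel is equally likely to belong to dark texture (value 10)
# or bright texture (value 200). Minimizing expected squared error drives
# the estimate to the mean of the candidates -- a value matching neither
# texture, i.e. over-smoothing.
candidates = np.array([10.0, 200.0])

def mse_risk(estimate, values):
    # expected squared error of a point estimate over the candidate values
    return np.mean((values - estimate) ** 2)

grid = np.linspace(0.0, 255.0, 2551)
best = grid[np.argmin([mse_risk(g, candidates) for g in grid])]
# best lands near 105, far from both plausible texture values
```

An adversarial loss, by contrast, penalizes outputs that do not look like samples from the image distribution, which is why WGAN-based training can keep texture that MSE averages away.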
Filtered back projection (FBP) reconstruction is simple and computationally efficient and is used in many commercial CT imaging products. However, higher Poisson noise levels or metal objects in the imaged area can lead to severe artifacts. Iterative reconstruction employs stochastic models of the imaging process and of the characteristics of medical images and can reduce Poisson noise and metal-related artifacts. But it is computation-intensive, and its image models are relatively simple and cannot quite capture the highly complex nature of medical images, leaving room for further improvement. Recent advances in neural networks and deep learning offer potential solutions to these two problems. Toward that end, most of the neural networks proposed so far for CT image reconstruction are feed-forward networks with CNN (convolutional neural network) and fully connected layers, attempting to learn the mapping from the projections or the FBP output to the reconstructed image. While these networks have demonstrated some promising reconstruction or post-processing results, their architectures are somewhat arbitrary, and the question remains as to what would be a more principled way to find a good architecture and thereby further improve reconstruction results. One promising idea is to design the network structure based on signal processing principles such as MAP (maximum a posteriori) estimation and iterative optimization. In this work, we developed a novel RNN (recurrent neural network) based on an accelerated iterative MAP estimation algorithm. This network makes use of the forward image model, rather than learning it, so that the learning can be focused on the image prior model and on acceleration. This has led to good reconstruction results in which Poisson noise and metal artifacts are greatly reduced.
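The idea of building a network from an iterative MAP algorithm, while keeping the forward model fixed rather than learned, can be sketched as an unrolled gradient iteration. The quadratic stand-in prior and all shapes below are illustrative assumptions, not the paper's model.

```python
import numpy as np

def unrolled_map_recon(A, y, prior_grad, steps=500, step_size=None):
    """Sketch of MAP reconstruction as unrolled gradient iterations.

    Each 'layer' applies the known forward model A (fixed, not learned)
    and a prior-gradient function that a network would learn; here a
    simple quadratic (Tikhonov) prior is used as a stand-in.
    """
    if step_size is None:
        step_size = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / largest singular value^2
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        data_grad = A.T @ (A @ x - y)        # gradient of the data-fidelity term
        x = x - step_size * (data_grad + prior_grad(x))
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))                 # toy forward model
x_true = rng.normal(size=5)
y = A @ x_true                               # noiseless measurements
x_hat = unrolled_map_recon(A, y, prior_grad=lambda x: 0.01 * x)
```

In an RNN formulation, the same update is applied recurrently with learned parameters replacing the hand-chosen prior gradient and step size, which is the design principle the abstract argues for.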
Stochastic or model-based iterative reconstruction is able to account for the stochastic nature of the CT imaging process and some artifacts and is able to provide better reconstruction quality. It is also, however, computationally expensive. In this work, we investigated the use of some of the neural network training algorithms such as momentum and Adam for iterative CT image reconstruction. Our experimental results indicate that these algorithms provide better results and faster convergence than basic gradient descent. They also provide competitive results to coordinate descent (a leading technique for iterative reconstruction) but, unlike coordinate descent, they can be implemented as parallel computations, hence can potentially accelerate iterative reconstruction in practice.
The detector panel on a typical CT machine today is made of more than 500 detector boards, nicknamed chiclets. Each chiclet contains a number of detectors (i.e., pixels). In the manufacturing process, the chiclets on the panel need to go through an iterative test, swap, and test (TST) process, till some image quality level is achieved. Currently, this process is largely manual and can take hours to several days to complete. This is inefficient and the results can also be inconsistent. In this work, we investigate techniques that can be used to automate the iterative TST process. Specifically, we develop novel prediction techniques that can be used to simulate the iterative TST process. Our results indicate that deep neural networks produce significantly better results than linear regression in the more difficult prediction scenarios.
In this work, we used nonlocal priors in a Bayesian approach for X-ray scatter correction. The control parameters of our algorithms such as the patch sizes and search areas were set in such a way that significant improvement in correction results can be achieved. This, however, led to drastic increases in computation time. To solve this problem, we developed a novel multi-grid technique based on some observations on the matching process involved in the nonlocal priors. Experimental results have demonstrated that this technique is effective, accelerating the computation time significantly while maintaining the quality of correction results. In addition to scatter correction, it can also be used for other image processing applications where fast high-dimensional nonlocal filtering is needed.
X-ray machines are widely used for medical imaging and their cost is highly dependent on their image resolution.
Due to economic reasons, lower-resolution (lower-res) machines still have a lot of customers, especially in developing
economies. Software based resolution enhancement can potentially enhance the capabilities of the lower-res
machines without significantly increasing their cost and hence is highly desirable. In this work, we developed an
algorithm for X-ray image resolution enhancement. In this algorithm, the fractal idea and cross-resolution patch
matching are used to identify low-res patches that can be used as samples for high-res patch/pixel estimation.
These samples are then used to generate a prior distribution and used in a Bayesian MAP (maximum a posteriori)
optimization to produce the high-res image estimate. The efficacy of our algorithm is demonstrated by
experimental results.
In X-ray imaging, scatter can produce noise, artifacts, and decreased contrast. In practice, hardware such as anti-scatter grids is often used to reduce scatter. However, the remaining scatter can still be significant, and additional software-based correction is desirable. Furthermore, good software solutions can potentially reduce the amount of anti-scatter hardware needed, thereby reducing cost. In this work, we developed a software correction algorithm by adapting a class of non-local image restoration techniques to scatter reduction. In this algorithm, scatter correction is formulated as a Bayesian MAP (maximum a posteriori) problem with a non-local prior, which leads to better preservation of textural detail in scatter reduction. The efficacy of our algorithm is demonstrated through experimental and simulation results.
Compressed sensing can recover a signal that is sparse in some way from a small number of samples. For computed tomography (CT) imaging, this has the potential to obtain good reconstruction from a smaller number of projections or views, thereby reducing the amount of radiation that a patient is exposed to. In this work, we applied compressed sensing to fan beam CT image reconstruction, which is a special case of an important 3-D CT problem (cone beam CT). We compared the performance of two compressed sensing algorithms, denoted as the LP and the QP, in simulation. Our results indicate that the LP generally provides smaller reconstruction error and converges faster; therefore, it is preferable.
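As a rough sketch of why an l1-based (LP-style) formulation recovers sparse signals where plain least squares does not, here is a minimal iterative soft-thresholding (ISTA) example. ISTA is a stand-in solver for the l1 problem, not necessarily the LP or QP algorithms compared in the paper, and the problem sizes are illustrative.

```python
import numpy as np

def ista(A, y, lam=0.01, steps=2000):
    """Sparse recovery by iterative soft-thresholding, minimizing
    0.5*||Ax - y||^2 + lam*||x||_1 (a stand-in for the l1 formulation)."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        g = x - (A.T @ (A @ x - y)) / L        # gradient step on the data term
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft shrinkage
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 100)) / np.sqrt(30)   # underdetermined sensing matrix
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [1.0, -2.0, 1.5]         # 3-sparse ground truth
y = A @ x_true                                 # 30 measurements of a length-100 signal
x_l1 = ista(A, y)                              # l1 (sparsity-promoting) recovery
x_l2 = np.linalg.pinv(A) @ y                   # minimum-norm least squares
```

The least-squares solution spreads energy over all coordinates, while the l1 solution concentrates it on the true support, which mirrors the abstract's point about recovering images from fewer projections.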
In this paper, we describe a prediction based compressed-sensing approach for multi-slice
(same time, different locations) or multi-frame (same location, different time) CT image reconstruction.
In this approach, the second slice/frame of a pair of consecutive slices/frames is
reconstructed through reconstructing the prediction error image between the first and second
slice/frame, using compressed-sensing. This approach exploits the inter-slice/inter-frame correlation
and the higher degree of sparsity of the prediction error image to achieve more efficient
image reconstruction, i.e., fewer projections for the same image quality or higher image quality
for the same number of projections. The efficacy of our approach is demonstrated through
simulation results.
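The key premise above, that the inter-slice prediction error is much sparser than the slice itself, can be checked with a toy 1-D example (synthetic "slices"; all values are illustrative):

```python
import numpy as np

# Two consecutive synthetic 'slices': mostly identical anatomy with one
# small localized change, mimicking adjacent CT slices.
slice1 = np.zeros(256)
slice1[40:200] = 1.0                          # common anatomy (dense support)
slice1 += 0.5 * np.sin(np.arange(256) / 10.0) # smooth shared background
slice2 = slice1.copy()
slice2[120:130] += 0.8                        # small inter-slice change

def significant_coeffs(x, thresh=1e-6):
    # count samples that carry non-negligible energy
    return int(np.sum(np.abs(x) > thresh))

n_slice = significant_coeffs(slice2)          # dense: nearly every sample
n_error = significant_coeffs(slice2 - slice1) # sparse: only the change
```

Because the error signal needs far fewer significant samples, compressed sensing can reconstruct it from fewer projections than a full slice would require, which is exactly the gain the approach exploits.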
In this paper, we describe a novel technique for vision based UAV (unmanned aerial vehicle) navigation. In this technique, the navigation (position estimation) problem is formulated as a tracking problem and solved by a particle filter. The state and observation models of the particle filter are established based on a stereo analysis of the image sequence generated by the UAV's video camera in connection with a DEM (digital elevation map) of the area of the flight, which helps to control estimation error accumulation. The efficacy of this technique is demonstrated by simulation experimental results.
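A generic predict-weight-resample particle filter cycle, of the kind the navigation technique above builds on, can be sketched as follows. The simple 1-D motion and Gaussian observation models are stand-ins for the paper's stereo/DEM-based state and observation models.

```python
import numpy as np

def particle_filter_step(particles, observation, motion, obs_sigma, rng):
    """One predict-weight-resample cycle of a basic particle filter."""
    # predict: propagate particles through the motion model plus process noise
    particles = particles + motion + rng.normal(0.0, 0.5, size=particles.shape)
    # weight: likelihood of the observation under each particle
    weights = np.exp(-0.5 * ((observation - particles) / obs_sigma) ** 2)
    weights /= weights.sum()
    # resample: draw particles in proportion to their weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

rng = np.random.default_rng(0)
particles = rng.uniform(-10.0, 10.0, size=2000)  # initial position uncertainty
true_pos = 0.0
for _ in range(20):
    true_pos += 1.0                              # vehicle moves +1 per step
    obs = true_pos + rng.normal(0.0, 1.0)        # noisy position observation
    particles = particle_filter_step(particles, obs, 1.0, 1.0, rng)
estimate = particles.mean()                      # posterior mean position
```

Feeding observations in at every step is what keeps the estimation error from accumulating over time, which is the role the DEM-based observation model plays in the paper.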
KEYWORDS: Image segmentation, Statistical analysis, Tissues, Blood circulation, Medical imaging, Ultrasonography, Signal processing, Current controlled current source, Principal component analysis, Color imaging
In color flow imaging (CFI), blood flow is estimated from a sequence of 8 to 32 temporal samples using Doppler effects. Most existing techniques are local in that the flow at a spatial location is estimated using its own and, possibly, neighboring temporal samples. As a result, the estimates can often be severely affected by background clutter (e.g., tissue and tissue motion), leading to fragmented/spotty color flow. In this work, we developed a more global technique that segments the temporal samples into connected and smooth spatial regions of blood flow and tissue, thereby improving flow visualization and potentially, flow estimates.
KEYWORDS: Tumors, Motion models, Data modeling, Medical imaging, Imaging informatics, Current controlled current source, Radiotherapy, Cancer, Oncology, Abdomen
Radiation therapy (RT) is an important procedure in the treatment of cancer in the thorax and abdomen. However, its efficacy can be severely limited by breathing induced tumor motion. Tumor motion causes uncertainty in the tumor's location and consequently limits the radiation dosage (for fear of damaging normal tissue). This paper describes a novel signal model for tumor motion tracking/prediction that can potentially improve RT results. Using CT and breathing sensor data, it provides a more accurate characterization of the breathing and tumor motion than previous work and is non-invasive. The efficacy of our model is demonstrated on patient data.
In the past few years, practical distributed video coding systems have been proposed based on Slepian-Wolf and
Wyner-Ziv theorems. The quality of side information plays a critical role in the overall performance for such a system.
In this paper, we present a novel approach to generating the side information using optimal filtering techniques. The
motion vectors (MVs) that define the motion activity between the main information and the side information are first
predicted by an optimal filter; the MVs obtained from a decoded WZ frame by a conventional motion search
method then correct the prediction results. The side information is generated from the updated MVs via a motion
compensated interpolation (MCI) process and can be subsequently fed into the decoding process to further improve the
quality of a decoded WZ frame. We studied several variations of optimal filters and compared them with other DVC
systems in terms of rate-distortion performance.
KEYWORDS: Cameras, Error analysis, 3D image processing, Motion estimation, Unmanned aerial vehicles, 3D image reconstruction, Geographic information systems, Video, Global Positioning System, 3D vision
In this paper, we describe a novel approach to vision based navigation. In this approach, an airplane's position at each sampling time is estimated through a two-step process. In the first step, the plane's 3D motion is estimated from the current and previous image frames to produce an initial estimate of the plane's position. In the second step, the error in the initial position estimate is corrected by using a test image generated from a digital elevation map of the flight area and the previous frame. Experimental results demonstrated the efficacy of this approach: the correction step reduces position estimation error, and as a result the error does not increase with time.
In this paper, we describe a shape space based approach for invariant object representation and recognition. In this approach, an object and all its similarity transformed versions are identified with a single point in a high-dimensional manifold called the shape space. Object recognition is achieved by measuring the geodesic distance between an observed object and a model in the shape space. This approach produced promising results in 2D object recognition experiments: it is invariant to similarity transformations and is relatively insensitive to noise and occlusion. Potentially, it can also be used for 3D object recognition.
KEYWORDS: Image segmentation, Image processing algorithms and systems, Surveillance, Digital filtering, Video compression, Video surveillance, Motion estimation, Cameras, Video, Light sources and illumination
In this paper, we describe a novel approach to image sequence segmentation. In this approach, the presence of moving objects is first detected through background subtraction, i.e., the difference between the current frame and a dynamically updated background. Then, moving objects are extracted from the background subtraction image. Experimental results on surveillance image sequences demonstrated the efficacy of the proposed approach and its improvements over previous background subtraction techniques.
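A minimal sketch of background subtraction with a dynamically updated background, as described above. The running-average update and the threshold value are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    # exponential running average keeps the background adapted to slow changes
    return (1.0 - alpha) * background + alpha * frame

def detect_moving(frame, background, thresh=25.0):
    # pixels that differ strongly from the background are flagged as moving
    return np.abs(frame.astype(float) - background) > thresh

# synthetic example: a static scene with one bright moving block
background = np.full((64, 64), 100.0)
frame = np.full((64, 64), 100.0)
frame[10:20, 10:20] = 200.0             # moving object occupies a 10x10 block
mask = detect_moving(frame, background) # True exactly where the object is
background = update_background(background, frame)
```

Updating the background only slowly (small alpha) lets the detector tolerate gradual illumination changes while still flagging fast-moving objects.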
In this paper, we describe a novel approach to image sequence segmentation and its real-time implementation. This approach uses the 3D structure tensor to produce a more robust frame difference signal and uses curve evolution to extract whole objects. Our algorithm is implemented on a standard PC running the Windows operating system with video capture from a USB camera that is a standard Windows video capture device. Using the Windows standard video I/O functionalities, our segmentation software is highly portable and easy to maintain and upgrade. In its current implementation on a Pentium 400, the system can perform segmentation at 5 frames/sec with a frame resolution of 160 by 120.
Prediction is an essential operation in many image processing applications, such as object detection and image and video compression. When the image is modeled as Gaussian, the optimal predictor is linear and easy to obtain. However, image texture and clutter are often non-Gaussian and in such cases, optimal predictors are difficult to find. In this paper, we have derived an optimal predictor for an important class of non-Gaussian image models, the block-based multivariate Gaussian mixture model. This predictor has a special non-linear structure: it is a linear combination of the neighboring pixels but the combination coefficients are also functions of the neighboring pixels, not constants. The efficacy of this predictor is demonstrated in object recognition experiments where the prediction error image is used to identify 'hidden' objects. Results indicate that when the background texture is non-linear, i.e., with fast-switching gray-level patches, it performs significantly better than the optimal linear predictor.
KEYWORDS: Video, Internet, Java, Image processing, Multimedia, Video compression, Video processing, Integration, Digital signal processing, Visualization
In this paper, we present JIP - Java Image Processing on the Internet, a new Internet based application for remote education and software presentation. JIP offers an integrated learning environment on the Internet where remote users can not only share static HTML documents and lecture notes, but also run and reuse dynamic distributed software components, without having the source code or any extra work of software compilation, installation, and configuration. By implementing a platform-independent distributed computational model, local computational resources are consumed instead of the resources on a central server. As an extended Java applet, JIP allows users to select local image files on their computers or specify any image on the Internet using a URL as input. Multimedia lectures such as streaming video/audio and digital images are integrated into JIP and intelligently associated with specific image processing functions. Watching demonstrations and practicing the functions with user-selected input data dramatically encourages learning interest, while promoting the understanding of image processing theory. The JIP framework can easily be applied to other subjects in education or software presentation, such as digital signal processing, business, mathematics, and physics, or to other areas such as employee training and pay-per-use software.
In this paper, we describe a wavelet-based approach to multiresolution stochastic image modeling. The basic idea here is that a complex random field, e.g., one with long range and nonlinear spatial correlations, can be decomposed into several less complex random fields. This is done by defining a random field in each resolution level of a wavelet expansion. Texture synthesis experiments, performed by using wavelet autoregressive and radial basis function (RBF) models, have produced promising results. Both models are relatively simple in each resolution and are better than single resolution models in capturing long range correlations. In texture synthesis experiments, the RBF models, especially the non-causal model, provide good visual resemblance to the original for relatively complex textures.
KEYWORDS: Image segmentation, Magnetorheological finishing, Image filtering, Probability theory, Human vision and color perception, Image compression, Eye models, Digital filtering, Electrical engineering, Computer science
This paper describes a Markov random field (MRF) approach to image segmentation. Unlike most previous MRF techniques, which are based on pixel classification, this approach groups pixels that are similar. This removes the need to know the number of image classes. Mean field theory and multigrid processing are used in the subsequent optimization to find a good segmentation and to alleviate local-minimum problems. Variations of the MRF approach are investigated by incorporating features/schemes motivated by characteristics of the human vision system (HVS). Preliminary results are promising and indicate that multigrid and HVS-based features/schemes can significantly improve segmentation results.
KEYWORDS: Wavelets, Reconstruction algorithms, Wavelet transforms, Algorithm development, Optical tomography, Sensors, Absorption, Signal to noise ratio, Tissue optics, Chemical elements
In this paper, we present a wavelet based multiresolution total least squares (TLS) approach to solve the perturbation equation encountered in medical optical tomography. In this scheme, the unknown image, the data, and the weight matrix are all represented by wavelet expansions, yielding a multiresolution representation of the original Rayleigh quotient function in the wavelet domain. This transformed Rayleigh quotient function is then minimized using a multigrid scheme, in which an increasing portion of the wavelet coefficients of the unknown image is solved in successive approximations. One can also quickly identify regions of interest (ROI) from a coarse-level reconstruction and restrict the reconstruction in the following fine resolutions to those regions. At each resolution level, a TLS solution is obtained iteratively using a conjugate gradient (CG) method. Compared with a previously reported one-grid iterative TLS algorithm, the multigrid method requires substantially shorter computation time under the same reconstruction quality criterion.
Recently, there has been growing interest in the use of mean field theory (MFT) in Markov random field (MRF) model-based estimation problems. In many image processing and computer vision applications, the MFT approach can provide comparable performance to that of the simulated annealing, but requires much less computational effort and has easy hardware implementation. The Gibbs-Bogoliubov-Feynman inequality from statistical mechanics provides a general, systematic, and optimal approach for deriving mean field approximations. In this paper, this approach is applied to two important classes of MRF's, the compound Gauss-Markov model and the vector Ising model. The results obtained are compared and related to other methods of deriving mean field equations. Experimental results are also provided to demonstrate the efficacy of this approach.
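For the Ising case mentioned above, the mean field approximation leads to fixed-point equations of the form m_i = tanh(beta * (J * sum over neighbors of m_j + h_i)). A 1-D sketch with illustrative parameters (not the paper's compound Gauss-Markov or vector Ising derivations):

```python
import numpy as np

def mean_field_ising_1d(h, J=1.0, beta=0.5, iters=200):
    """Fixed-point iteration of the mean field equations for a 1-D Ising
    chain with periodic boundaries:
        m_i = tanh(beta * (J * (m_{i-1} + m_{i+1}) + h_i))
    """
    m = np.zeros_like(h)                                # start unmagnetized
    for _ in range(iters):
        neighbor_sum = np.roll(m, 1) + np.roll(m, -1)   # left + right neighbor
        m = np.tanh(beta * (J * neighbor_sum + h))
    return m

h = np.full(50, 0.2)          # weak uniform external field
m = mean_field_ising_1d(h)    # magnetizations settle at a uniform fixed point
```

Iterating these deterministic equations to a fixed point replaces the stochastic sampling of simulated annealing, which is the source of the computational savings the abstract mentions.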
In this paper, a wavelet based approach to the detection of anomalies in an image is described. In this approach, the anomalies are detected through hypothesis tests on the wavelet coefficients of the input image. In the development of this approach, some results on the correlation structure of the wavelet expansion of wide-sense stationary (WSS) processes are established: the wavelet coefficients are WSS and weakly correlated within a resolution level, almost uncorrelated when separated by one resolution level, and uncorrelated when separated by more than one resolution level. Experimental results on both synthetic and real-world images (sandpaper defect detection), and a comparison with results obtained by a neural network, demonstrate the efficacy of the wavelet approach.
Perceptual grouping, or perceptual organization, is the process of grouping local image features, such as line segments and regions, into groups (`perceptual chunks') that are likely to have come from the same object. Such an operation is essential to reliable and robust object recognition since the local features are often fragmented and cannot be matched directly to object models. In this paper, a novel approach to the problem of perceptual grouping is described. In this approach, perceptual grouping is accomplished in two steps. First, connections between local features are established or rejected according to which such connections would lead to good global groupings. This is done by performing a tree search of all the possible global groups associated with each potential connection. Second, perceptual groups are generated by propagating local connections and by local competition. The efficacy of this approach is demonstrated on the grouping of line segments in synthetic and real-world images.
A stationary band-limited process is used to construct a wavelet basis. This basis is modified to obtain a biorthogonal sequence which in turn is used to obtain a series representation of the process with uncorrelated coefficients.
Image understanding is a broad field of image processing where the goal is to classify the elements of a scene. In this paper we describe an approach to image understanding based on the matching of structure graphs. The structure graph of the input image is composed of `nodes' (primitives extracted from the image, e.g., regions, line segments) and `edges' (relationships between primitives in the image). The goal of our algorithm is to find the best match between this graph and a prototype graph, representing the knowledge about the expected scene. We formulate the graph matching problem as a consistent labeling problem, where the nodes of the prototype graph are considered labels. We then search for a labeling of the input structure graph that is optimal in the sense that the nodes and edges of the input graph are consistent with the labels and relationships represented in the prototype graph. A `quality of fit' measurement is derived for the matching, and a genetic algorithm is used to find the optimal solution. The advantages of this method of inexact (or fuzzy) matching include its graceful degradation (robustness) in the presence of noise and image deformation, its parallelism, and its adaptability to a variety of domains. We conclude with a discussion of experimental results.