We propose a framework for learning feature representations for variable-sized regions of interest (ROIs) in breast histopathology images from the convolutional network properties at patch-level. The proposed method involves fine-tuning a pre-trained convolutional neural network (CNN) by using small fixed-sized patches sampled from the ROIs. The CNN is then used to extract a convolutional feature vector for each patch. The softmax probabilities of a patch, also obtained from the CNN, are used as weights that are separately applied to the feature vector of the patch. The final feature representation of a patch is the concatenation of the class-probability weighted convolutional feature vectors. Finally, the feature representation of the ROI is computed by average pooling of the feature representations of its associated patches. The feature representation of the ROI contains local information from the feature representations of its patches while encoding cues from the class distribution of the patch classification outputs. The experiments show the discriminative power of this representation in a 4-class ROI-level classification task on breast histopathology slides where our method achieved an accuracy of 66.8% on a data set containing 437 ROIs with different sizes.
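The patch-to-ROI aggregation described above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming each patch already has a convolutional feature vector and a softmax probability vector produced by the fine-tuned CNN. The function names `patch_representation` and `roi_representation` and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def patch_representation(features: Vector, class_probs: Vector) -> Vector:
    """Concatenate one copy of the feature vector per class,
    each copy scaled by the softmax probability of that class."""
    rep: Vector = []
    for p in class_probs:
        rep.extend(p * f for f in features)
    return rep


def roi_representation(patch_features: List[Vector],
                       patch_probs: List[Vector]) -> Vector:
    """Average-pool the patch representations of one ROI (element-wise mean)."""
    if not patch_features or len(patch_features) != len(patch_probs):
        raise ValueError("need one probability vector per patch, and at least one patch")
    reps = [patch_representation(f, p)
            for f, p in zip(patch_features, patch_probs)]
    n = len(reps)
    return [sum(column) / n for column in zip(*reps)]


if __name__ == "__main__":
    # Two toy patches with 2-D features and 2 classes (hypothetical values).
    feats = [[1.0, 2.0], [3.0, 4.0]]
    probs = [[1.0, 0.0], [0.0, 1.0]]
    print(roi_representation(feats, probs))
```

With a feature vector of length d and C classes, every patch representation has length C × d, and the ROI representation has the same length regardless of how many patches the ROI contains, which is what lets variable-sized ROIs share one classifier.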
Digitization of full biopsy slides using the whole slide imaging technology has provided new opportunities for understanding the diagnostic process of pathologists and developing more accurate computer aided diagnosis systems. However, the whole slide images also provide two new challenges to image analysis algorithms. The first one is the need for simultaneous localization and classification of malignant areas in these large images, as different parts of the image may have different levels of diagnostic relevance. The second challenge is the uncertainty regarding the correspondence between the particular image areas and the diagnostic labels typically provided by the pathologists at the slide level. In this paper, we exploit a data set that consists of recorded actions of pathologists while they were interpreting whole slide images of breast biopsies to find solutions to these challenges. First, we extract candidate regions of interest (ROI) from the logs of pathologists' image screenings based on different actions corresponding to zoom events, panning motions, and fixations. Then, we model these ROIs using color and texture features. Next, we represent each slide as a bag of instances corresponding to the collection of candidate ROIs and a set of slide-level labels extracted from the forms that the pathologists filled out according to what they saw during their screenings. Finally, we build classifiers using five different multi-instance multi-label learning algorithms, and evaluate their performances under different learning and validation scenarios involving various combinations of data from three expert pathologists. 
Experiments that compared the slide-level predictions of the classifiers with the reference data showed average precision values up to 62% when the training and validation data came from the same individual pathologist's viewing logs, and an average precision of 64% was obtained when the candidate ROIs and the labels from all pathologists were combined for each slide.
We propose a method for retrieving similar fMRI statistical images given a query fMRI statistical image. Our
method thresholds the voxels within those images and extracts spatially distinct regions from the voxels that
remain. Each region is defined by a feature vector that contains the region centroid, the region area, the average
activation value for all the voxels within that region, the variance of those activation values, the average distance
of each voxel within that region to the region's centroid, and the variance of the voxel's distance to the region's
centroid. The similarity between two images is obtained by the summed minimum distance of their constituent
feature vectors. Results on a dataset of fMRI statistical images from experiments involving distinct cognitive
tasks are shown.
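A summed minimum distance between two images, each represented as a set of region feature vectors, could look like the sketch below. The abstract does not give the exact formula, so this version makes several assumptions: Euclidean distance between feature vectors, averaging over the regions of each image, and symmetrizing by taking the mean of the two directions. The name `summed_min_distance` is hypothetical.

```python
import math
from typing import Sequence

Region = Sequence[float]  # one feature vector per region


def summed_min_distance(regions_a: Sequence[Region],
                        regions_b: Sequence[Region]) -> float:
    """Symmetric summed minimum distance between two sets of region vectors.

    Assumption: features are already scaled to comparable ranges, since raw
    centroids, areas, and activation statistics have very different units.
    """
    if not regions_a or not regions_b:
        raise ValueError("both images need at least one region")

    def one_way(src: Sequence[Region], dst: Sequence[Region]) -> float:
        # For each source region, take the distance to its closest
        # region in the other image, then average over source regions.
        return sum(min(math.dist(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (one_way(regions_a, regions_b) + one_way(regions_b, regions_a))


if __name__ == "__main__":
    image_a = [(0.0, 0.0), (3.0, 4.0)]  # toy 2-D feature vectors
    image_b = [(0.0, 0.0)]
    print(summed_min_distance(image_a, image_b))
```

A retrieval system built on this measure would rank database images by increasing distance to the query.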
This work introduces a MATLAB-based tool developed for investigating functional connectivity in the brain.
Independent component analysis (ICA) is used as a measure of voxel similarity which allows the user to find and view
statistically independent maps of correlated voxels. These maps of correlated voxel activity may indicate functionally
connected regions. Specialized clustering and feature extraction techniques have been designed to find and characterize
clusters of activated voxels, which allows comparison of the spatial maps of correlation across subjects. This method is
also used to compare the ICA generated images to fMRI images showing statistically significant activations generated by
Statistical Parametric Mapping (SPM). The capability of querying specific coordinates in the brain supports integration
and comparison with other data modalities such as Cortical Stimulation Mapping and Single Unit Recordings.
Recent studies have shown an increase in the occurrence of deformational plagiocephaly and brachycephaly in
children. This increase has coincided with the "Back to Sleep" campaign that was introduced to reduce the risk
of Sudden Infant Death Syndrome (SIDS). However, there has yet to be an objective quantification of the degree
of severity for these two conditions. Most diagnoses are based on subjective factors such as patient history and physician examination. The existence of an objective quantification would help research in areas of diagnosis and intervention measures, as well as provide a tool for finding correlations between shape severity and cognitive outcome. This paper describes a new shape severity quantification and localization method for deformational plagiocephaly and brachycephaly. Our results show that there is a positive correlation between the new shape severity measure and the scores entered by a human expert.
Content-based image retrieval has been applied to many different biomedical applications [1]. In almost all cases, retrievals
involve a single query image of a particular modality and retrieved images are from this same modality. For example,
one system may retrieve color images from eye exams, while another retrieves fMRI images of the brain. Yet real
patients often have had tests from multiple different modalities, and retrievals based on more than one modality could
provide information that single modality searches fail to see. In this paper, we show medical image retrieval for two
different single modalities and propose a model for multimodal fusion that will lead to improved capabilities for
physicians and biomedical researchers. We describe a graphical user interface for multimodal retrieval that is being
tested by real biomedical researchers in several different fields.
KEYWORDS: Doppler effect, Ultrasonography, Phase shifts, Blood circulation, Algorithm development, Detection and tracking algorithms, In vivo imaging, In vitro testing, Data acquisition, Magnetic resonance imaging
Color Doppler ultrasound imaging is a powerful non-invasive diagnostic tool for many clinical applications that involve
examining the anatomy and hemodynamics of human blood vessels. These clinical applications include cardiovascular diseases, obstetrics, and abdominal diseases. Since its commercial introduction in the early eighties, color Doppler ultrasound imaging has been used mainly as a qualitative tool, with very few attempts to quantify its images. Many imaging artifacts hinder the quantification of the color Doppler images, the most important of which is the aliasing
artifact that distorts the blood flow velocities measured by the color Doppler technique. In this work we will address the
color Doppler aliasing problem and present a recovery methodology for the true flow velocities from the aliased ones. The problem is formulated as a 2D phase-unwrapping problem, which is a well-defined problem with solid theoretical foundations for other imaging domains, including synthetic aperture radar and magnetic resonance imaging. This paper documents the need for a phase unwrapping algorithm for use in color Doppler ultrasound image analysis. It describes a new phase-unwrapping algorithm that relies on the recently developed cutline detection approaches. The algorithm is novel in its use of heuristic information provided by the ultrasound imaging modality to guide the phase unwrapping process. Experiments have been performed on both in-vitro flow-phantom data and in-vivo human blood flow data. Both data types were acquired under a controlled acquisition protocol developed to minimize the distortion of the color Doppler data and hence to simplify the phase-unwrapping task. In addition to the qualitative assessment of the results, a quantitative assessment approach was developed to measure the success of the results. The results of our new algorithm have been compared on ultrasound data to those from other well-known algorithms, and it outperforms all of them.
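To make the connection between aliasing and phase unwrapping concrete, the sketch below unwraps aliased velocities along a single scan line. It is the classic one-dimensional (Itoh-style) method, not the cutline-based 2D algorithm described in the paper, and it assumes the true velocity changes by less than the Nyquist velocity between neighboring samples. The names `wrap`, `unwrap_1d`, and `v_nyq` are hypothetical.

```python
from typing import List


def wrap(v: float, v_nyq: float) -> float:
    """Wrap a velocity into the measurable interval [-v_nyq, v_nyq)."""
    return (v + v_nyq) % (2.0 * v_nyq) - v_nyq


def unwrap_1d(aliased: List[float], v_nyq: float) -> List[float]:
    """Recover velocities along one line from their aliased values.

    Each sample-to-sample difference is wrapped back into [-v_nyq, v_nyq)
    and the wrapped differences are accumulated from the first sample,
    which is assumed to be free of aliasing.
    """
    if not aliased:
        return []
    out = [aliased[0]]
    for prev, cur in zip(aliased, aliased[1:]):
        out.append(out[-1] + wrap(cur - prev, v_nyq))
    return out


if __name__ == "__main__":
    v_nyq = 50.0                                 # toy Nyquist velocity
    true_v = [0.0, 20.0, 40.0, 60.0, 40.0]       # 60 exceeds the Nyquist limit
    measured = [wrap(v, v_nyq) for v in true_v]  # 60 is recorded as -40
    print(measured)
    print(unwrap_1d(measured, v_nyq))
```

In two dimensions the result of such an integration depends on the path taken, which is why 2D methods need additional machinery such as cutlines to block inconsistent paths.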
Recent studies have shown that more than 5 million bronchoscopy procedures are performed each year worldwide. The
procedure usually involves biopsy of possible cancerous tissues from the lung. Standard bronchoscopes are too large to
reach into the peripheral lung, where cancerous nodules are often found. The University of Washington has developed an
ultrathin and flexible scanning fiber endoscope that is able to advance into the periphery of the human lungs without
sacrificing image quality. To accompany the novel endoscope, we have developed a user interface that serves as a
navigation guide for doctors when performing a bronchoscopy. The navigation system consists of a virtual surface mesh
of the airways extracted from a computed tomography (CT) scan and an electromagnetic tracking system (EMTS). The
complete system can be viewed as a global positioning system for the lung that provides pre-procedural planning
functionalities, virtual bronchoscopy navigation, and real time tracking of the endoscope inside the lung. The real time
virtual navigation is complemented by a particle filter algorithm to compensate for registration errors and outliers, and to
prevent going through surfaces of the virtual lung model. The particle filter method tracks the endoscope tip based on
real time tracking data and attaches the virtual endoscopic view to the skeleton that runs inside the virtual airway surface.
Experimental results on a dried sheep lung show that the particle filter method converges and is able to accurately track the
endoscope tip in real time when the endoscope is inserted at both slow and fast speeds.
Unmanned aerial vehicles with high quality video cameras are able to provide videos from 50,000 feet up that show a surprising amount of detail on the ground. These videos are difficult to analyze, because the airplane moves, the camera zooms in and out and vibrates, and the moving objects of interest can be in the scene, out of the scene, or partly occluded. Recognizing both the moving and static objects is important in order to find events of interest to human analysts. In this paper, we describe our approach to object and event recognition using multiple stages of classification.
In this paper, a method that combines the maximal overlap discrete wavelet transform (MODWT) and dynamic time warping (DTW) is presented as a solution for dynamically detecting the hemodynamic response (HR). The MODWT is very effective in extracting only the hemodynamic response portions from the original signal without any shape distortion. The DTW is desirable for finding various shapes of hemodynamic responses dynamically. The DTW finds the optimal path with minimum cost between the reference signal and the reconstructed input signals by warping the signals in the time domain to fit the reference. The MODWT-DTW method was evaluated using both simulated and experimental fMRI data. Simulations required identification of 500 synthetically generated hemodynamic responses and 500 randomly generated signals. To assess the performance, receiver operating characteristic (ROC) curves were produced. The results indicate better performance for the MODWT-DTW approach compared to the more standard simple correlation methods. Finally, the MODWT-DTW procedure was used to characterize an fMRI data set, showing good correspondence with solutions derived from statistical parametric mapping techniques.
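The DTW step can be illustrated with the textbook dynamic-programming recurrence. This sketch covers only the warping cost between a reference response and an input signal; it omits the MODWT reconstruction entirely, and its absolute-difference local cost and unconstrained warping path are assumptions rather than details from the paper. The name `dtw_distance` is hypothetical.

```python
from typing import Sequence


def dtw_distance(reference: Sequence[float], signal: Sequence[float]) -> float:
    """Minimum cumulative cost of aligning `signal` to `reference`
    when both may be stretched or compressed in time."""
    n, m = len(reference), len(signal)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i reference samples
    # with the first j signal samples.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(reference[i - 1] - signal[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat signal sample
                                     cost[i][j - 1],      # repeat reference sample
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


if __name__ == "__main__":
    reference = [0.0, 1.0, 2.0, 1.0, 0.0]     # toy response shape
    delayed = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same shape, delayed onset
    print(dtw_distance(reference, delayed))
```

Because the warping absorbs the delay, the delayed copy of the reference has zero cost, whereas a sample-by-sample correlation would penalize the shift.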
A new class of algorithms, based on the triangle inequality, has recently been proposed for use in content-based image retrieval. These algorithms rely on comparing a set of key images to the database images and storing the computed distances. Query images are later compared to the keys, and the triangle inequality is used to speedily compute lower bounds on the distance from the query to each database image. This paper addresses the question of increasing the performance of this algorithm by the addition of a data structure known as the Triangle Trie.
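The lower bound follows directly from the triangle inequality: for any key image k, |d(q, k) - d(k, x)| ≤ d(q, x), so the largest such value over all keys bounds the true distance from below. The sketch below shows the precomputation and the pruning step. It does not implement the Triangle Trie itself, and the function names and the toy 2-D points standing in for images are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Item = Sequence[float]
Dist = Callable[[Item, Item], float]


def precompute_key_distances(database: List[Item], keys: List[Item],
                             dist: Dist) -> List[List[float]]:
    """Offline step: store the distance from every database item to every key."""
    return [[dist(x, k) for k in keys] for x in database]


def lower_bounds(query: Item, keys: List[Item],
                 key_table: List[List[float]], dist: Dist) -> List[float]:
    """Query step: only len(keys) distance computations are needed."""
    q_to_keys = [dist(query, k) for k in keys]
    return [max(abs(qk - xk) for qk, xk in zip(q_to_keys, row))
            for row in key_table]


def candidates(query: Item, keys: List[Item], key_table: List[List[float]],
               dist: Dist, threshold: float) -> List[int]:
    """Indices of items that cannot be ruled out for the given threshold."""
    bounds = lower_bounds(query, keys, key_table, dist)
    return [i for i, lb in enumerate(bounds) if lb <= threshold]


if __name__ == "__main__":
    database = [(0.0, 0.0), (10.0, 0.0), (1.0, 1.0)]
    keys = [(0.0, 0.0), (0.0, 10.0)]
    table = precompute_key_distances(database, keys, math.dist)
    print(candidates((0.0, 1.0), keys, table, math.dist, threshold=2.0))
```

Only the surviving candidates need a full distance computation against the query. The bound is valid only when the distance measure is a metric.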
There is a growing need for the ability to query image databases based on image content rather than strict keyword search. Most current image database systems that perform query by content require a distance computation for each image in the database. Distance computations can be time consuming, limiting the usability of such systems. There is thus a need for indexing systems and algorithms that can eliminate candidate images without performing distance calculations. As user needs may change from session to session, there is also a need for run-time creation of distance measures. In this paper, we introduce FIDS, the "Flexible Image Database System." FIDS allows the user to query the database based on user-defined polynomial combinations of predefined distance measures. Using an indexing scheme and algorithms based on the triangle inequality, FIDS can return matches to the query image without directly comparing the query image to much of the database. FIDS is currently being tested on a database of eighteen hundred images.
KEYWORDS: Data modeling, Databases, Visual process modeling, 3D modeling, C++, Human-machine interfaces, Image processing, Computing systems, Data communications, Safety
The database environment for vision research (DEVR) is an entity-oriented scientific database system based on a hierarchical relational data model (HRS). This paper describes the design and implementation of the data definition language, the application programmer's interface, and the query mechanism of the DEVR system. DEVR provides a dynamic data definition language for modeling image and vision data, which can be integrated with existing image processing and vision applications. Schema definitions can be fully interleaved with data manipulation, without requiring recompilation. In addition, DEVR provides a powerful application programmer's interface that regulates data access and schema definition, maintains indexes, and enforces type safety and data integrity. The system supports multi-level queries based on recursive constraint trees. A set of HRS entities of a given type is filtered through a network of constraints corresponding to the parts, properties, and relations of that type. Queries can be constructed interactively with a menu-driven interface, or they can be dynamically generated within a vision application using the programmer's interface. Query objects are persistent and reusable. Users may keep libraries of query templates, which can be built incrementally, tested separately, cloned, and linked together to form more complex queries.
A successful visual database system must provide facilities to manage both image data and the products extracted from them. The extracted items usually consist of textual and numeric data from which multiple visualizations can be created. Such visualizations are difficult to automate because they are domain-specific and often require data from multiple sources. In the Database Environment for Vision Research (DEVR) we address these issues. DEVR is an entity-oriented, scientific, visual database system. In DEVR, entities are stored in hierarchical, relational data structures. The schema for each entity contains a name, a set of properties, a set of parts, a set of attributed relations among the parts, and a set of graphic definitions which describe how to build instance-specific visualizations. Graphic definitions are composed of one or more graphical primitives. For each primitive, the user identifies required data sources by graphically selecting various properties or parts within the schema hierarchy. As instances are created, the graphic definitions are used to automatically generate visualizations, which can later be viewed via a graphical browser. In this paper, we describe the visualization subsystem of the DEVR system, including schema construction, graphical definition, and instance browsing.
Segmentation of CT images into the various component organs is difficult to perform automatically, because standard methods such as edge tracking, region growing, and simple thresholding do not work. Absolute thresholds are not powerful enough to extract organs, since the gray tones of an organ vary widely depending on the parts, the patient, the CT scanner used, and the setup of the scanner. Edge tracking often fails, because edges around organs are incomplete, and the vagueness of the CT images can mislead most conventional edge detection methods. The nonhomogeneity of organs rules out a region growing approach. Dosimetrists, who trace the boundaries of organs for radiation treatment planning, use their own prior experience with the images and the expected shapes of the organs on various slices to identify organs and their boundaries. The goal of our current work is to develop a knowledge-based recognition system that utilizes knowledge of anatomy and CT imaging. We have developed a system for analyzing CT images of the human abdomen. The system features the use of constraint-based dynamic thresholding, negative-shape constraints to rapidly rule out infeasible segmentation, and progressive landmarking that takes advantage of the different degrees of certainty of successful identification of each organ. The results of a series of initial tests on our training data of 100 images from five patients indicate that the knowledge-based approach is promising.
Robert Haralick, Arun Somani, Craig Wittenbrink, Robert Johnson, Kenneth Cooper, Linda Shapiro, Ihsin Phillips, Jenq Hwang, William Cheung, Yung Yao, Chung-Ho Chen, Larry Yang, Brian Daugherty, Bob Lorbeski, Kent Loving, Tom Miller, Larye Parkins, Steven Soos
KEYWORDS: Image processing, Machine vision, Process control, Telecommunications, Computer vision technology, Signal processing, Control systems, Data processing, Computer architecture, Binary data
The Proteus architecture is a highly parallel MIMD (multiple-instruction, multiple-data) machine optimized for large-granularity tasks such as machine vision and image processing. The system can achieve 20 gigaflops (80 gigaflops peak). It accepts data via multiple serial links at a rate of up to 640 megabytes/second. The system employs a hierarchical reconfigurable interconnection network, with the highest level being a circuit-switched Enhanced Hypercube serial interconnection network for internal data transfers. The system is designed to use 256 to 1,024 RISC processors. The processors use one-megabyte external Read/Write Allocating Caches for reduced multiprocessor contention. The system detects, locates, and replaces faulty subsystems using redundant hardware to facilitate fault tolerance. The parallelism is directly controllable through an advanced software system for partitioning, scheduling, and development. System software includes a translator for the INSIGHT language, a parallel debugger, low- and high-level simulators, and a message passing system for all control needs. Image processing application software includes a variety of point operators, neighborhood operators, convolution, and the mathematical morphology operations of binary and gray-scale dilation, erosion, opening, and closing.
The geometric hashing scheme proposed by Lamdan and Wolfson can be very efficient in a model-based matching system, not only in terms of the computational complexity involved, but also in terms of the simplicity of the method. In a recent paper, we discussed errors that can occur with this method due to quantization, stability, symmetry, and noise problems. These errors make the original geometric hashing technique unsuitable for use on the factory floor. Beginning with an explicit noise model, which the original Lamdan and Wolfson technique lacks, we derived an optimal approach that overcomes these problems. We showed that the results obtained with the new algorithm are clearly better than the results from the original method. This paper addresses the performance characterization of the geometric hashing technique, more specifically the affine-invariant point matching, applied to the problem of recognizing and determining the pose of sheet metal parts. The experiments indicate that with a model having 10 to 14 points, with 2 points of the model undetected and 10 extraneous points detected, and with the model points perturbed by Gaussian noise of standard deviation 3 (0.58 of range), the average amount of computation required to obtain an answer is equivalent to trying 11 of the possible three-point bases. The misdetection rate, measured by the percentage of correct basis matches that fail to verify, is 0.9%. The percentage of incorrect bases that successfully produced a match that did verify (false alarm rate) is 13%. And, finally, 2 of the experiments failed to find a correct match and verify it. Results for experiments with real images are also presented.
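The affine-invariant point matching mentioned above rests on one observation: a 2D point expressed in the coordinate frame of three non-collinear basis points keeps the same coordinates under any affine transformation of the whole point set. The sketch below computes those coordinates and checks the invariance on a toy example. It illustrates only the basic scheme, without the noise model or the optimal weighting discussed in the paper, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def affine_coords(p: Point, basis: Sequence[Point]) -> Tuple[float, float]:
    """Coordinates (alpha, beta) such that
    p = p0 + alpha * (p1 - p0) + beta * (p2 - p0) for basis (p0, p1, p2)."""
    (x0, y0), (x1, y1), (x2, y2) = basis
    ax, ay = x1 - x0, y1 - y0
    bx, by = x2 - x0, y2 - y0
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("basis points are collinear")
    px, py = p[0] - x0, p[1] - y0
    # Cramer's rule for the 2x2 linear system.
    alpha = (px * by - py * bx) / det
    beta = (ax * py - ay * px) / det
    return alpha, beta


def apply_affine(p: Point, a: float, b: float, c: float, d: float,
                 tx: float, ty: float) -> Point:
    """Map (x, y) to (a*x + b*y + tx, c*x + d*y + ty)."""
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


if __name__ == "__main__":
    basis = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    point = (2.0, 3.0)
    params = (2.0, 1.0, 0.0, 3.0, 5.0, -1.0)  # arbitrary invertible affine map
    moved_basis = [apply_affine(q, *params) for q in basis]
    moved_point = apply_affine(point, *params)
    print(affine_coords(point, basis))
    print(affine_coords(moved_point, moved_basis))
```

In a full geometric hashing system these coordinates are quantized and used as keys into a hash table that stores (model, basis) pairs, so that an image basis can vote for the models consistent with it.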
The relational pyramid representation is a hierarchical relational description of an object that can be used in recognition and pose estimation. In this representation, primitive features appear at the bottom of the pyramid, relations among primitives appear at level one, and, in general, relations among lower-level structures appear at the levels above the levels in which these structures were defined. We have constructed a pose estimation system that uses the relational pyramid representation for the view classes of a three-dimensional object and for the description of the unknown view of an object extracted from an image. A previously described system uses summary information to rank and select view classes to which the unknown view is compared. This paper describes a best-first search procedure to find correspondences between the selected view class pyramid and the image pyramid.