In this work, we investigated the feasibility of extracting continuous respiratory parameters from a single RGB camera stationed in a short-stay ward. Based on the extracted respiration parameters, we further investigated the feasibility of using respiratory features to aid in the detection of atrial fibrillation (AF). To extract respiration, we implemented two algorithms: chest optical flow (COF) and energy variance maximization (EVM). We used COF to extract respiration from the patient’s thoracic area and EVM from the patient’s facial area. Using capnography as the reference, for average breath-to-breath rate estimation (i.e., 15-second sliding windows with 50% overlap), we achieved errors within 3 breaths per minute with COF and within 3.5 breaths per minute with EVM. To detect the presence of AF in the respiratory signal, we extracted three respiratory features from the derived COF measurements. We fed these features to a logistic regression model and achieved an average AUC value of 0.64. This result showcases the potential of using camera-based respiratory parameters as predictors for AF, or as surrogate predictors when there is insufficient facial area in the camera’s field of view for the extraction of cardiac measurements.
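As an illustration of the rate-estimation step mentioned above (15-second sliding windows, 50% overlap), the following sketch estimates a respiratory rate in breaths per minute from a 1-D respiration waveform via the dominant spectral peak. It is not the paper's implementation; the sampling rate, window parameters, breathing band and synthetic signal are assumptions for demonstration only.

```python
# Illustrative sketch (not the paper's code): respiratory rate per sliding window.
import numpy as np

def respiratory_rate_bpm(signal, fs, win_s=15.0, overlap=0.5, band=(0.1, 1.0)):
    """Return one rate estimate (breaths/min) per 15-second window with 50% overlap."""
    win = int(win_s * fs)
    step = int(win * (1.0 - overlap))
    rates = []
    for start in range(0, len(signal) - win + 1, step):
        seg = signal[start:start + win]
        seg = seg - np.mean(seg)                         # remove DC offset
        spectrum = np.abs(np.fft.rfft(seg * np.hanning(win)))
        freqs = np.fft.rfftfreq(win, d=1.0 / fs)
        mask = (freqs >= band[0]) & (freqs <= band[1])   # plausible breathing band
        peak = freqs[mask][np.argmax(spectrum[mask])]    # dominant frequency in the band
        rates.append(peak * 60.0)                        # Hz -> breaths per minute
    return np.array(rates)

# Example with a synthetic 0.3 Hz (18 breaths/min) respiration trace at 30 fps.
fs = 30.0
t = np.arange(0, 120, 1.0 / fs)
resp = np.sin(2 * np.pi * 0.3 * t) + 0.1 * np.random.randn(t.size)
print(respiratory_rate_bpm(resp, fs))
```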
Remote Photoplethysmography (remote PPG) enables contactless monitoring of the cardiac rhythm using video cameras. Prior research has shown the feasibility of video-based atrial fibrillation (AF) and/or atrial flutter (Aflutter) detection in some scenarios, but most studies exclude patient movement. In this work, we investigate the feasibility of detecting these two cardiac arrhythmias in a regular hospital environment using an RGB camera, where patients were not restricted in movement during the recording process. Data from 56 patients were collected before and after a scheduled cardioversion treatment. Using these data, we developed three machine learning models. First, a model to detect only AF, excluding any Aflutter cases; here we report a sensitivity of 94.5% and a specificity of 89.3% with an AUC of 0.966. Second, a model to classify whether a cardiac arrhythmia (AF or Aflutter) is present or not; here we report a sensitivity of 95.6% and a specificity of 91.2% with an AUC of 0.975. Finally, we develop a multi-rhythm model that classifies the data into AF, Aflutter and sinus rhythm separately. Its arrhythmia-detection performance is close to that of the second model, but the distinction between AF and Aflutter remains a challenge. We theorize that remote PPG is more sensitive to noise during Aflutter, which leads to Aflutter features that are closer to those of AF. To confirm this, we will extensively review the reasons for the misclassification of Aflutter as AF in future work.
Surgery is a crucial treatment for malignant brain tumors, where gross total resection improves the prognosis. Tissue samples taken during surgery are either subject to a preliminary intraoperative histological analysis, or sent for a full pathological evaluation, which can take days or weeks. Whereas a complete pathological analysis involves a lengthy array of techniques, a preliminary analysis on frozen tissue is performed as quickly as possible (30-45 minutes on average) to provide fast feedback to the surgeon during the surgery. The surgeon uses this information to confirm that the resected tissue is indeed tumor and may, at least in theory, initiate repeated biopsies to help achieve gross total resection. However, due to the total turn-around time of the tissue inspection, repeated analyses may not be feasible during a single surgery. In this context, intraoperative image-guided techniques can improve the clinical workflow for tumor resection and improve outcomes by aiding the identification and removal of the malignant lesion. Hyperspectral imaging (HSI) is an optical imaging technique with the potential to extract combined spectral-spatial information. By exploiting HSI for human brain-tissue classification on 13 in-vivo hyperspectral images from 9 patients, a brain-tissue classifier is developed. The framework consists of a hybrid 3D-2D CNN-based approach and a band-selection step to enhance the extraction of both spectral and spatial information from the hyperspectral images. An overall accuracy of 77% is obtained when classifying tumor, normal and hyper-vascularized tissue, which clearly outperforms the state-of-the-art approaches (SVM, 2D-CNN). These results may open an attractive future perspective for intraoperative brain-tumor classification using HSI.
Routine surveillance endoscopies are currently used to detect dysplasia in patients with Barrett's Esophagus (BE). However, most of these procedures are performed by non-expert endoscopists in community hospitals, leading to many missed dysplastic lesions, which can progress into advanced esophageal adenocarcinoma if left untreated. In recent years, several successful algorithms have been proposed for the detection of cancer in BE using high-quality overview images. This work addresses the first steps towards clinical application on endoscopic surveillance videos. Several challenges are identified that occur when moving from image-based to video-based analysis. (1) It is shown that algorithms trained on high-quality overview images do not naively transfer to endoscopic videos, due to, e.g., non-informative frames. (2) Video quality is shown to be an important factor in algorithm performance. Specifically, temporal location performance is highly correlated with video quality. (3) When moving to real-time algorithms, the additional compute necessary to address the challenges in videos will become a burden on the computational budget. However, in addition to challenges, videos also bring new opportunities not available in current image-based methods, such as the inclusion of temporal information. This work shows that a multi-frame approach increases performance compared to a naive single-image method when the above challenges are addressed.
The availability of massive amounts of data in histopathological whole-slide images (WSIs) has enabled the application of deep learning models, especially convolutional neural networks (CNNs), which have shown a high potential for improvement in cancer diagnosis. However, storage and transmission of large amounts of data such as gigapixel histopathological WSIs are challenging. Exploiting lossy compression algorithms for medical images is controversial but, as long as the clinical diagnosis is not affected, is acceptable. We study the impact of JPEG 2000 compression on our proposed CNN-based algorithm, which has produced performance comparable to that of pathologists and which was ranked second in the CAMELYON17 challenge. Detecting tumor metastases in hematoxylin and eosin-stained tissue sections of breast lymph nodes is evaluated and compared with the pathologists’ diagnoses in three different experimental setups. Our experiments show that the CNN model is robust against compression ratios up to 24:1 when it is trained on uncompressed high-quality images. We demonstrate that a model trained on lower-quality images, i.e., lossy compressed images, shows significantly improved classification performance at the corresponding compression ratio. Moreover, such a model performs equally well on all higher-quality images. These properties will help in designing cloud-based computer-aided diagnosis (CAD) systems, e.g., for telemedicine, that employ deep CNN models which are more robust to image-quality variations due to the compression required to address data storage and transmission constraints. However, the presented results are specific to the described CAD system and application, and further work is needed to examine whether they generalize to other systems and applications.
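For readers wanting to reproduce the kind of preprocessing studied here, the sketch below applies rate-controlled JPEG 2000 compression to an image patch using Pillow's OpenJPEG-based writer. The file names and the 24:1 ratio are placeholders taken from the abstract, not code released with the paper, and a Pillow build with JPEG 2000 support is assumed.

```python
# Hedged sketch: lossy JPEG 2000 compression of a histopathology patch at a
# fixed compression ratio before it is fed to a classifier.
from PIL import Image

def compress_jp2(src_path, dst_path, ratio=24):
    """Save an image as JPEG 2000 at roughly the requested compression ratio."""
    img = Image.open(src_path).convert("RGB")
    # quality_mode="rates" makes quality_layers be interpreted as compression ratios.
    img.save(dst_path, format="JPEG2000",
             quality_mode="rates", quality_layers=[ratio], irreversible=True)
    return Image.open(dst_path)

# Placeholder file names for illustration only.
patch_24to1 = compress_jp2("patch_uncompressed.png", "patch_24to1.jp2", ratio=24)
```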
Imaging mass spectrometry (IMS) is a novel molecular imaging technique to investigate how molecules are distributed between tumors and within tumor regions, in order to shed light on tumor biology or find potential biomarkers. Convolutional neural networks (CNNs) have proven to be very potent classifiers, often outperforming other machine learning algorithms, especially in computational pathology. To overcome the challenge of the complexity and high dimensionality of IMS data, the proposed CNNs are either very deep or use large kernels, which results in a large number of parameters and therefore a high computational complexity. An alternative is down-sampling the data, which inherently leads to a loss of information. In this paper, we propose dilated CNNs as a possible solution to this challenge, since they increase the receptive field size without increasing the number of network parameters or decreasing the input signal resolution. Since the mass signatures of cancer biomarkers are distributed over the whole mass spectrum, both locally and globally distributed patterns need to be captured to correctly classify the spectrum. By experiment, we show that employing dilated convolutions in the architecture of a CNN leads to a higher performance in tumor classification. Our proposed model outperforms the state-of-the-art for tumor classification in both clinical lung and bladder datasets by 1-3%.
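A small PyTorch sketch of the core idea follows: dilated 1-D convolutions enlarge the receptive field over a mass spectrum without adding parameters or down-sampling the input. The layer sizes, dilation schedule and input dimensions are our own illustrative assumptions, not the authors' released architecture.

```python
import torch
import torch.nn as nn

class DilatedSpectrumCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            # Same kernel size in every layer; the dilation factor doubles each
            # layer, so the receptive field grows exponentially while the number
            # of parameters per layer stays constant.
            nn.Conv1d(1, 16, kernel_size=3, dilation=1, padding=1), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=3, dilation=2, padding=2), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=3, dilation=4, padding=4), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=3, dilation=8, padding=8), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, n_classes)
        )

    def forward(self, x):               # x: (batch, 1, n_mz_bins)
        return self.classifier(self.features(x))

model = DilatedSpectrumCNN()
spectra = torch.randn(4, 1, 10000)      # four spectra with 10,000 m/z bins (placeholder)
logits = model(spectra)                 # shape (4, 2)
```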
Vestibular schwannomas are benign brain tumors that can be treated radiosurgically with the Gamma Knife in order to stop tumor progression. However, in some cases tumor progression is not stopped and the treatment is deemed a failure. At present, the reason for these failed treatments is unknown. Clinical factors and MRI characteristics have been considered as prognostic factors. Another confounder in the success of treatment is the treatment planning itself, which is thought to be very uniform, even though dose distributions among treatment plans are highly inhomogeneous. This paper explores the predictive value of these dose distributions for the treatment outcome. We compute homogeneity indices (HI) and three-dimensional histograms of oriented gradients (3D-HOG), and employ a support vector machine (SVM) paired with principal component analysis (PCA) for classification. In a clinical dataset, consisting of 20 tumors that showed treatment failure and 20 tumors showing treatment success, we discover that the correlation of the HI values with the treatment outcome presents no statistical evidence of an association (52.5% accuracy employing a linear SVM and no statistically significant difference with t-tests), whereas the 3D-HOG features describing the dose distribution do present correlations with the treatment outcome, suggesting the influence of the treatment on the outcome itself (77.5% accuracy employing a linear SVM and PCA). These findings can provide a basis for refining towards personalized treatments and prediction of treatment efficiency. However, larger datasets are needed for more extensive analysis.
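The classification stage described above (PCA followed by a linear SVM) can be sketched in a few lines of scikit-learn. Feature extraction is omitted; X below is assumed to be an (n_tumors, n_features) array of 3D-HOG descriptors and y the treatment outcome, both placeholders rather than the paper's data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, StratifiedKFold

X = np.random.rand(40, 500)             # placeholder for 3D-HOG dose-distribution features
y = np.array([0] * 20 + [1] * 20)       # 20 treatment successes, 20 failures

clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print("mean accuracy: %.3f" % scores.mean())
```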
Head and neck cancer (HNC) includes cancers in the oral/nasal cavity, pharynx, larynx, etc., and is the sixth most common cancer worldwide. The principal treatment is surgical removal, where a complete tumor resection is crucial to reduce the recurrence and mortality rate. Intraoperative tumor imaging enables surgeons to objectively visualize the malignant lesion to maximize the tumor removal with healthy safe margins. Hyperspectral imaging (HSI) is an emerging imaging modality for cancer detection, which can augment surgical tumor inspection, currently limited to subjective visual inspection. In this paper, we investigate HSI for automated cancer detection during image-guided surgery, because it can provide quantitative information about light interaction with biological tissues and exploit the potential for malignant-tissue discrimination. The proposed solution forms a novel framework for automated tongue-cancer detection that explicitly exploits HSI, in particular the spectral variations in specific bands describing the cancerous tissue properties. The method follows a machine-learning-based classification, employing a linear support vector machine (SVM), and offers superior sensitivity and a significant decrease in computation time. The model is evaluated on 7 ex-vivo specimens of squamous cell carcinoma of the tongue, with known histology. HSI combined with the proposed classification reaches a sensitivity of 94%, a specificity of 68% and an area under the curve (AUC) of 92%. This feasibility study paves the way for introducing HSI as a non-invasive imaging aid for cancer detection and for increasing the effectiveness of surgical oncology.
3D ultrasound (US) transducers will improve the quality of image-guided medical interventions if automated detection of the needle becomes possible. Image-based detection of the needle is challenging due to the presence of other echogenic structures in the acquired data, inconsistent visibility of needle parts and the low quality of US imaging. As the currently applied approaches for needle detection classify each voxel individually, they do not consider the global relations between the voxels. In this work, we introduce coherent needle labeling by using dense conditional random fields over a volume, along with 3D space-frequency features. The proposal includes long-distance dependencies between voxel pairs according to their similarity in the feature space and their spatial distance. This post-processing stage leads to a better label assignment of volume voxels and a more compact and coherent segmented region. Our ex-vivo experiments, based on measuring the F-1, F-2 and IoU scores, show that the performance improves significantly, by 10-20%, compared with using only a linear SVM as a baseline for voxel classification.
Advanced image analysis can lead to automated examination of histopathology images, which is essential for objective and fast cancer diagnosis. Recently, deep learning methods, in particular Convolutional Neural Networks (CNNs), have shown exceptionally successful performance on medical image analysis as well as computational histopathology. Because Whole-Slide Images (WSIs) have a very large size, CNN models are commonly applied to classify WSIs per patch. Although a CNN is trained on a large part of the input space, the spatial dependencies between patches are ignored and the inference is performed only on the appearance of the individual patches. Therefore, predictions on neighboring regions can be inconsistent. In this paper, we apply Conditional Random Fields (CRFs) over latent spaces of a trained deep CNN in order to jointly assign labels to the patches. In our approach, compact features extracted from intermediate layers of a CNN are considered as observations in a fully-connected CRF model. This leads to performing inference on a wider context rather than on the appearance of individual patches. Experiments show an improvement of approximately 3.9% in the average FROC score for tumorous region detection in histopathology WSIs. Our proposed model, trained on the Camelyon17 ISBI challenge dataset, won 2nd place with a kappa score of 0.8759 in patient-level pathologic lymph node classification for breast cancer detection.
Vestibular schwannomas (VS) are benign brain tumors that can be treated with high-precision focused radiation with the Gamma Knife in order to stop tumor growth. Outcome prediction of Gamma Knife radiosurgery (GKRS) treatment can help in determining whether GKRS will be effective on an individual patient basis. However, at present, prognostic factors of tumor control after GKRS for VS are largely unknown, and only clinical factors, such as size of the tumor at treatment and pre-treatment growth rate of the tumor, have been considered thus far. This research aims at outcome prediction of GKRS by means of quantitative texture feature analysis on conventional MRI scans. We compute first-order statistics and features based on gray-level co-occurrence (GLCM) and run-length matrices (RLM), and employ support vector machines and decision trees for classification. In a clinical dataset, consisting of 20 tumors showing treatment failure and 20 tumors exhibiting treatment success, we have discovered that the second-order statistical metrics distilled from GLCM and RLM are suitable for describing texture, but are slightly outperformed by simple first-order statistics, like mean, standard deviation and median. The obtained prediction accuracy is about 85%, but a final choice of the best feature can only be made after performing more extensive analyses on larger datasets. In any case, this work provides suitable texture measures for successful prediction of GKRS treatment outcome for VS.
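The texture features mentioned above can be illustrated with scikit-image: first-order statistics plus gray-level co-occurrence matrix (GLCM) descriptors computed on an MRI region of interest. The distances, angles and the exact feature set used in the paper are not reproduced; the values below are assumptions for demonstration.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # 'greycomatrix' in older skimage versions

def texture_features(roi_uint8):
    """roi_uint8: 2-D uint8 array containing the tumor region of interest."""
    first_order = {
        "mean": float(np.mean(roi_uint8)),
        "std": float(np.std(roi_uint8)),
        "median": float(np.median(roi_uint8)),
    }
    glcm = graycomatrix(roi_uint8, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    second_order = {prop: float(graycoprops(glcm, prop).mean())
                    for prop in ("contrast", "homogeneity", "energy", "correlation")}
    return {**first_order, **second_order}

roi = (np.random.rand(64, 64) * 255).astype(np.uint8)   # placeholder ROI
print(texture_features(roi))
```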
In current clinical practice, the resectability of pancreatic ductal adenocarcinoma (PDA) is determined subjectively by a physician, which is an error-prone procedure. In this paper, we present a method for automated determination of the resectability of PDA from a routine abdominal CT, to reduce such decision errors. The tumor features are extracted from a group of patients with both hypo- and iso-attenuating tumors, of which 29 were resectable and 21 were not. The tumor contours are supplied by a medical expert. We present an approach that uses intensity, shape, and texture features to determine tumor resectability. The best classification results are obtained with a fine Gaussian SVM and the L0 feature-selection algorithm. Compared to expert predictions made on the same dataset, our method achieves better classification results. We obtain significantly better results on correctly predicting non-resectability (+17%) compared to an expert, which is essential for patient treatment (negative predictive value). Moreover, our predictions of resectability exceed expert predictions by approximately 3% (positive predictive value).
Volumetric Laser Endomicroscopy (VLE) is a promising technique for the detection of early neoplasia in Barrett’s Esophagus (BE). VLE generates hundreds of high-resolution, grayscale, cross-sectional images of the esophagus. However, at present, classifying these images is a time-consuming and cumbersome effort performed by an expert using a clinical prediction model. This paper explores the feasibility of using computer vision techniques to accurately predict the presence of dysplastic tissue in VLE BE images. Our contribution is threefold. First, a benchmarking is performed for widely applied machine learning techniques and feature extraction methods. Second, three new features based on the clinical detection model are proposed, having superior classification accuracy and speed compared to earlier work. Third, we evaluate automated parameter tuning by applying simple grid search and feature selection methods. The results are evaluated on a clinically validated dataset of 30 dysplastic and 30 non-dysplastic VLE images. Optimal classification accuracy is obtained by applying a support vector machine using our modified Haralick features and optimal image cropping, obtaining an area under the receiver operating characteristic curve of 0.95, compared to 0.81 for the clinical prediction model. Optimal execution time is achieved using a proposed mean and median feature, which is extracted at least a factor of 2.5 faster than alternative features with comparable performance.
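The automated parameter tuning mentioned above can be sketched with a simple grid search over SVM hyper-parameters in scikit-learn. The feature matrix, labels and the parameter grid below are placeholders, not the values used in the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.rand(60, 32)                 # placeholder features for 30 dysplastic + 30 non-dysplastic images
y = np.array([1] * 30 + [0] * 30)

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC(probability=True))])
grid = GridSearchCV(pipe,
                    param_grid={"svm__C": [0.1, 1, 10, 100],
                                "svm__kernel": ["linear", "rbf"],
                                "svm__gamma": ["scale", 0.01, 0.001]},
                    scoring="roc_auc", cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)   # best hyper-parameters and cross-validated AUC
```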
Esophageal cancer is one of the fastest rising forms of cancer in the Western world. Using High-Definition (HD) endoscopy, gastroenterology experts can identify esophageal cancer at an early stage. Recent research shows that early cancer can be found using a state-of-the-art computer-aided detection (CADe) system based on analyzing static HD endoscopic images. Our research aims at extending this system by applying Random Forest (RF) classification, which introduces a confidence measure for detected cancer regions. To visualize this data, we propose a novel automated annotation system, employing the unique characteristics of the previous confidence measure. This approach allows reliable modeling of multi-expert knowledge and provides essential data for real-time video processing, to enable future use of the system in a clinical setting. The performance of the CADe system is evaluated on a 39-patient dataset, containing 100 images annotated by 5 expert gastroenterologists. The proposed system reaches a precision of 75% and recall of 90%, thereby improving the state-of-the-art results by 11 and 6 percentage points, respectively.
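A minimal sketch of how a Random Forest yields the confidence measure described above: the fraction of trees voting for the cancer class, exposed through predict_proba. Feature extraction from the HD endoscopic images is assumed to have happened already; the arrays are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X_train = np.random.rand(200, 64)            # placeholder per-region feature vectors
y_train = np.random.randint(0, 2, 200)       # 1 = early cancer, 0 = normal

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

X_new = np.random.rand(5, 64)
confidence = rf.predict_proba(X_new)[:, 1]    # per-region cancer confidence in [0, 1]
detected = confidence > 0.5                   # threshold tunable for the precision/recall trade-off
print(confidence, detected)
```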
Over the past decade, the imaging tools for endoscopists have improved drastically. This has enabled physicians to visually inspect the intestinal tissue for early signs of malignant lesions. Besides this, recent studies show the feasibility of supportive image analysis for endoscopists, but the analysis problem is typically approached as a segmentation task in which binary ground truth is employed. In this study, we show that the detection of early cancerous tissue in the gastrointestinal tract cannot be approached as a binary segmentation problem, and that it is crucial and clinically relevant to involve multiple experts in annotating early lesions. By employing the so-called sweet spot as a metric for training purposes, a much better detection performance can be achieved. Furthermore, a multi-expert-based ground truth, i.e. a golden standard, enables an improved validation of the resulting delineations. For this purpose, besides the sweet spot, we also propose another novel metric, the Jaccard Golden Standard (JIGS), which can handle multiple ground-truth annotations. Our experiments involving these new metrics and based on the golden standard show that the performance of a detection algorithm for early neoplastic lesions in Barrett's esophagus can be increased significantly, demonstrating a 10 percentage point increase in the resulting F1 detection score.
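The following is a loose, hedged illustration of evaluating a predicted delineation against multiple expert annotations; it is not the paper's definition of the sweet spot or JIGS. It builds a simple consensus from N binary expert masks and computes a Jaccard (IoU) score against it, with all thresholds and masks as assumptions.

```python
import numpy as np

def consensus_mask(expert_masks, min_agreement=0.5):
    """expert_masks: list of binary 2-D arrays; keep pixels where at least the
    given fraction of experts agree (illustrative consensus, not JIGS)."""
    stack = np.stack(expert_masks).astype(float)
    return stack.mean(axis=0) >= min_agreement

def jaccard(pred, ref):
    pred, ref = pred.astype(bool), ref.astype(bool)
    union = np.logical_or(pred, ref).sum()
    return 1.0 if union == 0 else np.logical_and(pred, ref).sum() / union

experts = [np.random.rand(128, 128) > 0.6 for _ in range(4)]   # placeholder expert annotations
prediction = np.random.rand(128, 128) > 0.6                    # placeholder algorithm output
print(jaccard(prediction, consensus_mask(experts)))
```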
Recently, compressed-sensing-based algorithms have enabled volume reconstruction from projection images acquired over a relatively small angle (θ < 20°). These methods enable accurate depth estimation of surgical tools with respect to anatomical structures. However, they are computationally expensive and time-consuming, rendering them unattractive for image-guided interventions. We propose an alternative approach for depth estimation of biopsy needles during image-guided interventions, in which we split the problem into two parts and solve them independently: needle-depth estimation and volume reconstruction. The complete proposed system consists of these two steps, preceded by needle extraction. First, we detect the biopsy needle in the projection images and remove it by interpolation. Next, we exploit epipolar geometry to find point-to-point correspondences in the projection images to triangulate the 3D position of the needle in the volume. Finally, we use the interpolated projection images to reconstruct the local anatomical structures and indicate the position of the needle within this volume. For validation of the algorithm, we have recorded a full CT scan of a phantom with an inserted biopsy needle. The performance of our approach ranges from a median error of 2.94 mm for a distributed viewing angle of 1° down to an error of 0.30 mm for an angle larger than 10°. Based on the results of this initial phantom study, we conclude that multi-view geometry offers an attractive alternative to time-consuming iterative methods for the depth estimation of surgical tools during C-arm-based image-guided interventions.
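The triangulation step can be illustrated with OpenCV's epipolar-geometry utilities: given the projection matrices of two views and the needle-tip coordinates detected in both projection images, recover the 3-D tip position. The projection matrices, rotation angle and normalized image coordinates below are invented placeholders, not the C-arm geometry used in the study.

```python
import numpy as np
import cv2

# Two placeholder 3x4 projection matrices: view 2 is rotated 10 degrees and translated.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
R, _ = cv2.Rodrigues(np.array([[0.0], [np.deg2rad(10.0)], [0.0]]))
P2 = np.hstack([R, np.array([[-50.0], [0.0], [0.0]])])

# Needle-tip detections in normalized image coordinates (placeholders), one column per point.
tip_view1 = np.array([[0.12], [0.05]])
tip_view2 = np.array([[0.10], [0.04]])

X_h = cv2.triangulatePoints(P1, P2, tip_view1, tip_view2)   # 4x1 homogeneous 3-D point
tip_3d = (X_h[:3] / X_h[3]).ravel()                          # Euclidean 3-D coordinates
print(tip_3d)
```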
Ultrasound imaging is employed for needle guidance in various minimally invasive procedures, such as biopsy guidance, regional anesthesia and brachytherapy. Unfortunately, needle guidance using 2D ultrasound is very challenging, due to poor needle visibility and a limited field of view. Nowadays, 3D ultrasound systems are available and more widely used. Consequently, with an appropriate 3D image-based needle detection technique, needle guidance and interventions may be significantly improved and simplified. In this paper, we present a multi-resolution Gabor transformation for automated and reliable extraction of needle-like structures in a 3D ultrasound volume. We study and identify the best combination of Gabor wavelet frequencies. High precision in detecting the needle voxels leads to a robust and accurate localization of the needle for intervention support. Evaluation in several ex-vivo cases shows that the multi-resolution analysis significantly improves the precision of the needle-voxel detection from 0.23 to 0.32 at a high recall rate of 0.75 (a 40% gain), while better robustness and confidence were confirmed in the practical experiments.
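As a simplified, 2-D illustration of a multi-resolution Gabor feature bank, the sketch below filters an image slice at several frequencies and orientations with scikit-image. The paper operates on 3-D ultrasound volumes with its own frequency combinations, which are not reproduced here; all parameter values are assumptions.

```python
import numpy as np
from skimage.filters import gabor

def gabor_feature_stack(image, frequencies=(0.05, 0.1, 0.2, 0.4),
                        thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Per-pixel Gabor magnitude responses over a small frequency/orientation bank."""
    responses = []
    for f in frequencies:
        for theta in thetas:
            real, imag = gabor(image, frequency=f, theta=theta)
            responses.append(np.hypot(real, imag))    # magnitude response per pixel
    return np.stack(responses, axis=-1)               # (H, W, n_freq * n_theta)

slice_2d = np.random.rand(128, 128)                   # placeholder ultrasound slice
features = gabor_feature_stack(slice_2d)              # per-pixel feature vectors for a classifier
print(features.shape)
```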
KEYWORDS: Wavelets, Ultrasonography, 3D modeling, Detection and tracking algorithms, 3D image processing, Transducers, Breast, Visibility, 3D acquisition, Visualization
Ultrasound-guided needle interventions are widely practiced in medical diagnostics and therapy, e.g. for biopsy guidance, regional anesthesia or brachytherapy. Needle guidance using 2D ultrasound can be very challenging due to the poor needle visibility and the limited field of view. Since 3D ultrasound transducers are becoming more widely used, needle guidance can be improved and simplified with appropriate computer-aided analyses. In this paper, we compare two state-of-the-art 3D needle detection techniques: a technique based on line filtering from literature and a system employing a Gabor transformation. Both algorithms utilize supervised classification to pre-select candidate needle voxels in the volume and then fit a model of the needle to the selected voxels. The major differences between the two approaches lie in the extraction of the feature vectors for classification and the criterion selected for fitting. We evaluate the performance of the two techniques using manually annotated ground truth in several ex-vivo situations of different complexities, containing three different needle types with various insertion angles. This extensive evaluation provides a better understanding of the limitations and advantages of each technique under different acquisition conditions, guiding the development of improved techniques for more reliable and accurate localization. Benchmarking shows that the Gabor features are better capable of distinguishing the needle voxels in all datasets. Moreover, it is shown that the complete processing chain of the Gabor-based method outperforms the line filtering in accuracy and stability of the detection results.
The growing traffic density in cities fuels the desire for collision-assessment systems on public transportation. For this application, video analysis is broadly accepted as a cornerstone. For trams, the localization of tramway tracks is an essential ingredient of such a system, in order to estimate a safety margin for crossing traffic participants. Tramway-track detection is a challenging task due to the urban environment with clutter, sharp curves and occlusions of the track. In this paper, we present a novel and generic system to detect the tramway track ahead of the tram position. The system incorporates an inverse perspective mapping and a-priori geometry knowledge of the rails to find possible track segments. The main contribution of this paper is a new track-reconstruction algorithm based on graph theory. To this end, we define track segments as vertices in a graph, in which edges represent feasible connections. This graph is then converted to a max-cost arborescence graph, and the best path is selected according to its location and additional temporal information, based on a maximum a-posteriori estimate. The proposed system clearly outperforms a railway-track detector. Furthermore, the system performance is validated on 3,600 manually annotated frames. The obtained results are promising: straight tracks are found in more than 90% of the images and complete curves are still detected in 35% of the cases.
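A small sketch of the graph formulation described above, using NetworkX: track segments become vertices, feasible connections become weighted edges, and Edmonds' algorithm extracts a maximum-weight spanning arborescence from which the best path can then be selected. The node names and weights are invented for illustration and are not the paper's cost definition.

```python
import networkx as nx

G = nx.DiGraph()
# Edge weight ~ plausibility of connecting two track segments (placeholder values).
G.add_weighted_edges_from([
    ("root", "seg1", 0.9), ("root", "seg2", 0.3),
    ("seg1", "seg3", 0.8), ("seg2", "seg3", 0.4),
    ("seg1", "seg4", 0.2), ("seg2", "seg4", 0.6),
    ("seg3", "seg5", 0.7),
])

# Maximum-weight spanning arborescence (Edmonds' algorithm).
arbor = nx.maximum_spanning_arborescence(G, attr="weight")
print(sorted(arbor.edges(data="weight")))
```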
Prematurely born infants receive special care in the Neonatal Intensive Care Unit (NICU), where various physiological parameters, such as heart rate, oxygen saturation and temperature, are continuously monitored. However, there is no system for monitoring and interpreting their facial expressions, the most prominent discomfort indicator. In this paper, we present an experimental video monitoring system for automatic discomfort detection in infants’ faces, based on the analysis of their facial expressions. The proposed system uses an Active Appearance Model (AAM) to robustly track both the global motion of the newborn’s face and its inner features. The system detects discomfort by employing the AAM representations of the face on a frame-by-frame basis, using a Support Vector Machine (SVM) classifier. Three contributions increase the performance of the system. First, we extract several histogram-based texture descriptors to improve the AAM appearance representations. Second, we fuse the outputs of various individual SVM classifiers, which are trained on features with complementary qualities. Third, we improve the temporal behavior and stability of the discomfort detection by applying an averaging filter to the classification outputs. Additionally, for higher robustness, we explore the effect of applying different image pre-processing algorithms for correcting illumination conditions and for image enhancement, to evaluate possible detection improvements. The proposed system is evaluated on 15 videos of 8 infants, yielding a 0.98 AUC performance. As a bonus, the system offers monitoring of the infant’s expressions when it is left unattended and additionally provides an objective judgment of discomfort.
This paper proposes an original moving-ship detection approach for video surveillance systems, especially concentrating on occlusion problems among ships and vegetation by using context information. First, an over-segmentation is performed to divide the frame into segments, which are classified by an SVM (Support Vector Machine) as water or non-water, exploiting the context that ships move only in water. We assume ship motion to be characterized by motion saliency and consistency, such that each ship distinguishes itself. Therefore, based on the water context model, non-water segments are merged into regions with motion similarity. Then, moving ships are detected by measuring the motion saliency of those regions. Experiments on real-life surveillance videos prove the accuracy and robustness of the proposed approach. We especially pay attention to testing in cases of severe occlusions between ships and between ships and vegetation. The proposed algorithm outperforms, in terms of precision and recall, our earlier work and a proposal using SVM-based ship detection.
The use of contextual information can significantly aid scene understanding of surveillance video. Just detecting people and tracking them does not provide sufficient information to detect situations that require operator attention. We propose a proof-of-concept system that uses several sources of contextual information to improve scene understanding in surveillance video. The focus is on two scenarios that represent common video surveillance situations, parking lot surveillance and crowd monitoring. In the first scenario, a pan–tilt–zoom (PTZ) camera tracking system is developed for parking lot surveillance. Context is provided by the traffic sign recognition system to localize regular and handicapped parking spot signs as well as license plates. The PTZ algorithm has the ability to selectively detect and track persons based on scene context. In the second scenario, a group analysis algorithm is introduced to detect groups of people. Contextual information is provided by traffic sign recognition and region labeling algorithms and exploited for behavior understanding. In both scenarios, decision engines are used to interpret and classify the output of the subsystems and if necessary raise operator alerts. We show that using context information enables the automated analysis of complicated scenarios that were previously not possible using conventional moving object classification techniques.
In port surveillance, video-based monitoring is a valuable supplement to a radar system by helping to detect smaller ships in the shadow of a larger ship and with the possibility to detect nonmetal ships. Therefore, automatic video-based ship detection is an important research area for security control in port regions. An approach that automatically detects moving ships in port surveillance videos with robustness for occlusions is presented. In our approach, important elements from the visual, spatial, and temporal features of the scene are used to create a model of the contextual information and perform a motion saliency analysis. We model the context of the scene by first segmenting the video frame and contextually labeling the segments, such as water, vegetation, etc. Then, based on the assumption that each object has its own motion, labeled segments are merged into individual semantic regions even when occlusions occur. The context is finally modeled to help locate the candidate ships by exploring semantic relations between ships and context, spatial adjacency and size constraints of different regions. Additionally, we assume that the ship moves with a significant speed compared to its surroundings. As a result, ships are detected by checking motion saliency for candidate ships according to the predefined criteria. We compare this approach with the conventional technique for object classification based on a support vector machine. Experiments are carried out with real-life surveillance videos, where the obtained results outperform two recent algorithms and show the accuracy and robustness of the proposed ship detection approach. The inherent simplicity of our algorithmic subsystems enables real-time operation of our proposal in embedded video surveillance, such as port surveillance systems based on moving, nonstatic cameras.
This paper presents an automatic ship detection approach for video-based port surveillance systems. Our approach combines context and motion saliency analysis. The context is represented by the assumption that ships only travel inside a water region. We perform motion saliency analysis since we expect ships to move with higher speed compared to the water flow and static environment. A robust water detection is first employed to extract the water region as contextual information in the video frame, which is achieved by graph-based segmentation and region-based classification. After the water detection, the segments labeled as non-water are merged to form the regions containing candidate ships, based on the spatial adjacency. Finally, ships are detected by checking motion saliency for each candidate ship according to a set of criteria. Experiments are carried out with real-life surveillance videos, where the obtained results prove the accuracy and robustness of the proposed ship detection approach. The proposed algorithm outperforms a state-of-the-art algorithm when applied to the same sets of surveillance videos.
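The first stage described above (graph-based segmentation followed by a per-segment water classifier) can be sketched as follows. The mean-color features, the pre-trained classifier and the segmentation parameters are placeholders; the paper's actual features, training data and settings differ.

```python
import numpy as np
from skimage.segmentation import felzenszwalb
from sklearn.svm import SVC

def segment_features(frame, labels):
    """Mean RGB color per segment as a minimal region descriptor (placeholder feature)."""
    seg_ids = np.unique(labels)
    feats = np.array([frame[labels == seg_id].mean(axis=0) for seg_id in seg_ids])
    return seg_ids, feats

def detect_water(frame, water_classifier):
    labels = felzenszwalb(frame, scale=100, sigma=0.8, min_size=200)  # graph-based segmentation
    seg_ids, feats = segment_features(frame, labels)
    is_water = water_classifier.predict(feats).astype(bool)           # region-based classification
    water_mask = np.isin(labels, seg_ids[is_water])
    return water_mask, labels

# Placeholder classifier trained on mean-color features of labeled segments.
clf = SVC().fit(np.random.rand(100, 3), np.random.randint(0, 2, 100))
frame = np.random.rand(240, 320, 3)                                   # placeholder video frame
water_mask, labels = detect_water(frame, clf)
print(water_mask.mean())
```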
Esophageal cancer is the fastest rising type of cancer in the Western world. The recent development of High-Definition (HD) endoscopy has enabled the specialist physician to identify cancer at an early stage. Nevertheless, it still requires considerable effort and training to be able to recognize these irregularities associated with early cancer. As a first step towards a Computer-Aided Detection (CAD) system that supports the physician in finding these early stages of cancer, we propose an algorithm that is able to identify irregularities in the esophagus automatically, based on HD endoscopic images. The concept employs tile-based processing, so our system is not only able to identify that an endoscopic image contains early cancer, but it can also locate it. The identification is based on the following steps: (1) preprocessing, (2) feature extraction with dimensionality reduction, (3) classification. We evaluate the detection performance in RGB, HSI and YCbCr color space using the Color Histogram (CH) and Gabor features and we compare with other well-known features to describe texture. For classification, we employ a Support Vector Machine (SVM) and evaluate its performance using different parameters and kernel functions. In experiments, our system achieves a classification accuracy of 95.9% on 50×50 pixel tiles of tumorous and normal tissue and reaches an Area Under the Curve (AUC) of 0.990. In 22 clinical examples our algorithm was able to identify all (pre-)cancerous regions and annotate those regions reasonably well. The experimental and clinical validation are considered promising for a CAD system that supports the physician in finding early stage cancer.
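A minimal sketch of the tile-based pipeline outlined above: split an image into 50×50-pixel tiles, compute a color histogram per tile and classify each tile with an SVM. The color space, the Gabor features and the trained classifier from the paper are not reproduced; the histogram bins and classifier below are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def tile_color_histograms(image, tile=50, bins=8):
    """Yield (row, col, normalized histogram) for each tile of an HxWx3 uint8 image."""
    for r in range(0, image.shape[0] - tile + 1, tile):
        for c in range(0, image.shape[1] - tile + 1, tile):
            patch = image[r:r + tile, c:c + tile].reshape(-1, 3)
            hist = np.concatenate([
                np.histogram(patch[:, ch], bins=bins, range=(0, 256))[0]
                for ch in range(3)])
            yield r, c, hist / hist.sum()

# Placeholder classifier; in practice it is trained on annotated tumorous/normal tiles.
clf = SVC(probability=True).fit(np.random.rand(80, 24), np.random.randint(0, 2, 80))

frame = (np.random.rand(500, 500, 3) * 255).astype(np.uint8)   # placeholder endoscopic image
tumor_tiles = [(r, c) for r, c, h in tile_color_histograms(frame)
               if clf.predict([h])[0] == 1]                     # tile coordinates flagged as suspicious
print(len(tumor_tiles))
```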
Interactive free-viewpoint selection applied to a 3D multi-view video signal is an attractive feature of the rapidly developing 3DTV media. In recent years, significant research has been done on free-viewpoint rendering algorithms, which mostly have similar building blocks. In our previous work, we have analyzed the principal building blocks of most recent rendering algorithms and their contribution to the overall rendering quality. We have discovered that the first step, warping, determines the basic quality level of the complete rendering chain. In this paper, we analyze the warping step in more detail, since it leads to ways for improvement. We have observed that the accuracy of warping is mainly determined by two factors: sampling and rounding errors when performing pixel-based warping, and quantization errors of depth maps. For each error factor, we propose a technique that reduces the errors and thus increases the warping quality. Pixel-based warping errors are reduced by employing supersampling at the reference and virtual images, and depth-map errors are decreased by creating depth maps with more quantization levels. The new techniques are evaluated with two series of experiments using real-life and synthetic data. From these experiments, we have observed that reducing warping errors may increase the overall rendering quality and that the impact of errors due to pixel-based warping is much larger than that of errors due to depth quantization.
KEYWORDS: Video, 3D video streaming, Cameras, Internet, Video compression, Computer programming, 3D video compression, Video coding, Quantization, Receivers
Virtual views in 3D-TV and multi-view video systems are reconstructed images of the scene, generated synthetically from the original views. In this paper, we analyze the performance of streaming virtual views over IP-networks with a limited and time-varying available bandwidth. We show that the average video quality perceived by the user can be improved with an adaptive streaming strategy aiming at maximizing the average video quality. Our adaptive 3D multi-view streaming can provide a quality improvement of 2 dB on average over non-adaptive streaming. We demonstrate that an optimized virtual-view adaptation algorithm needs to be view-dependent, achieving an improvement of up to 0.7 dB. We analyze our adaptation strategies under dynamically varying available bandwidth in the network.
KEYWORDS: Cameras, Video, Volume rendering, Digital filtering, Video compression, Optical filters, Image quality, Algorithm development, 3D image processing, 3D modeling
Interactive free-viewpoint selection applied to a 3D multi-view signal is a possible attractive feature of the rapidly developing 3D TV media. This paper explores a new rendering algorithm that computes a free-viewpoint based on depth image warping between two reference views from existing cameras. We have developed three quality-enhancing techniques that specifically aim at solving the major artifacts. First, resampling artifacts are filled in by a combination of median filtering and inverse warping. Second, contour artifacts are processed while omitting warping of edges at high discontinuities. Third, we employ a depth signal for more accurate disocclusion inpainting. We obtain an average PSNR gain of 3 dB and 4.5 dB for the 'Breakdancers' and 'Ballet' sequences, respectively, compared to recently published results. While experimenting with synthetic data, we observe that the rendering quality is highly dependent on the complexity of the scene. Moreover, experiments are performed using compressed video from surrounding cameras. The overall system quality is dominated by the rendering quality and not by coding.
We describe our work on text-image alignment in the context of building a historical document retrieval system. We aim at aligning images of words in handwritten lines with their text transcriptions. The images of handwritten lines are automatically segmented from the scanned pages of historical documents and then manually transcribed. To train automatic routines to detect words in an image of handwritten text, we need a training set: images of words with their transcriptions. We present our results on aligning words from the images of handwritten lines with their corresponding text transcriptions. Alignment based on the longest spaces between portions of handwriting serves as a baseline. We then show that relative lengths, i.e. the proportions of words in their lines, can be used to improve the alignment results considerably. To take the relative word length into account, we define expressions for the cost function that has to be minimized for aligning text words with their images. We apply right-to-left alignment as well as alignment based on exhaustive search. The quality assessment of these alignments shows correct results for 69% of words from 100 lines, or 90% of partially correct and correct alignments combined.
In our effort to contribute to the closing of the "semantic gap" between images and their semantic description, we are building a large-scale ontology of images of objects. This visual catalog will contain a large number of images of objects, structured in a hierarchical catalog, allowing image processing researchers to derive signatures for wide classes of objects. We are building this ontology using images found on the web. We describe in this article our initial approach for finding coherent sets of object images. We first perform two semantic filtering steps: the first involves deciding which words correspond to objects and using these words to access databases which index text found associated with an image (e.g. Google Image search) to find a set of candidate images; the second semantic filtering step involves using face recognition technology to remove images of people from the candidate set (we have found that often requests for objects return images of people). After these two steps, we have a cleaner set of candidate images for each object. We then index and cluster the remaining images using our system VIKA (VIsual KAtaloguer) to find coherent sets of objects.