The paper reports further progress in our research on multisensor image fusion based on generative adversarial models [1-2] and its application to two practical tasks: visual representation of fused images acquired in different spectral ranges (e.g. TV and IR), and change detection in images acquired under different conditions (e.g. season-varying images). We present a neural network architecture based on the pix2pix model that can solve both tasks. A technique for generating training and test datasets, including data augmentation, is described. The results are demonstrated on real-world images.
In this paper we propose a new algorithm for image filtering using a morphological thickness map. Compared to other smoothing methods, such as anisotropic diffusion, comparative filters, and guided and rolling guidance filters, the benefit of our method is that it works natively with the image structure (the thickness map), so it does not depend on the level of image noise, lighting conditions, or similar effects. We present the idea of the method, the algorithm itself, and various experimental results. The filtering results can be widely applied in image processing tasks such as image segmentation, motion analysis, invariant feature transformation, and data compression.
The unseen object detection problem is known as a semantic matching problem. A semantic matcher takes two images as input: the request image and the test image. The request image represents the object class to be found in the test image. In this paper, we propose a new region-proposal-based semantic matcher that uses the same ideas as R-CNN. Our Body CNN generates proposals as in classical Faster R-CNN, and the Head CNN compares the proposals with a request descriptor, which the Request descriptor CNN extracts from the request image. All three CNNs (Head, Body and Request descriptor) are trained together end-to-end for detecting seen-class objects by request and are then applied to both seen and unseen classes. We trained and tested our CNN on the Pascal VOC dataset.
More than 80% of video surveillance systems are used for monitoring people. Older human detection algorithms, based on background and foreground modelling, could not even deal with a group of people, to say nothing of a crowd. Recent robust and highly effective pedestrian detection algorithms are a new milestone for video surveillance systems. Based on modern deep learning approaches, these algorithms produce very discriminative features that can be used for robust inference in real visual scenes. They can distinguish different persons in a group, cope with significant occlusion of human bodies by foreground objects, and detect people in various poses. In our work we use a new approach that combines the detection and classification tasks into a single problem using convolutional neural networks. As a starting point we choose the YOLO CNN, whose authors propose a very efficient way of combining these tasks by training a single neural network. This approach showed results competitive with state-of-the-art models such as Fast R-CNN while significantly outperforming them in speed, which allows us to apply it in real-time video surveillance and other video monitoring systems. Despite all its advantages, it suffers from some known drawbacks related to the fully-connected layers, which prevent the CNN from being applied to images of different resolutions. They also limit the ability to distinguish small, closely spaced human figures in groups, which is crucial for our tasks since we work with rather low-quality images that often include dense groups of small figures. In this work we gradually change the network architecture to overcome these problems, train it on a complex pedestrian dataset, and finally obtain a CNN that detects small pedestrians in real scenes.
The paper proposes a semantic segmentation algorithm based on Convolutional Neural Networks (CNN) for the problem of presenting multispectral sensor-derived images in Enhanced Vision Systems (EVS). A CNN architecture based on residual SqueezeNet with deconvolutional layers is presented. To create an in-domain training dataset for the CNN, a semi-automatic scenario using a photogrammetric technique is described. Experimental results are shown for problem-oriented images obtained by the TV and IR sensors of the EVS prototype in a set of flight experiments.
Existing image fusion methods based on morphological image analysis, which expresses the geometrical idea of image shape as a label image, are quite sensitive to the quality of image segmentation and are therefore not sufficiently robust to noise and high-frequency distortions. On the other hand, a number of methods in the field of dimensionality reduction and data comparison make it possible to avoid the image segmentation step by using diffusion map techniques. The paper proposes a new approach to multispectral image fusion based on the combination of morphological image analysis and diffusion maps theory (i.e. Diffusion Morphology). A new image fusion algorithm is described that uses a matched diffusion filtering procedure instead of morphological projection. The algorithm is implemented for a three-channel Enhanced Vision System prototype. Comparative image fusion results are shown on real images acquired in flight experiments.
An improved stereo-based approach to dynamic road scene understanding in a Driver Assistance System (DAS) is presented. System calibration is addressed. Algorithms for road lane detection, road 3D model generation, obstacle predetection and object (vehicle) detection are described. Lane detection is based on evidence analysis. The obstacle predetection procedure compares radial orthophotos obtained from the left and right stereo images. The object detection algorithm is based on recognizing the rear of cars using histograms of oriented gradients. The Car Stereo Sequences (CSS) Dataset was captured by a vehicle-based laboratory and published for testing DAS algorithms.
In this paper we propose a method for classifying moving objects of the “human” and “car” types in computer vision systems using statistical hypotheses and integrating the results with two different decision rules. FAR-FRR graphs for all criteria and the decision rule are plotted. A confusion matrix for both ways of integration is presented. An example of applying the method to public video databases is provided. Ways of improving accuracy are proposed.
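The FAR-FRR graphs mentioned above can be illustrated with a minimal sketch. It assumes each criterion yields a scalar score and that an object is accepted when the score reaches a threshold; the function names and the sample scores are hypothetical and are not taken from the paper.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for the rule 'accept if score >= threshold'.

    FAR: fraction of impostor samples that are wrongly accepted.
    FRR: fraction of genuine samples that are wrongly rejected.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


def far_frr_curve(genuine_scores, impostor_scores, thresholds):
    """Sample the FAR and FRR curves at the given thresholds."""
    return [(t,) + far_frr(genuine_scores, impostor_scores, t)
            for t in thresholds]


# Hypothetical scores of one criterion for "human" (genuine) and "car" (impostor).
human_scores = [0.9, 0.8, 0.4]
car_scores = [0.1, 0.5, 0.3]
for t, far, frr in far_frr_curve(human_scores, car_scores, [0.2, 0.45, 0.7]):
    print(f"threshold={t:.2f}  FAR={far:.2f}  FRR={frr:.2f}")
```

Plotting FAR and FRR against the threshold gives the two curves; their crossing point is often used to compare criteria.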
Most existing face recognition systems are based on two-dimensional images. The quality of recognition is rather high for frontal face images, but it decreases significantly for other kinds of images. For such systems to operate correctly, the effect of a change in the person's pose (the camera angle) must be compensated. There are methods that transform a 2D image of a person to the canonical orientation. The efficiency of these methods depends on how accurately specific anthropometric points are located, and problems can arise when the person's face is partly occluded. Another approach is to keep a set of images of the person at different view angles for further processing, but the need to store and process a large number of two-dimensional images makes this method considerably time-consuming. The proposed technique uses a stereo system to quickly generate a 3D model of the person's face and to obtain a face image in a given orientation from this 3D model. Real-time performance is provided by implementing graph cut methods for 3D reconstruction of the face surface and applying the CUDA software library for parallel computation.
The best-known morphological filters are morphological opening and closing, produced by the superposition of morphological dilation and erosion, which are implemented as Minkowski addition and subtraction with structuring elements. In practice, modern image processing uses no other morphological filters. However, it is possible to design other morphological filters that satisfy Serra's definition of a morphological filter and have useful and meaningful properties. In previous work a new 'selective morphology' (SM) was proposed based on the 'monotonization' technique. SM allows the design of special morphological operators that differ from the operators of Serra's classical MM. This paper describes further results of this approach. It is proved that the classical and selective morphological filters are, respectively, the lower and upper bounds for any other morphological filters of a certain kind. Necessary and sufficient conditions are determined that guarantee that operators designed by the 'monotonization' scheme are morphological filters in Serra's sense. A new constructive scheme is proposed that makes it possible to design a wide range of alternative morphological filters between the classical and selective morphological operators. Some examples of morphological filter design based on selective morphology are given. In particular, a morphological filter based on the Hough Transform is described.
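For readers unfamiliar with the classical operators referred to above, the following minimal sketch builds opening and closing from erosion and dilation on a 1D signal with a flat structuring element. The window radius and the clipped border handling are assumptions of this sketch; it shows the classical MM operators only, not the selective morphology of the paper.

```python
def erode(signal, radius):
    """Flat erosion: minimum over a window of half-width `radius`."""
    n = len(signal)
    return [min(signal[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def dilate(signal, radius):
    """Flat dilation: maximum over a window of half-width `radius`."""
    n = len(signal)
    return [max(signal[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def opening(signal, radius):
    """Erosion followed by dilation: removes narrow bright details."""
    return dilate(erode(signal, radius), radius)


def closing(signal, radius):
    """Dilation followed by erosion: fills narrow dark gaps."""
    return erode(dilate(signal, radius), radius)


signal = [0, 0, 5, 0, 0, 3, 3, 3, 0, 0]
print(opening(signal, 1))  # the one-sample spike is removed, the plateau is kept
```

Applying `opening` twice gives the same result as applying it once (idempotence), which is one of the properties required of a morphological filter in Serra's sense.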
Progress in imaging sensors and computers creates the background for numerous 3D imaging applications in a wide variety of manufacturing activities. The wood industry has many demands for automated precise measurements. One of them is the accurate determination of the volume of cut trees carried on a truck. The key point in volume estimation is determining the front area of the package of cut trees. To eliminate the slow and inaccurate manual measurements currently in practice, an experimental system for automated non-contact wood measurement has been developed. The system includes two non-metric CCD video cameras, a PC as the central processing unit, frame grabbers, and original software for image processing and 3D measurement. The proposed measurement method is based on capturing a stereo pair of the front of the tree package and orthotransforming the image into the front plane. This technique allows the transformed image to be processed to recognize circular shapes and calculate their areas. The metric characteristics of the system are provided by a special camera calibration procedure. The paper presents the developed method of 3D measurement, describes the hardware used for image acquisition and the software that implements the developed algorithms, and gives the productivity and precision characteristics of the system.
This paper describes a new selective morphology (SM) that provides morphological operators with selecting-only and filtering-only properties (S-operators and F-operators). In this framework, the S-opening and S-closing operators perform the extreme monotonous reconstruction of the source image, starting from the results of erosion and dilation respectively. F-opening and F-closing are formed as an algebraic combination of S-opening and S-closing and the usual morphological opening and closing. It is proved that SM filters have most of the mathematical properties of MM filters. S-operators have the additional property of preserving the connectivity and shape (edges) of the restored image areas. Some examples of S-morphological object extraction and F-morphological feature extraction are outlined. A significant improvement in extraction quality is demonstrated in comparison with the usual MM operators. An example of a special SM based on contour filtering is described.
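The idea of reconstructing the source image starting from its erosion can be illustrated with the classical opening by reconstruction. This is a well-known related operator used here as a stand-in: the paper's 'extreme monotonous reconstruction' is not specified in the abstract and may differ. The 3x3 window, the 8-connectivity and the sample image are assumptions of this sketch.

```python
def _window(img, r, c):
    """Values in the 3x3 neighbourhood of (r, c), clipped at the border."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, r - 1), min(rows, r + 2))
            for j in range(max(0, c - 1), min(cols, c + 2))]


def erode(img):
    return [[min(_window(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]


def dilate(img):
    return [[max(_window(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]


def opening_by_reconstruction(img):
    """Grow the eroded image back under the source until nothing changes."""
    marker = erode(img)
    while True:
        grown = [[min(m, s) for m, s in zip(m_row, s_row)]
                 for m_row, s_row in zip(dilate(marker), img)]
        if grown == marker:
            return marker
        marker = grown


# A 3x3 block with a one-pixel-wide tail, plus an isolated noise pixel.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
for row in opening_by_reconstruction(image):
    print(row)
```

In this example the plain opening `dilate(erode(image))` cuts off the thin tail together with the noise pixel, while the reconstruction restores the tail because it is connected to the surviving block. This is the kind of connectivity and shape preservation that the abstract attributes to S-operators.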
Modern passport and visa documents include special machine-readable zones that comply with the ICAO standards. This makes it possible to develop special automatic passport and visa readers. However, such OCR systems face some special problems: low resolution of character images captured by a CCD camera (down to 150 dpi), substantial shifts and slopes (up to 10 degrees), rich paper texture under the characters, and non-homogeneous illumination. This paper presents the structure and some special aspects of an OCR system for a portable passport and visa reader. In our approach the binarization procedure is performed after the segmentation step and is applied to each character site separately. The character recognition procedure uses the structural information of the machine-readable zone. Special algorithms are developed for machine-readable zone extraction and character segmentation.
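The benefit of binarizing each character site separately under non-homogeneous illumination can be sketched as follows. The abstract does not name the thresholding rule; Otsu's method is used here purely as an assumption, and the flattened 8-bit cells are hypothetical sample data.

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's threshold: maximize the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]            # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0          # pixels with value > t
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (total_sum - sum0) / w1
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize_cells(cells):
    """Binarize each segmented character cell with its own threshold.

    Dark pixels (ink) become 1, background becomes 0.
    """
    result = []
    for cell in cells:
        t = otsu_threshold(cell)
        result.append([1 if p <= t else 0 for p in cell])
    return result


# Two hypothetical cells: one brightly lit, one in shadow.
bright_cell = [200, 210, 60, 205]
dark_cell = [90, 100, 20, 95]
print(binarize_cells([bright_cell, dark_cell]))
```

A single global threshold of, say, 100 would mark the whole shadowed cell as ink, whereas the per-cell thresholds isolate the ink pixel in both cells.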
This paper describes a new image segmentation technique. The suggested algorithm is based on an original histogram-based multi-threshold presegmentation procedure. The advantages of this procedure are its high computational speed and good separability in the detection of histogram modes. The subsequent segmentation makes it possible to obtain a reasonable, 'good-looking' set of non-overlapping homogeneous image regions that can be detected and measured later. Experiments on a number of real airborne images demonstrate the efficiency of the proposed approach.
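A generic form of histogram-based multi-threshold presegmentation can be sketched as follows: detect the histogram modes and place a threshold in the valley between each pair of adjacent modes. This is an assumed, textbook-style scheme and not the paper's original procedure; the smoothing radius, the function names and the sample histogram are hypothetical.

```python
from bisect import bisect_right


def smooth(hist, radius=1):
    """Moving-average smoothing with windows clipped at the borders."""
    out = []
    for i in range(len(hist)):
        window = hist[max(0, i - radius):i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def find_modes(h):
    """Indices of local maxima of a non-negative histogram (plateau-aware)."""
    padded = [-1.0] + list(h) + [-1.0]
    modes, i = [], 1
    while i < len(padded) - 1:
        if padded[i] > padded[i - 1]:
            j = i
            while padded[j + 1] == padded[j]:   # walk across a flat top
                j += 1
            if padded[j + 1] < padded[j]:
                modes.append((i + j) // 2 - 1)  # centre of the plateau
            i = j + 1
        else:
            i += 1
    return modes


def valley_thresholds(hist, radius=1):
    """One threshold at the deepest valley between adjacent modes."""
    h = smooth(hist, radius)
    modes = find_modes(h)
    return [min(range(a, b + 1), key=lambda k: h[k])
            for a, b in zip(modes, modes[1:])]


def presegment(pixels, thresholds):
    """Label each pixel by the histogram interval it falls into."""
    return [bisect_right(thresholds, p) for p in pixels]


histogram = [0, 4, 9, 4, 1, 5, 10, 3]   # hypothetical 8-level histogram
thresholds = valley_thresholds(histogram)
print(thresholds, presegment([1, 2, 3, 5, 6, 7], thresholds))
```

The labels produced this way only group pixels by intensity; a subsequent step, such as connected-component labelling, would be needed to obtain spatially homogeneous regions.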
A generic technique called 'Evidences-based Image Analysis' is proposed for model-based object detection. The real images to be analyzed are considered as sources of evidence generated by low-level image processing procedures. This evidence supports or refutes hypotheses connected with different objects and their features. Bayes' theorem is used for testing hypotheses against the evidence. The unknown parameters of the probabilistic model serve as the internal tuning parameters of the algorithm. This approach provides a uniform and efficient way of fusing any available image information: intensity and contour, 2D and 3D, multispectral, multisensor and so on. Our technique takes into account three principal points: the object/background model, the registration model and the corruption model. This paper concentrates mainly on the estimation of the registration parameters, especially on the problem of geometrically invariant object detection. It is shown that Hough-like accumulation methods in fact implement maximum a posteriori estimation of the parameters of the registration model under the assumption that the evidence is statistically independent. The reduction and separation of models are proved to be valid ways of accelerating invariant object detection. The use of complex hierarchical models of objects is considered as another way to achieve fast invariant detection and recognition.
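The link between Hough-like accumulation and parameter estimation can be illustrated with a minimal voting sketch for a pure translation. It assumes that the registration model is a 2D shift, that every evidence casts equal-weight votes (which corresponds to identical likelihood terms and a uniform prior), and that the point sets are hypothetical; the paper's probabilistic model is more general.

```python
from collections import Counter


def hough_translation(model_points, observed_points):
    """Estimate a 2D shift by accumulating votes from all point pairs.

    Each observed point (an evidence) votes for every shift that would map
    some model point onto it. With independent evidence, adding votes plays
    the role of adding per-evidence log-likelihood terms, so the accumulator
    maximum acts as the most probable shift under these assumptions.
    """
    votes = Counter()
    for ox, oy in observed_points:
        for mx, my in model_points:
            votes[(ox - mx, oy - my)] += 1
    return votes.most_common(1)[0]


model = [(0, 0), (2, 0), (0, 3)]
# The model shifted by (5, 7), plus one clutter point.
observed = [(5, 7), (7, 7), (5, 10), (1, 1)]
shift, support = hough_translation(model, observed)
print(shift, support)
```

The clutter point spreads its votes over shifts that no other evidence supports, so it does not move the accumulator maximum.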
KEYWORDS: Image processing, Photography, Process modeling, Digital image processing, Software development, Systems modeling, Digital photography, Visualization, Visual process modeling, Data modeling
A PC-based digital stereophotogrammetric system is being developed. The system uses a standard IBM PC-AT/386-/486 as its processing unit. It is capable of building stereo models and reconstructing terrain from aerial and space photographs. A distinctive feature of the system is its ability to process large digital images that exceed the disk memory capacity. The initial images must first be split into fragments on a workstation. The fragments are accompanied by some extra information. The survey image is created using pyramid image processing.
A complete object-oriented approach is generalized for the development of photogrammetry-oriented applications. The advantages of object-oriented data representation and management are discussed. It is shown that object-oriented programming is very attractive for data simulation, data unification, and fractal data representation in digital photogrammetry software design. An original frame paradigm for object-oriented data management is described. A structured processing model is introduced to define the application area of this concept. It is proved that any full photogrammetric processing cycle satisfies this model.
Multisensor remote sensing requires the simultaneous registration and real-time processing of time-varying multi-sensor image data. A frame-based programming technique was developed to provide appropriate multi-sensor data management. Any frame-based processing system supports automatic data updating whenever the output of any sensor changes. Visual programming of data flows is naturally available with this approach. An appropriate set of frame types is formed to design the most generic multisensor framework. Finally, the problem of a real-time, multi-processor implementation of the frame-based software architecture is addressed.
KEYWORDS: Image processing, Visualization, Human-machine interfaces, Computer programming, Computer architecture, Software development, Osmium, Data processing, Process control, Control systems
The need for special and system programming strictly limits what image processing specialists can do when developing applications unless they are also professional programmers. In many respects the efficiency of image processing is determined by the convenience and richness of the user interface and by its orientation towards a particular applied problem. A rather significant problem is the design of a flexible system that, after preliminary tuning by means of visual programming, can execute a specific image processing scheme in pipeline mode without human supervision. An important property of such systems must be the ability to update the processing automatically when the initial data change. An object-oriented frame approach to image processing is developed in this paper. The essence of the approach is that the image processing scheme is represented as a semantic network of software frames, each of which is an independent object on which a separate transformation can be executed. All necessary types of software frames are considered, and the interaction of objects in such a network, which is provided by passing messages between frames in accordance with some logic rules, is discussed. The proposed way of designing image processing systems lets the user exploit all the advantages of object-oriented programming and a window-based user interface for image processing. To illustrate the capabilities of this approach, the authors developed the Visual PISoft image processing system for Windows on a PC.
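The frame network described above, in which processing is updated automatically when the initial data change, can be sketched as a small message-passing graph. The `Frame` class, its methods and the sample transformations are hypothetical illustrations and are not taken from Visual PISoft.

```python
class Frame:
    """A node of the processing network holding a value and a transformation."""

    def __init__(self, name, transform=None, inputs=()):
        self.name = name
        self.transform = transform      # None for a source frame
        self.inputs = list(inputs)
        self.listeners = []             # frames that depend on this one
        self.value = None
        for frame in self.inputs:
            frame.listeners.append(self)

    def set(self, value):
        """Load new data into a source frame and notify dependants."""
        self.value = value
        self._notify()

    def on_update(self):
        """Message handler: recompute once all inputs are available."""
        if any(frame.value is None for frame in self.inputs):
            return
        self.value = self.transform(*[frame.value for frame in self.inputs])
        self._notify()

    def _notify(self):
        for listener in self.listeners:
            listener.on_update()


# A two-step scheme on a flattened 8-bit image: invert, then threshold.
source = Frame("source")
inverted = Frame("inverted", lambda img: [255 - p for p in img], [source])
binary = Frame("binary", lambda img: [1 if p > 128 else 0 for p in img],
               [inverted])

source.set([0, 100, 200])
print(binary.value)
source.set([250, 10])       # new data propagate without further calls
print(binary.value)
```

Because each frame only reacts to messages from its inputs, the same scheme runs unattended whenever the source frame receives new data. A frame with several inputs is recomputed once per incoming message in this simple version.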
Correlation techniques are widely used to match corresponding areas in stereo image pairs. They provide the pixel correspondence that is required for generating 3D data. However, correlation-based approaches can give false results and inaccurate matches because of noise and geometric and radiometric distortions in the stereo images. This paper is devoted to the Pytiyev morphological approach, which is not well known outside Russia. The main idea of this approach is based on terms of set topology theory and on projection onto a subspace defined by the conditions of image variability, including radiometric distortions. The shape of an image acquires a quantitative mathematical description based on topological ideas. A specific correlation measure is constructed. This method can be used effectively in image matching and comparison tasks.
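The projection idea can be sketched in its commonly cited form: the shape of a reference image is the partition into regions of equal intensity, the projection of another image onto that shape replaces its values by the region means, and the correlation measure is the ratio of the norms of the projection and the image. The function names and the flattened sample images are assumptions of this sketch, and the measure used in the paper may be normalized differently.

```python
from collections import defaultdict
from math import sqrt


def project_onto_shape(reference, image):
    """Average `image` over each region where `reference` is constant."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, value in zip(reference, image):
        sums[label] += value
        counts[label] += 1
    return [sums[label] / counts[label] for label in reference]


def morphological_correlation(reference, image):
    """Ratio ||projection|| / ||image||, between 0 and 1."""
    projection = project_onto_shape(reference, image)
    norm = sqrt(sum(v * v for v in image))
    if norm == 0:
        return 0.0
    return sqrt(sum(v * v for v in projection)) / norm


reference = [10, 10, 200, 200]      # two regions of constant intensity
same_shape = [3, 3, 50, 50]         # different radiometry, same shape
other_shape = [1, 3, 5, 5]          # varies inside the first region
print(morphological_correlation(reference, same_shape))
print(morphological_correlation(reference, other_shape))
```

The first value equals 1 because any intensity remapping that keeps the regions intact lies in the shape of the reference, which is what makes the measure insensitive to such radiometric distortions.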
A digital stereophotogrammetric system is being developed to perform photogrammetric tasks with minimum cost and hardware operating complexity. A standard IBM PC-AT/486 is used as the system processing unit. Two stereo vision modes are available: anaglyphic and mirror. Using aerial and space survey photographs of different projections, the problems of interior, relative and absolute orientation are solved in a rigorous and efficient way. At a future stage of system development, digital information from satellites will be processed. Stereo measurement operations can be executed both manually and automatically. The system enables the processing of digital images that exceed the disk memory capacity, so images with a pixel size of 5-10 µm can be processed. The digital image areas being worked on are stored on the disk beforehand. The system software provides fast and easy access to any area of any image stored on the disk.