Guido Tascini, Primo Zingaretti
Proceedings Volume Visual Communications and Image Processing '94, (1994) https://doi.org/10.1117/12.185964
Image sequence recognition is an interesting problem that arises in various situations in the Computer Vision field, and in particular in mobile robot vision. Typical for this purpose is the estimation of motion from a series of frames. Many motion estimation techniques are described in the literature [10, 13, 22]. The approaches are normally divided into two categories: pixel based methods and feature based methods. In both, motion is estimated in two steps: 1) 2D motion analysis (feature based) or estimation (pixel based); 2) 3D motion estimation. The pixel based, or flow based, method uses local changes in light intensity to compute the optical flow at each image point and then derives the 3D motion parameters. The feature based method, which is the one we adopt, first extracts features (such as corners, points of curvature, lines, etc.). Features used in the literature include sharp changes in curvature [15], global properties of moving objects [18], lines and curves [16], and centroids [6]. It then establishes the correspondences of these features between two successive frames (correspondence problem), and finally computes the motion parameters and object structure from the correspondences (structure from motion problem).

Motion correspondence is the most difficult problem: occlusion masks the features and noise creates further difficulties. Given n frames taken at different time instants and m points in each frame, motion correspondence maps a point in one frame to a point in the next frame such that no two points map onto the same point. The combinatorial explosiveness of the problem has to be constrained; Rangarajan and Shah [19] propose the proximal uniformity constraint: given the location of a point in a frame, its location in the next frame lies in the proximity of its previous location. Even though the problem has not yet been solved in general, many solutions have been proposed for 3D motion estimation under the assumption that the correspondences have been established [12, 13, 24]. Regularization theory has also been proposed for the numerical improvement of the solution of both feature based and pixel based problems.

From the human standpoint, a vision system may be viewed as performing the following tasks in sequence: detection, tracking and recognition. Detection arises at the cortex level; it is followed by the tracking of objects while simultaneously attempting to recognize them. From the machine standpoint, the movement detection phase may be viewed as a useful means of focusing the system's attention, thus reducing the search space of the recognition algorithms. Particular attention has to be devoted to detecting moving objects in the presence of a moving background from a monocular image sequence; several researchers have faced this problem [14, 21]. When the images are taken from a moving vehicle (for instance one undergoing translational motion), it is necessary to distinguish between real and apparent movement. The stationary objects of the scene appear to move along paths radiating from the point toward which we are moving (the focus of expansion). By applying a transformation to the image, called the Complex Logarithmic Mapping (see Frazier and Nevatia [8]), it is possible to convert the problem from one of detecting motion along both the X and Y axes to one of detecting motion along an angular axis. After performing a horizontal edge detection, if we observe motion of edges in the vertical direction we can conclude that there is a moving object in the scene. Our approach is feature based, and a series of considerations are necessary to understand the solution adopted.
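To make the role of the Complex Logarithmic Mapping concrete, the following is a minimal sketch, not the implementation used in this work, of resampling a frame into (log r, theta) coordinates around an assumed focus of expansion; the function name, the grid resolutions, and the FOE value in the example are illustrative assumptions. After such a remapping, the apparent radial motion of stationary points appears only along the log-r axis, so residual motion along the theta axis can be attributed to independently moving objects.

```python
import numpy as np


def complex_log_mapping(image, foe, n_rho=128, n_theta=256, r_min=1.0):
    """Resample a grayscale image into (log r, theta) coordinates about the
    focus of expansion (FOE).  Under pure camera translation, stationary
    points move only along the log-r axis of the mapped image; motion along
    the theta axis signals an independently moving object."""
    h, w = image.shape
    fx, fy = foe                                    # FOE in pixel coordinates
    r_max = np.hypot(max(fx, w - fx), max(fy, h - fy))

    # Target grid: rows = angle, columns = log-radius.
    log_r = np.linspace(np.log(r_min), np.log(r_max), n_rho)
    theta = np.linspace(-np.pi, np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(np.exp(log_r), theta)      # shape (n_theta, n_rho)

    # Back-project each (r, theta) cell to the nearest Cartesian pixel.
    xs = np.clip(np.round(fx + rr * np.cos(tt)).astype(int), 0, w - 1)
    ys = np.clip(np.round(fy + rr * np.sin(tt)).astype(int), 0, h - 1)
    return image[ys, xs]


if __name__ == "__main__":
    frame = np.random.rand(240, 320)                # stand-in for a real frame
    clm = complex_log_mapping(frame, foe=(160.0, 120.0))
    print(clm.shape)                                # (256, 128)
```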
We regard edges, corners, or whole regions as features. The choice of a feature depends on the ease of retrieving it in successive frames, so as to form a correspondence chain. Corner detection may be based on revealing sharp changes in the direction of the intensity gradient; Rangarajan et al. [20] describe the construction of a set of operators to detect corners. Since corners are the most commonly used features, particular attention has been devoted to correspondences among points. Two approaches may be adopted for these: 1) with matching, in which two point patterns from two consecutive images are matched (elastic matching [25]); 2) without matching, using the criteria of proximity and regularity of point trajectories. Our approach uses two types of matching: geometric and relational. Geometric matching uses parametrized geometric models and may be viewed as a parametrized optimization problem. Relational matching uses relational representations and may be viewed as the problem of detecting an isomorphism between graphs.
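As a generic illustration of corner detection based on sharp changes in the gradient direction, here is a minimal Harris-style cornerness sketch. It is a common stand-in, not the operators of Rangarajan et al. [20] nor the detector used in this work; the box smoothing window, the constant k, and the toy test image are assumptions.

```python
import numpy as np


def smooth3(a):
    """Crude 3x3 box smoothing with edge padding (stand-in for a Gaussian)."""
    p = np.pad(a, 1, mode="edge")
    return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
               for i in range(3) for j in range(3)) / 9.0


def cornerness(image, k=0.04):
    """Harris-style corner response: high where the intensity gradient
    changes direction sharply, i.e. varies along two independent directions."""
    gy, gx = np.gradient(image.astype(float))
    ixx, iyy, ixy = smooth3(gx * gx), smooth3(gy * gy), smooth3(gx * gy)
    det = ixx * iyy - ixy * ixy
    trace = ixx + iyy
    return det - k * trace * trace


if __name__ == "__main__":
    img = np.zeros((64, 64))
    img[20:40, 20:40] = 1.0                          # bright square: four corners
    r = cornerness(img)
    ys, xs = np.unravel_index(np.argsort(r.ravel())[-4:], r.shape)
    print(sorted(zip(ys.tolist(), xs.tolist())))     # near the square's corners
```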
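Similarly, the following toy sketch illustrates the matching-free criteria of proximity and regularity of point trajectories, in the spirit of the proximal uniformity constraint [19]. It performs an exhaustive search over assignments, viable only for a handful of points; the weight w, the function name, and the example coordinates are illustrative assumptions, not the procedure adopted in this work.

```python
import numpy as np
from itertools import permutations


def correspond(prev_pts, curr_pts, next_pts, w=0.5):
    """Toy correspondence step: each point observed in the current frame is
    assigned to a point in the next frame so that displacements stay small
    (proximity) and change little with respect to the previous frame
    (uniformity), with no two points mapped onto the same point."""
    m = len(curr_pts)
    best_cost, best_perm = np.inf, None
    for perm in permutations(range(m)):
        cost = 0.0
        for i, j in enumerate(perm):
            d_curr = next_pts[j] - curr_pts[i]                  # displacement t -> t+1
            d_prev = curr_pts[i] - prev_pts[i]                  # displacement t-1 -> t
            cost += w * np.linalg.norm(d_curr)                  # proximity term
            cost += (1 - w) * np.linalg.norm(d_curr - d_prev)   # uniformity term
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost


if __name__ == "__main__":
    prev_pts = np.array([[10.0, 10.0], [50.0, 40.0]])
    curr_pts = np.array([[12.0, 11.0], [52.0, 42.0]])
    next_pts = np.array([[54.0, 44.0], [14.0, 12.0]])   # order shuffled on purpose
    print(correspond(prev_pts, curr_pts, next_pts))     # ([1, 0], cost)
```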