A convolutional neural network (CNN) performs spatial learning on two-dimensional data (e.g., images), using filters to learn features from the images. It therefore requires many images with highly discriminant spatial and longitudinal features, both within and between classes, for comprehensive learning. When this requirement is not met, CNN models suffer from the data paucity problem, which leads to limited learning and poor classification performance. Segmenting and detecting birds in RGB videos to study the behavior of backyard birds is one application affected by this data paucity problem. This paper first presents a new backyard-bird dataset, extracted from RGB videos and consisting of images of a cardinal and a sparrow. The paper then uses this dataset to develop an artificial neural network (ANN) model with a frequency-driven feature learning approach. We observed that the images of these birds, and their discriminant textures, are geometrically distorted by the birds' rapid movements and changing postures. These geometric distortions bury the true representations of the main and side lobes of the frequency spectrum of the bird images. To extract these latent features at different frequency bands and construct feature vectors for training the ANN model, we apply a Kaiser–Bessel window in the frequency domain together with the fast Fourier transform. Simulations show that, by carefully selecting the ANN model parameters and the simulation parameters, we can achieve segmentation and detection of the cardinal and sparrow images with about 98% training accuracy and 96% testing accuracy.
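The abstract does not say exactly how the Kaiser–Bessel window is combined with the FFT, so the sketch below illustrates only one plausible reading. It is not the authors' method. The sketch assumes the following: the 2-D FFT log-magnitude spectrum of a grayscale image is split into overlapping concentric radial bands, and each band is weighted by a Kaiser–Bessel profile, w(t) = I0(β·sqrt(1 − t²)) / I0(β). The weighted mean of each band is one entry of the feature vector that would be fed to an ANN. The function names `kaiser_profile` and `band_features`, the band count `n_bands`, the shape parameter `beta`, and the band spacing are all hypothetical choices.

```python
import numpy as np

def kaiser_profile(t, beta):
    # Kaiser-Bessel window evaluated at normalized positions t in [-1, 1];
    # the window is zero outside that interval. (Illustrative helper.)
    t = np.asarray(t, dtype=float)
    inside = np.abs(t) <= 1.0
    out = np.zeros_like(t)
    out[inside] = np.i0(beta * np.sqrt(1.0 - t[inside] ** 2)) / np.i0(beta)
    return out

def band_features(image, n_bands=8, beta=5.0):
    # Hypothetical frequency-band feature extractor (assumed design, not the paper's).
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    # Log-magnitude of the centered 2-D FFT spectrum (DC moved to the center).
    spectrum = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(img))))
    # Normalized radial frequency of every bin: 0 at DC, 1 at the farthest corner.
    y, x = np.indices((rows, cols))
    r = np.hypot(y - rows // 2, x - cols // 2)
    r = r / r.max()
    # Evenly spaced band centers; each Kaiser-weighted band spans two band widths,
    # so neighboring bands overlap.
    centers = (np.arange(n_bands) + 0.5) / n_bands
    half_width = 1.0 / n_bands
    features = np.empty(n_bands)
    for i, c in enumerate(centers):
        w = kaiser_profile((r - c) / half_width, beta)
        # Kaiser-weighted mean log-magnitude inside this frequency band.
        features[i] = np.sum(w * spectrum) / np.sum(w)
    return features

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    crop = rng.random((64, 64))  # stand-in for a grayscale bird crop
    print(band_features(crop).shape)
```

Using a smooth Kaiser taper instead of hard band edges reduces leakage between neighboring bands. The `beta` parameter trades main-lobe width against side-lobe suppression, and in practice it would be tuned together with the other model parameters.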