Homogeneous data fusion plays a crucial role in multiview human action recognition (MvHAR). However, most existing methods employ a simple concatenation strategy, ignoring the underlying correlations among views and degrading recognition performance. To this end, we propose a practical MvHAR framework based on a deep discriminant analysis network that mines multiview video features to obtain a more discriminative representation, from which the correlations among views can be explored. Specifically, the spatial–temporal features of the multiview data are extracted by a convolutional network, and a deep multiview feature fusion network then projects these features into a common subspace for efficient fusion. Traditional discriminant methods can suffer from class overlap; to avoid this problem, we improve the separation between every pair of classes by using a pairwise between-class scatter. Experiments on five benchmark datasets demonstrate the effectiveness of our framework against state-of-the-art algorithms under four metrics.
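Since the full article is behind a paywall, the exact formulation is not reproduced here; purely as an illustration of the pairwise between-class scatter idea named in the abstract, the following is a minimal NumPy sketch. The function name and all variables are hypothetical, and the code shows the generic discriminant-analysis construction (summing mean-difference outer products over every pair of classes rather than against a single global mean), not the authors' implementation.

```python
import numpy as np

def pairwise_between_class_scatter(features, labels):
    """Illustrative pairwise between-class scatter matrix.

    Unlike classic LDA between-class scatter (each class mean vs. the
    global mean), this sums outer products of mean differences over
    every pair of classes, penalizing overlap between any two classes
    directly.

    features: (n_samples, d) array of fused multiview features
    labels:   (n_samples,) array of integer class labels
    """
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    d = features.shape[1]
    S_b = np.zeros((d, d))
    for i, ci in enumerate(classes):
        for cj in classes[i + 1:]:
            diff = (means[ci] - means[cj])[:, None]  # column vector (d, 1)
            S_b += diff @ diff.T                     # rank-1 pairwise term
    return S_b

# Hypothetical usage with random stand-in data:
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))        # stand-in for fused multiview features
y = rng.integers(0, 3, size=60)     # stand-in action labels for 3 classes
S_b = pairwise_between_class_scatter(X, y)
print(S_b.shape)  # (8, 8)
```

In a deep discriminant analysis setting, a criterion built on this scatter (typically normalized by a within-class scatter) would be maximized over the fusion network's parameters, pushing every pair of class means apart in the fused subspace.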
Keywords: Action recognition, Video, Feature extraction, Feature fusion, Network security, Convolution, Matrices