Poster + Paper
13 June 2023 Real-time crowd counting via mobile-friendly Vision Transformer network
Conference Poster
Abstract
Accurate crowd counting in congested scenes remains challenging because of the trade-off between efficiency and generalization. To address this issue, we propose a mobile-friendly solution for network deployment in scenarios that demand high response speed. To bring the potential of global crowd representations to lightweight counting models, this work proposes a novel mobile vision transformer architecture aimed at crowd counting (CCMTNet), which strives to enhance the efficiency and universality of the model in real-time crowd counting tasks on resource-constrained computing devices. The framework, a linear CNN structure interleaved with self-attention blocks, gives the model the ability to extract local features and to process global high-dimensional crowd information at low computational cost. In addition, several experimental networks of different scales based on the proposed architecture are comprehensively evaluated to balance the loss of accuracy against the reduction in computing cost. Extensive experiments on three mainstream crowd counting datasets demonstrate the effectiveness of the proposed network. In particular, CCMTNet shows that counting accuracy and efficiency can be reconciled, in comparison with traditional lightweight CNN networks.
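The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: a convolution that mixes nearby features, followed by self-attention that mixes all positions. Everything here is an assumption made for illustration and is not the CCMTNet design: the function names (`conv1d_local`, `self_attention`, `hybrid_block`), the 1D token sequence standing in for a 2D feature map, the fixed smoothing kernel, the identity query/key/value projections, and the residual connection.

```python
import math

def conv1d_local(tokens, kernel):
    """Depthwise 1D convolution with zero padding.

    Each output token mixes only its immediate neighbours (local features).
    """
    half = len(kernel) // 2
    n, d = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        row = [0.0] * d
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # positions outside the sequence count as zero
                for c in range(d):
                    row[c] += w * tokens[j][c]
        out.append(row)
    return out

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections (no learned weights), to keep the
    sketch short. Every output token is a weighted mix of all tokens.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens))
                    for c in range(d)])
    return out

def hybrid_block(tokens, kernel=(0.25, 0.5, 0.25)):
    """Hypothetical hybrid block: local convolution, then global attention,
    added back to the input through a residual connection."""
    local = conv1d_local(tokens, kernel)
    mixed = self_attention(local)
    return [[x + m for x, m in zip(t, row)] for t, row in zip(tokens, mixed)]

# Usage: four tokens with two channels each; the output keeps the same shape.
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
result = hybrid_block(features)
```

In this sketch the convolution touches a fixed number of neighbours per token, while the attention step compares every token with every other one, so its cost grows quadratically with sequence length. That difference is the usual reason lightweight hybrid designs apply attention sparingly or at reduced resolution.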
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Peirong Ji, Zhiwei Wu, Yan Chen, Mohammad S. Alam, and Jun Sang "Real-time crowd counting via mobile-friendly Vision Transformer network", Proc. SPIE 12527, Pattern Recognition and Tracking XXXIV, 125270U (13 June 2023); https://doi.org/10.1117/12.2663761
KEYWORDS
Transformers

Visual process modeling

Feature extraction

Education and training

Performance modeling

Convolutional neural networks

Head
