10 October 2023 MHViTPose: multiscale hybrid vision transformer for human pose estimation
Junhui Qu, Ziyan Zhao, Xiang Yu, Wei Zhang
Proceedings Volume 12799, Third International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2023); 127991R (2023) https://doi.org/10.1117/12.3006800
Event: 3rd International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2023), 2023, Kuala Lumpur, Malaysia
Abstract
Despite the significant progress achieved by vision Transformers, several limitations remain to be addressed in human pose estimation. First, Transformers lack the inductive bias and local feature attention of CNNs, so they require extensive training data and many training iterations to achieve satisfactory results. We therefore propose a hybrid network that combines convolution and Transformer components. Second, to recognize human bodies at different scales, we build a Transformer pyramid structure that handles multiple scales by progressively reducing the input resolution. Our algorithm achieves an accuracy of 77.3% with a computational complexity of 19.6 GFLOPs. Compared with traditional direct regression methods, it considerably improves detection accuracy; compared with traditional Transformer methods, it reduces training complexity and significantly increases detection speed.
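To illustrate why a pyramid that progressively reduces resolution lowers the cost of attention, the following minimal sketch counts tokens per stage. It is not the authors' architecture: the 256x192 input size, the strides (4, 8, 16, 32), and the function names `pyramid_stages` and `attention_pairs` are assumptions chosen only for illustration.

```python
def pyramid_stages(height, width, strides=(4, 8, 16, 32)):
    """Return (stride, feature_h, feature_w, tokens) for each pyramid stage.

    The strides are hypothetical; the abstract does not state them.
    """
    stages = []
    for s in strides:
        h, w = height // s, width // s
        stages.append((s, h, w, h * w))
    return stages


def attention_pairs(tokens):
    """Query-key pairs in global self-attention: quadratic in token count."""
    return tokens * tokens


if __name__ == "__main__":
    # 256x192 is an assumed input size, not a value taken from the paper.
    for stride, h, w, n in pyramid_stages(256, 192):
        print(f"stride {stride:>2}: {h}x{w} map, {n} tokens, "
              f"{attention_pairs(n)} attention pairs")
```

Under these assumptions, each halving of the feature-map resolution divides the token count by 4 and the number of pairwise attention interactions by 16, which is the general reason a pyramid layout can be cheaper than attention at a single high resolution.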
(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Junhui Qu, Ziyan Zhao, Xiang Yu, and Wei Zhang "MHViTPose: multiscale hybrid vision transformer for human pose estimation", Proc. SPIE 12799, Third International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2023), 127991R (10 October 2023); https://doi.org/10.1117/12.3006800
KEYWORDS
Transformers
Pose estimation
Feature extraction
Visualization
Education and training
Feature fusion
Human vision and color perception