Paper
20 January 2021 Semantic relation graph reasoning network for visual question answering
Hong Lan, Pufen Zhang
Author Affiliations +
Proceedings Volume 11719, Twelfth International Conference on Signal Processing Systems; 117190J (2021) https://doi.org/10.1117/12.2588837
Event: Twelfth International Conference on Signal Processing Systems, 2020, Shanghai, China
Abstract
In order to answer semantically complicated questions about an image, a Visual Question Answering (VQA) model must fully understand the visual scene in the image, especially the dynamic interactions between different objects. This task inherently requires reasoning about the visual relationships among the objects in the image, and the reasoning process should be guided by information from the question. In this paper, we propose a semantic relation graph reasoning network in which semantic relation reasoning is guided by a cross-modal attention mechanism. In addition, a Gated Graph Convolutional Network (GGCN), constructed from the cross-modal attention weights, injects the semantic interaction information between objects into their visual features, producing relation-aware features. In particular, we train a semantic relationship detector to extract the semantic relationships between objects and construct the semantic relation graph. Experiments demonstrate that the proposed model outperforms most state-of-the-art methods on the VQA v2.0 benchmark dataset.
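The paper itself does not include implementation details on this page, but the core idea of a gated graph convolution driven by cross-modal attention can be illustrated with a minimal NumPy sketch. All shapes, weight matrices, and the specific attention scoring here are illustrative assumptions, not the authors' actual formulation: question-conditioned scores over object pairs form edge weights, and a sigmoid gate modulates the aggregated neighbour messages before they update each object's visual feature.

```python
import numpy as np

rng = np.random.default_rng(0)

N, d = 4, 8                        # assumed: number of detected objects, feature dim
V = rng.standard_normal((N, d))    # object visual features (e.g. from a detector)
q = rng.standard_normal(d)         # question embedding

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Cross-modal attention (illustrative): the question scores each object,
# and pairwise edge weights A[i, j] are a softmax over combined scores.
scores = V @ q                                            # (N,)
A = softmax(scores[None, :] + scores[:, None], axis=-1)   # (N, N), rows sum to 1

# Gated graph convolution (sketch): neighbour messages aggregated by A
# are passed through a per-dimension sigmoid gate before updating nodes.
W_msg  = rng.standard_normal((d, d)) * 0.1   # hypothetical message weights
W_gate = rng.standard_normal((d, d)) * 0.1   # hypothetical gate weights
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

M = A @ (V @ W_msg)        # aggregated, attention-weighted messages
G = sigmoid(V @ W_gate)    # gates in (0, 1)
V_rel = V + G * M          # relation-aware object features, same shape as V

print(V_rel.shape)         # (4, 8)
```

In a full model these relation-aware features would then be pooled with the question representation to predict the answer; this sketch only shows the feature-update step.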
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Hong Lan and Pufen Zhang "Semantic relation graph reasoning network for visual question answering", Proc. SPIE 11719, Twelfth International Conference on Signal Processing Systems, 117190J (20 January 2021); https://doi.org/10.1117/12.2588837