Multi-modal guided attention for live video comments generation
18 March 2022
Yuchen Ren, Yuan Yuan, Lei Chen
Proceedings Volume 12168, International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2021); 1216819 (2022) https://doi.org/10.1117/12.2631006
Event: International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2021), 2021, Harbin, China
Abstract
With the rapid growth of online video applications, live commenting has become an emerging feature of online video sites. The live video comments generation (LVCG) task aims to generate live comments for a video by taking into account both the video itself and the surrounding comments posted by other viewers. In this work, we aim to improve the relevance of generated comments to the video by modeling the cross-modal interactions among different modalities. To address insufficient multimodal interaction in live video comments generation, we build two basic attention blocks: the self-attention (SA) block, which models dense intra-modal interactions, and the x-guided attention (XGA) block, which models dense inter-modal interactions. By composing SA and XGA blocks in a modular fashion, we then propose several multimodal transformer architectures to process the multimodal features. Experiments show that the proposed multimodal guided attention models significantly outperform previous methods on most metrics.
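The abstract describes the two building blocks only at a high level. The following is a minimal NumPy sketch of how an SA block and a guided-attention block can differ, under assumptions the abstract does not confirm: single-head scaled dot-product attention as in a standard Transformer, a residual connection followed by layer normalization, no feed-forward sublayer, random untrained weights, and reading the "x" in XGA as the guiding modality. The class names `SABlock` and `XGABlock`, the feature dimension, and the sequence lengths are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-6):
    # Normalize each row to zero mean and unit variance (no learned gain/bias).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


class AttentionBlock:
    """Single-head attention + residual + layer norm (hypothetical simplification)."""

    def __init__(self, d_model, rng):
        scale = 1.0 / np.sqrt(d_model)
        self.w_q = rng.normal(0.0, scale, (d_model, d_model))
        self.w_k = rng.normal(0.0, scale, (d_model, d_model))
        self.w_v = rng.normal(0.0, scale, (d_model, d_model))
        self.last_weights = None

    def __call__(self, x, context):
        # Queries come from x; keys and values come from context.
        q = x @ self.w_q
        k = context @ self.w_k
        v = context @ self.w_v
        scores = q @ k.T / np.sqrt(q.shape[-1])
        self.last_weights = softmax(scores, axis=-1)  # shape (len_x, len_context)
        return layer_norm(x + self.last_weights @ v)


class SABlock(AttentionBlock):
    """Self attention: a modality attends to itself (intra-modal interactions)."""

    def __call__(self, x):
        return super().__call__(x, x)


class XGABlock(AttentionBlock):
    """Guided attention: modality y attends to guiding modality x (inter-modal)."""

    def __call__(self, y, x):
        return super().__call__(y, x)


rng = np.random.default_rng(0)
d = 64
video = rng.normal(size=(20, d))     # 20 frame features, assumed projected to d
comments = rng.normal(size=(12, d))  # 12 surrounding-comment token embeddings

sa_video = SABlock(d, rng)
xga_comments = XGABlock(d, rng)

video_enc = sa_video(video)                      # intra-modal: video with itself
comment_enc = xga_comments(comments, video_enc)  # inter-modal: comments guided by video
print(comment_enc.shape)  # (12, 64)
```

Because both blocks share one interface, other compositions, such as stacking several SA blocks before an XGA block or letting the comments guide the video instead, can be assembled the same way; which arrangements the paper actually evaluates is not stated in the abstract.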
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yuchen Ren, Yuan Yuan, and Lei Chen "Multi-modal guided attention for live video comments generation", Proc. SPIE 12168, International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2021), 1216819 (18 March 2022); https://doi.org/10.1117/12.2631006
KEYWORDS
Video
Visual process modeling
Computer programming
Transformers
Visualization
Head
Information visualization