Multi-modal assisted knowledge distillation for 3D question answering
11 October 2023
Bo Xu
Proceedings Volume 12800, Sixth International Conference on Computer Information Science and Application Technology (CISAT 2023); 128005I (2023) https://doi.org/10.1117/12.3004292
Event: 6th International Conference on Computer Information Science and Application Technology (CISAT 2023), 2023, Hangzhou, China
Abstract
3D question answering (3D-QA) aims to answer free-form natural language questions about 3D scenes represented by point clouds. Compared to traditional 2D-QA, 3D-QA poses a dual challenge: a model must understand both the appearance and structure of objects and the spatial relationships among them. In this work, we introduce a novel method, named M2AD, that leverages multi-modal data to enhance the representation of 3D scene point clouds during the training phase. Specifically, we incorporate 2D features corresponding to the 3D objects and captions corresponding to the scene into the 3D object proposal stage, giving the model stronger representation ability. Furthermore, so that inference requires no additional data, we adopt a teacher-student framework to distill the enhanced model's knowledge into a model that uses only point cloud data. Extensive experiments substantiate the effectiveness of the proposed model.
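The abstract does not specify the loss functions or the fusion scheme, so the following is only a minimal sketch of a generic teacher-student setup of the kind described, not the paper's formulation. It assumes, purely for illustration, that the teacher fuses point-cloud, 2D image, and caption features by an element-wise mean, and that the student is trained with a cross-entropy answer loss plus a mean-squared-error term that pulls its per-proposal features toward the teacher's. All function names and the `distill_weight` parameter are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def fuse_teacher_features(point_feat: Sequence[float],
                          image_feat: Sequence[float],
                          caption_feat: Sequence[float]) -> Vector:
    """Assumed fusion rule: element-wise mean of the three modality features."""
    return [(p + i + c) / 3.0
            for p, i, c in zip(point_feat, image_feat, caption_feat)]


def feature_distillation_loss(student: Sequence[Sequence[float]],
                              teacher: Sequence[Sequence[float]]) -> float:
    """Mean squared error between student and teacher proposal features."""
    total = 0.0
    count = 0
    for s_vec, t_vec in zip(student, teacher):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable cross-entropy for a single answer classification."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def student_loss(student_feats: Sequence[Sequence[float]],
                 teacher_feats: Sequence[Sequence[float]],
                 answer_logits: Sequence[float],
                 answer_index: int,
                 distill_weight: float = 1.0) -> float:
    """Answer loss plus a weighted feature-matching (distillation) term."""
    task = cross_entropy(answer_logits, answer_index)
    distill = feature_distillation_loss(student_feats, teacher_feats)
    return task + distill_weight * distill
```

In a setup like this, the teacher features are needed only while training; at inference the student runs on point cloud input alone, which matches the self-reliance goal stated in the abstract.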
(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Bo Xu "Multi-modal assisted knowledge distillation for 3D question answering", Proc. SPIE 12800, Sixth International Conference on Computer Information Science and Application Technology (CISAT 2023), 128005I (11 October 2023); https://doi.org/10.1117/12.3004292
KEYWORDS
3D modeling
Point clouds
Data modeling
Education and training
Visualization
3D image processing
Semantics
