Multi-modal assisted knowledge distillation for 3D question answering

Bo Xu

doi:10.1117/12.3004292

11 October 2023

Proceedings Volume 12800, Sixth International Conference on Computer Information Science and Application Technology (CISAT 2023); 128005I (2023) https://doi.org/10.1117/12.3004292
Event: 6th International Conference on Computer Information Science and Application Technology (CISAT 2023), 2023, Hangzhou, China

Abstract

3D question answering (3D-QA) aims to answer free-form natural language questions about 3D scenes represented by point clouds. Compared to traditional 2D-QA, 3D-QA poses a dual challenge: a model must understand both the appearance and structure of objects and the spatial relationships among them. In this work, we introduce a novel method, named M2AD, that leverages multi-modal data to enhance the representation of 3D scene point clouds during the training phase. Specifically, we incorporate 2D features corresponding to 3D objects and captions corresponding to the scene into the 3D object proposal stage, giving the model stronger representation abilities. Furthermore, so that inference requires no data beyond the point cloud, we adopt a teacher-student framework to distill the enhanced model's knowledge into a model that uses point cloud data alone. Extensive experiments support the effectiveness of the proposed model.
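The teacher-student idea in the abstract can be illustrated with a generic distillation objective. The sketch below is not the paper's loss, which the abstract does not specify. It assumes a common formulation: a cross-entropy task term on the student's answer logits, a temperature-softened KL term that pulls the student toward the teacher's answer distribution, and a mean-squared-error term that aligns student features with teacher features. All function names, the weights `alpha` and `beta`, and the `temperature` value are hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def feature_mse(teacher_feat, student_feat):
    """Mean squared error between two feature vectors of equal length."""
    diffs = [(t - s) ** 2 for t, s in zip(teacher_feat, student_feat)]
    return sum(diffs) / len(diffs)


def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat, label,
                      temperature=2.0, alpha=0.5, beta=1.0):
    """Generic distillation objective (an assumption, not the paper's loss).

    student_*: outputs of the point-cloud-only model.
    teacher_*: outputs of the multi-modal model (hypothetically fed
               point clouds plus 2D features and captions).
    """
    # Task term: cross-entropy of the student against the ground-truth answer.
    probs = softmax(student_logits)
    ce = -math.log(probs[label] + 1e-12)

    # Soft-target term: match the teacher's temperature-softened distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kd = (temperature ** 2) * kl_divergence(p_teacher, p_student)

    # Feature term: align the student's representation with the teacher's.
    feat = feature_mse(teacher_feat, student_feat)

    return (1.0 - alpha) * ce + alpha * kd + beta * feat


# Toy usage with made-up numbers: three candidate answers, 4-d features.
loss = distillation_loss(
    student_logits=[1.0, 0.5, -0.2],
    teacher_logits=[2.0, 0.1, -1.0],
    student_feat=[0.1, 0.2, 0.3, 0.4],
    teacher_feat=[0.2, 0.2, 0.1, 0.5],
    label=0,
)
```

Only the student is needed at inference time under this setup; the teacher's extra inputs contribute solely through the two distillation terms during training.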

(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Bo Xu "Multi-modal assisted knowledge distillation for 3D question answering", Proc. SPIE 12800, Sixth International Conference on Computer Information Science and Application Technology (CISAT 2023), 128005I (11 October 2023); https://doi.org/10.1117/12.3004292

PROCEEDINGS
7 PAGES

KEYWORDS

3D modeling

Point clouds

Data modeling

Education and training

Visualization

3D image processing

Semantics
