Paper
21 July 2023 Turning a CLIP modal into image-text matching
Shouyong Peng, Yafei Bu, Ze Li, Jintao Wang, Tao Yao
Proceedings Volume 12717, 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023); 127173F (2023) https://doi.org/10.1117/12.2684681
Event: 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023), 2023, Wuhan, China
Abstract
Image-text matching (ITM) benefits from the large-scale Contrastive Language-Image Pre-training (CLIP) method, which achieves high accuracy. However, CLIP learns by contrasting global visual and textual features, so it lacks fine-grained inter-modal information and inevitably produces mismatches between images and texts. In this work, we therefore propose a method called Turning a CLIP Model into Image-Text Matching (CIT), which combines fine-grained information between modalities to convert the CLIP model into a more efficient ITM model. The CIT method effectively improves the image-text matching accuracy of existing CLIP models and does not require additional pre-training. We demonstrate the effectiveness of our method through experimental comparisons with a range of state-of-the-art methods on two widely used datasets.
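The contrast the abstract draws between global and fine-grained matching can be illustrated with a small sketch. This is not the CIT method, whose details the abstract does not give. It is a toy comparison under stated assumptions: mean pooling stands in for CLIP's global feature, the "best region per word" score is a hypothetical fine-grained measure, and the two-dimensional vectors are made-up values chosen only to show the effect.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pool(vectors):
    """Average a list of vectors into one vector (assumed stand-in for a global feature)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def global_similarity(regions, words):
    """Global score: one pooled vector per modality, compared with a single cosine."""
    return cosine(mean_pool(regions), mean_pool(words))

def fine_grained_similarity(regions, words):
    """Hypothetical local score: match each word to its best region, then average."""
    return sum(max(cosine(w, r) for r in regions) for w in words) / len(words)

# Toy data (assumed): two image regions pointing in different directions,
# and one word that lies between them without matching either.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 1.0]]

global_score = global_similarity(regions, words)
local_score = fine_grained_similarity(regions, words)
print(round(global_score, 3), round(local_score, 3))
```

In this toy case the pooled image vector aligns exactly with the word, so the global score is close to 1, while the local score is about 0.707 because no single region matches the word. Pooling can therefore hide a mismatch that a region-to-word comparison exposes, which is the kind of gap the abstract attributes to the lack of fine-grained information.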
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Shouyong Peng, Yafei Bu, Ze Li, Jintao Wang, and Tao Yao "Turning a CLIP modal into image-text matching", Proc. SPIE 12717, 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023), 127173F (21 July 2023); https://doi.org/10.1117/12.2684681
KEYWORDS
Visualization
Visual process modeling
Feature extraction
Semantics
Information visualization
Education and training
Image retrieval